Fabian Heinemann

Industry

Semantic data integration in the health care industry

A common problem in large and complex organizations is that information is distributed over various systems, which makes it hard to create overviews of business-relevant data. The issue is typically complicated further by the use of non-standardized strings to describe concepts (e.g. external cooperation partners or diseases are expressed in various ways). In the context of clinical cooperations, we developed an application to solve these issues. First, we integrated data from several relational databases with different table schemas. To become independent of the source systems, we converted the data to simple RDF triples containing the information from the different relational databases. Subsequently, this intermediate RDF was processed in Unified Views, a custom ETL (Extract, Transform, Load) tool built specifically for RDF ETL tasks. Within Unified Views, spelling variants were normalized and further information, such as institutional or disease hierarchies, was added. Moreover, the data was converted to a predefined RDF model. Finally, the data was loaded into an RDF data store (Virtuoso) and queried using SPARQL. The queries were wrapped in an intuitive user interface. Thanks to the semantic enrichment of the data during the ETL conversion, outstanding search and overview features could be provided to the end users.
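The first step, turning rows from a relational table into simple source-independent triples, might look roughly like the sketch below. It is only an illustration under stated assumptions: the table name, column names, sample rows and the `http://example.org/` namespace are all invented here, and the actual project may have generated its intermediate RDF quite differently.

```python
import sqlite3

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/"


def rows_to_triples(conn, table, key_column):
    """Yield one (subject, predicate, object) triple per non-null cell.

    The subject is built from the table name and the row key, the
    predicate from the table name and the column name.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key is in the subject; NULLs carry no information
            yield (subject, f"{BASE}{table}#{column}", str(value))


# Invented sample data standing in for one of the source databases.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cooperation (id INTEGER, partner TEXT, disease TEXT)")
conn.execute("INSERT INTO cooperation VALUES (1, 'Univ. Hospital Basel', 'NSCLC')")
conn.execute("INSERT INTO cooperation VALUES (2, 'University Hospital Basel', NULL)")

triples = list(rows_to_triples(conn, "cooperation", "id"))
for triple in triples:
    print(triple)
```

Because every cell becomes its own triple, a second database with a different table schema can be passed through the same function without any schema-specific code. Harmonizing the resulting predicates and values is left to the later ETL step.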

1. Introduction: 

  • The initial problem will be described in detail: generating overviews of business-relevant data was time-consuming because information was distributed over various systems and non-standardized strings were used to describe various types of concepts.

2. Approach and IT solution

  • Integration of data from relational databases using an intermediate RDF to express information
  • Unified Views module
  • Normalization and mapping of entities to a curation model
  • Geo-linking of location data
  • Conversion to a predefined RDF model
  • Description of the final RDF model
  • Wrapping of queries in an intuitive interface
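As a rough illustration of why normalization and hierarchy enrichment improve search, the sketch below uses plain Python as a stand-in for the Unified Views and SPARQL steps. The lookup tables, labels and function names are hypothetical and are not taken from the project's curation model.

```python
# Hypothetical curation data: spelling variants mapped to canonical labels.
CANONICAL = {
    "univ. hospital basel": "University Hospital Basel",
    "university hospital basel": "University Hospital Basel",
    "nsclc": "Non-small cell lung cancer",
    "non-small cell lung cancer": "Non-small cell lung cancer",
}

# Hypothetical disease hierarchy: concept -> broader concept.
BROADER = {
    "Non-small cell lung cancer": "Lung cancer",
    "Lung cancer": "Cancer",
}


def normalize(label):
    """Map a raw string to its canonical label, if one is known."""
    cleaned = label.strip()
    return CANONICAL.get(cleaned.lower(), cleaned)


def ancestors(concept):
    """Return all broader concepts, nearest first (guarded against cycles)."""
    result, seen = [], set()
    while concept in BROADER and concept not in seen:
        seen.add(concept)
        concept = BROADER[concept]
        result.append(concept)
    return result


def enrich(records):
    """Normalize partner and disease strings and attach broader diseases."""
    for partner, disease in records:
        canonical_disease = normalize(disease)
        yield {
            "partner": normalize(partner),
            "disease": canonical_disease,
            "disease_broader": ancestors(canonical_disease),
        }


def search_by_disease(enriched, term):
    """Find records annotated with the term or with any narrower disease."""
    return [
        record for record in enriched
        if term == record["disease"] or term in record["disease_broader"]
    ]


# Invented raw records with two spellings of the same partner and disease.
records = [
    ("Univ. Hospital Basel", "NSCLC"),
    (" university hospital basel ", "Non-small cell lung cancer"),
]
enriched = list(enrich(records))
hits = search_by_disease(enriched, "Lung cancer")
print(hits)
```

A search for the broader term "Lung cancer" finds both records, although neither raw string contains it. In an RDF store, the same effect would typically come from storing the hierarchy as triples and traversing it in the query.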

3. Demonstration

  • Brief overview of various functions of the application

4. Discussion of success criteria & obstacles for the project

5. Outlook & Conclusion


 

CV

Fabian studied physics in Bremen, specializing in biophysics and theoretical physics. After his Diploma in 2005 he completed a Ph.D. in biological physics in 2008, followed by two post-docs, in which he specialized in biophysical methods and data analysis. In 2013 he moved to Roche, where he works on the development of in-house applications that use emerging technologies such as Machine Learning, Text Mining and Semantic Web.

Data Scientist