A common problem in large and complex organizations is that information is distributed over various systems, which makes it hard to get an overview of business-relevant data. The issue is typically complicated further by the use of non-standardized strings to describe concepts (e.g. external cooperation partners or diseases are expressed in various ways). In the context of clinical cooperations, we developed an application to solve these issues. First, we integrated data from several relational databases with different table schemas. To become independent of the source systems, we converted the data to simple RDF triples containing the information from the different relational databases. Subsequently, this intermediate RDF was processed in Unified Views, a custom ETL (Extract, Transform, Load) tool built specifically for RDF ETL tasks. Within Unified Views, spelling variants were normalized and further information, such as institutional or disease hierarchies, was added. Moreover, the data was converted to a predefined RDF model. Finally, the data was loaded into an RDF data store (Virtuoso) and queried using SPARQL. The queries were wrapped in an intuitive user interface. Because the data was semantically enriched during the ETL conversion, the application could offer end users powerful search and overview features.
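The pipeline described above can be illustrated with a minimal sketch. It is not the actual implementation: plain Python tuples stand in for RDF triples, and ordinary functions stand in for the Unified Views steps and the SPARQL query against Virtuoso. All table names, column names, the `http://example.org/` namespace, the sample rows, and the lookup tables are hypothetical and chosen only for illustration.

```python
EX = "http://example.org/"  # placeholder namespace, not from the original project

# Hypothetical rows from two source systems with different table schemas.
crm_rows = [{"id": 1, "partner": "Example Univ. Hospital", "disease": "NSCLC"}]
trial_rows = [{"study_id": "S-7", "institution": "Example University Hospital",
               "indication": "non-small cell lung cancer"}]

# Hypothetical lookup tables: spelling variants and a one-level disease hierarchy.
VARIANTS = {
    "example univ. hospital": "Example University Hospital",
    "nsclc": "non-small cell lung carcinoma",
    "non-small cell lung cancer": "non-small cell lung carcinoma",
}
PARENTS = {"non-small cell lung carcinoma": "lung cancer"}


def to_triples(rows, id_col, partner_col, disease_col, source):
    """Flatten rows from one source schema into simple (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = f"{EX}{source}/{row[id_col]}"
        triples.append((subject, EX + "partner", row[partner_col]))
        triples.append((subject, EX + "disease", row[disease_col]))
    return triples


def normalize(triples, variants):
    """Replace known spelling variants in object position with one canonical label."""
    return [(s, p, variants.get(o.strip().lower(), o)) for s, p, o in triples]


def enrich(triples, parents):
    """Add hierarchy triples linking each known concept to its broader concept."""
    extra = {(o, EX + "broader", parents[o]) for _, _, o in triples if o in parents}
    return triples + sorted(extra)


def cooperations_for(triples, disease):
    """Find cooperations tagged with a disease or with any directly narrower disease."""
    narrower = {s for s, p, o in triples if p == EX + "broader" and o == disease}
    wanted = narrower | {disease}
    return sorted(s for s, p, o in triples if p == EX + "disease" and o in wanted)


raw = (to_triples(crm_rows, "id", "partner", "disease", "crm")
       + to_triples(trial_rows, "study_id", "institution", "indication", "trials"))
triples = enrich(normalize(raw, VARIANTS), PARENTS)

for subject in cooperations_for(triples, "lung cancer"):
    print(subject)
```

In this sketch, a search for "lung cancer" finds both records even though neither source row contains that string, because the two spelling variants were first mapped to one canonical label and the hierarchy triple then links that label to the broader term. This is the kind of search benefit that semantic enrichment makes possible.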
Fabian studied physics in Bremen, specializing in biophysics and theoretical physics. After receiving his Diploma in 2005, he completed a Ph.D. in biological physics in 2008, followed by two postdocs in which he specialized in biophysical methods and data analysis. In 2013 he moved to Roche, where he works on the development of in-house applications using emerging technologies such as machine learning, text mining, and the Semantic Web.