The SemaGrow project develops a Linked Data infrastructure that allows transparent access to large, distributed, heterogeneous, and constantly updated datasets. The developed innovations are delivered as the SemaGrow Stack, an open-source software package. Through the SemaGrow Stack, applications can access heterogeneous, distributed triple stores using a single SPARQL endpoint, without knowledge of the underlying schemas of the individual sources. To prove its practical value, the SemaGrow Stack is tested in data- and knowledge-intensive use cases from the agro-environmental domain.
The Open PHACTS project (http://www.openphacts.org/) has built a platform for drug discovery that integrates diverse sets of public chemistry and biology data. It currently connects linked open data from 12 different data sources, covering chemical compounds, protein targets, biological pathways and tissues, and diseases. The diversity and size of the Open PHACTS data are growing rapidly, and the platform currently contains more than 3 billion triples. The Open PHACTS project is a unique collaboration between European academic groups, small businesses, and large pharmaceutical companies, partially funded by the EU. The driver for the project is to enable scientists to easily access and process data from multiple sources to solve real-world drug discovery problems that were previously very difficult to solve. These drug discovery problems formed the basis for selecting which public data sources were integrated in the Open PHACTS project. Anyone can freely access the Open PHACTS data through a well-documented API, and numerous workflows that answer specific biomedical questions have been developed and published using the KNIME and Pipeline Pilot pipelining tools. In addition, several custom applications have been built using the API. Open PHACTS has shown that Linked Open Data in the form of RDF triples can be used effectively by the scientific community, and that it allows queries that were previously very difficult or impossible to run. Future directions include the integration of additional public data sources, the integration of internal company data with Open PHACTS data, and the continued development of workflows for scientific questions that can only be answered using linked data.