The SemaGrow project develops a Linked Data infrastructure that provides transparent access to large, distributed, heterogeneous, and constantly updated datasets. It aims to tackle this challenge by developing novel algorithms and methods for querying distributed triple stores, scalable and robust semantic indexing algorithms, and effective ontology alignment methods. These innovations are delivered as the SemaGrow Stack, an open source software package. With the SemaGrow Stack, applications can access heterogeneous, distributed triple stores through a single SPARQL endpoint, without needing to know the underlying schemas of the individual sources.
To demonstrate its practical value, the SemaGrow Stack is tested in data- and knowledge-intensive use cases from the agro-environmental domain. There, the high heterogeneity of the datasets, their often explicit spatial and temporal dimensions (which result in relatively large volumes), and their inherent uncertainty pose additional challenges that have rarely been addressed so far. Pilot applications range from access to bibliographic, statistical, and multimedia sources by data scientists and educators to the querying and integration of distributed Big Data resources for agricultural modellers.