Angelos Charalambidis, Antonis Troumpoukis and Stasinos Konstantopoulos

SemaGrow: Optimizing Federated SPARQL queries

Processing SPARQL queries involves the construction of an efficient query plan to guide query execution. Alternative plans can vary in the resources and the amount of time that they need by orders of magnitude, making planning crucial for efficiency. On the other hand, the construction of optimal plans can become computationally intensive and it also operates upon detailed, difficult to obtain, metadata. In this paper we present Semagrow, a federated SPARQL querying system that uses metadata about the federated data sources in order to optimize query execution. We balance between a query optimizer that introduces little overhead, has appropriate fall backs in the absence of metadata, but at the same time produces optimal plans in as many situations as possible. Semagrow also exploits non-blocking and asynchronous stream processing technologies to achieve query execution efficiency and robustness. We also present and analyse empirical results using the FedBench benchmark to compare Semagrow against FedX and SPLENDID. Semagrow clearly outperforms SPLENDID and it is either on a par or much faster than FedX.