Anastasia Dimou, Ruben Verborgh, Miel Vander Sande, Erik Mannens and Rik Van de Walle

Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data Access and Retrieval

The RDF data model allows the description of domain-level knowledge that is understandable by both humans and machines. RDF data can be derived from different source formats and diverse access points, ranging from databases or files in CSV format to data retrieved from Web APIs in JSON, Web Services in XML, or any other specialized format. To this end, machine-interpretable mapping languages, such as RML, were introduced to uniformly define how data in multiple heterogeneous sources is mapped to the RDF data model, independently of its original format. However, the way in which this data is accessed and retrieved still remains hard-coded, as corresponding descriptions are often not available or not taken into account. In this paper, we introduce an approach that takes advantage of widely accepted vocabularies, originally used to advertise services or datasets, such as Hydra or DCAT, to define how to access Web-based or other data sources. Consequently, the generation of RDF representations is facilitated and further automated, while the machine-interpretable descriptions of the connectivity to the original data remain independent and interoperable, offering a granular solution for accessing and mapping data.