Miroslav Liska (Marek Šurek)

Industry

Toward Government Linked Data: A Slovak Case

This presentation shares and discusses the current status of establishing semantic standards for government data in the Slovak Republic. The initial requirement came from our company, Datalan [1], when we became a member of the data standardization process run under the Ministry of Finance of the Slovak Republic [2] in 2013. For four years we had been developing several semantic-web-based prototypes, and we concluded that government linked data is a necessary condition for spreading semantic web technology in Slovakia and benefiting from it. Initially we developed many custom OWL ontologies, but over time we moved to standardized ontologies such as FOAF [3], SKOS [4] and WGS84_POS [5]. This led to the first formal proposal of semantic standards for Slovak government data [6]. The approach was based on using only standardized ontologies recommended by SEMIC [7], such as DCAT [8], ADMS [9], REGORG [10], CPO [11] and CPSV [12]. Additionally, a URI creation methodology was proposed that takes into account the 10 Rules for Persistent URIs [13]. However, a problem arose: the vast majority of the standardization members had no knowledge of semantic web principles, so we provided several training sessions. At that time the adoption of semantic standards seemed more like science fiction. In 2014, however, the data standardization members jointly approved that a government entity should have a URI. Exactly this became the most important step toward government linked data in Slovakia: since RDF is based on URIs and is the base framework of the semantic web, semantic standards became inevitable. Based on the initial submission mentioned above, two major requirements for the semantic standards were established.
First, it is necessary to list all applicable ontologies that can be used to annotate government data; second, a URI creation methodology must be established for government entities such as individuals, code lists, datasets and others.
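As an illustration of what a URI creation methodology can look like in practice, the following minimal Python sketch mints URIs using the pattern http://{domain}/{type}/{concept}/{reference} that is commonly quoted from the 10 Rules for Persistent URIs [13]. Whether the Slovak methodology adopts exactly this pattern is an assumption; the function name `mint_uri` and the set of type segments are hypothetical and are not taken from the proposal itself.

```python
from urllib.parse import quote

# Type segments assumed from the 10 Rules pattern; not confirmed by the Slovak proposal.
ALLOWED_TYPES = {"id", "doc", "def", "set"}


def mint_uri(domain: str, type_: str, concept: str, reference: str) -> str:
    """Build a URI of the form http://{domain}/{type}/{concept}/{reference}."""
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"unknown URI type: {type_!r}")
    # Percent-encode the variable segments so the result stays a valid URI path.
    concept_part = quote(concept, safe="")
    reference_part = quote(reference, safe="")
    return f"http://{domain}/{type_}/{concept_part}/{reference_part}"


# Reproduces the ontology URI listed in reference [15].
ontology_uri = mint_uri("data.gov.sk", "def", "ontology", "odp")
print(ontology_uri)
```

Keeping the type segment to a small closed set is one way to keep identifiers predictable; the actual set of segments would be fixed by the standard, not by code.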

Consequently, we changed our original approach of annotating data only with approved SEMIC ontologies. Rather than make a big jump and adopt the SEMIC recommendations at once, we started to move the existing Slovak government standards to the semantic web. We focused on the current Catalog of Data Elements (KDP) [14], a simple informal vocabulary that consists of selected elements such as PhysicalPerson, CorporateBody, PhysicalAddress and others. Since the semantics of these elements were stated only in natural language, machine reasoning over this kind of data was not possible. We therefore created the Ontology of Data Elements (ODP) [15], which replaced KDP. For the sake of consistency, the ODP elements are linked to the elements of the recommended SEMIC ontologies; for example, odp:CorporateBody is a subclass of rov:RegisteredOrganization. This establishes a semantic connection between the ODP ontology and the REGORG ontology, hence ODP reaches the fifth level of open data [16]. Today the general purpose of ODP is to simplify the semantic annotation of data, because KDP is well known. The current proposal allows data to be published with either the ODP or the SEMIC-recommended ontologies.
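The effect of linking odp:CorporateBody to rov:RegisteredOrganization can be sketched in a few lines of plain Python. This is only an illustration, not the actual ODP tooling: triples are modelled as tuples, the ODP namespace ending in "#" and the example.org organization URI are assumptions, and only the single RDFS subclass rule is applied.

```python
# Namespaces: ROV is taken from reference [10]; the '#' suffix on ODP is an assumption.
ODP = "http://data.gov.sk/def/ontology/odp#"
ROV = "http://www.w3.org/ns/regorg#"
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# A hypothetical organization annotated only with the ODP class.
ORG = "http://example.org/id/org/1"

triples = {
    (ODP + "CorporateBody", SUBCLASS_OF, ROV + "RegisteredOrganization"),
    (ORG, RDF_TYPE, ODP + "CorporateBody"),
}


def infer_types(graph):
    """Apply (x type C), (C subClassOf D) => (x type D) until nothing new appears."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                if o == sub and (s, RDF_TYPE, sup) not in inferred:
                    inferred.add((s, RDF_TYPE, sup))
                    changed = True
    return inferred


closure = infer_types(triples)
# Data published with ODP is now also visible to consumers that only know REGORG.
print((ORG, RDF_TYPE, ROV + "RegisteredOrganization") in closure)
```

In a real deployment this entailment would normally be delegated to an RDF store or reasoner rather than hand-written loops.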

However, our development of semantic web applications continued even though the standardization process was very slow. Along with developing our commercial products, such as Sestate (Semantic Real Estates), Telco Semantic Search (an intranet-based semantic search prototype for a large telecommunication operator) and Sandman (recognition and management of accounting documents), we have also focused on government linked data. Following the example of data.gov [17], data.gov.uk [18] and DBpedia [19], we created Slovpedia [20]. Slovpedia is an RDF database that consists of selected government linked data. Some of the data are published for research and some for educational purposes; these data are open. However, Slovpedia also contains commercially licensed data that provide business value. The added value is created, on the one hand, by linking external semantic data into Slovpedia and, on the other hand, by enriching the data with new content (inferred triples). Examples are two drug-related applications: PharmaNet, intended for professional pharmacists, and PharmaGuard [21], a mobile application for ordinary users. Initially, the approved drug data are published by the State Institute for Drug Control [22] and by the Ministry of Health of the Slovak Republic [23] in various formats. We transform these data into triples using the ODP as well as our proprietary Pharmacy Ontology, and load them into Slovpedia. Then we include the ICD (International Classification of Diseases) Ontology and DrugBank [24] data. Next, a set of inference rules is used to infer drug-to-drug interaction relations. Finally, these data are provided through the Slovpedia API to the client drug applications mentioned above. The applications offer unique advantages, such as searching for drugs by disease or enriching NLP-based drug-to-drug interactions with knowledge from DrugBank.
Note that the drug data can also be explored with our product Tripleskop [Tripleskop], a web-based visual RDF graph client for a SPARQL endpoint, which is the default user interface of Slovpedia.
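The kind of rule-based enrichment described above can be sketched as follows. The actual Slovpedia inference rules and the Pharmacy Ontology are not published here, so everything in this Python sketch is an assumption made for illustration: the property names (`containsSubstance`, `interactsWith`, `indicatedFor`), the rule itself (two drugs interact if they contain interacting substances) and all identifiers, which are made-up placeholders rather than real drug data.

```python
# All identifiers and facts are made-up placeholders, not real drug or disease data.
triples = {
    ("drug:A", "containsSubstance", "substance:X"),
    ("drug:B", "containsSubstance", "substance:Y"),
    ("drug:C", "containsSubstance", "substance:Z"),
    ("substance:X", "interactsWith", "substance:Y"),  # e.g. from an external source
    ("drug:A", "indicatedFor", "icd:D1"),
    ("drug:C", "indicatedFor", "icd:D1"),
}


def infer_drug_interactions(graph):
    """Assumed rule: drugs interact if they contain substances that interact."""
    contains = [(s, o) for s, p, o in graph if p == "containsSubstance"]
    pairs = {(s, o) for s, p, o in graph if p == "interactsWith"}
    pairs |= {(b, a) for a, b in pairs}  # treat substance interaction as symmetric
    inferred = set()
    for drug1, substance1 in contains:
        for drug2, substance2 in contains:
            if drug1 != drug2 and (substance1, substance2) in pairs:
                inferred.add((drug1, "interactsWith", drug2))
    return inferred


def drugs_for_disease(graph, disease):
    """Search drugs by disease code via the assumed indicatedFor property."""
    return sorted(s for s, p, o in graph if p == "indicatedFor" and o == disease)


interactions = infer_drug_interactions(triples)
print(sorted(interactions))
print(drugs_for_disease(triples, "icd:D1"))
```

The inferred triples would be stored alongside the source data, so that client applications can query interactions and disease-based search through the same interface.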

References:

[1] http://www.datalan.sk/en/home 
[2] http://www.finance.gov.sk/En/Default.aspx 
[3] http://xmlns.com/foaf/0.1/# 
[4] http://www.w3.org/2004/02/skos/core# 
[5] http://www.w3.org/2003/01/geo/wgs84_pos# 
[6] https://joinup.ec.europa.eu/sites/default/files/cf/3c/65/20130530-data.g...
[7] https://joinup.ec.europa.eu/community/semic/description 
[8] http://www.w3.org/ns/dcat# 
[9] http://www.w3.org/ns/adms#
[10] http://www.w3.org/ns/regorg# 
[11] http://www.w3.org/ns/person# 
[12] http://purl.org/vocab/cpsv# 
[13] https://joinup.ec.europa.eu/community/semic/document/10-rules-persistent...
[14] http://www.informatizacia.sk/ext_dok-vynos_2014-55_priloha_02_katalog_da...
[15] http://data.gov.sk/def/ontology/odp [URI is not URL yet]
[16] http://5stardata.info/ 
[17] http://data.gov 
[18] http://data.gov.uk 
[19] http://dbpedia.org
[20] http://slovpedia.com
[21] http://pharmaguard.eu [currently in deployment status] 
[22] http://www.sukl.sk/en?page_id=256 
[23] http://www.health.gov.sk/Index.aspx 
[24] http://www.drugbank.ca/

CV

Miroslav Líška received the M.S. degree in informatics from the Technical University in Košice, Slovakia, in 2002, and the Ph.D. degree in software and information systems from the Institute of Informatics and Software Engineering, Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, in 2010. His interests include the semantic web, ontologies, software process engineering and semantic enterprise architectures. He currently works as an ontologist and semantic web developer at Datalan in Bratislava, Slovakia. He is a member of the standardization process for Slovak government data, where he advocates a semantics-based approach. He is married and has two sons.

Linked Data Architect