Marco Fossati, Dimitris Kontokostas and Jens Lehmann

Unsupervised Learning of an Extensive and Usable Taxonomy for DBpedia

In the digital era, Wikipedia represents a comprehensive cross-domain source of knowledge with millions of contributors. The DBpedia project transforms Wikipedia content into RDF and currently plays a crucial role in the Web of Data as a central multilingual interlinking hub. However, its main classification system depends on human curation, which causes it to lack coverage, resulting in a large amount of untyped resources. We present an unsupervised approach that automatically learns a taxonomy from the Wikipedia category system and extensively assigns types to DBpedia entities, through the combination of several interdisciplinary techniques. It provides a robust backbone for DBpedia knowledge and has the benefit of being easy to understand for end users. Crowdsourced online evaluations demonstrate that our strategy outperforms state-of-the-art approaches both in terms of coverage and intuitiveness.