Machine Learning

Session 5.4

Time: Thursday, September 17, 2015, 14:30 to 16:15
Place: D2 Works. 1
Chair: Bernhard Haslhofer

Talks

An Optimization Approach for Load Balancing in Parallel Link Discovery

Many of the available RDF datasets describe millions of resources using billions of triples. Consequently, millions of links can potentially exist among such datasets. While parallel implementations of link discovery approaches have been developed in the past, little attention has been paid to load balancing for local implementations of link discovery algorithms. In this paper, we therefore present a novel load-balancing technique for link discovery on parallel hardware, based on particle-swarm optimization. We combine this approach with the Orchid algorithm for geo-spatial linking and evaluate it on real and artificial datasets. Our evaluation suggests that while naïve approaches can be super-linear on small datasets, our deterministic particle-swarm optimization outperforms both naïve and classical load-balancing approaches, such as greedy load balancing, on large datasets.
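To make the comparison concrete, the greedy baseline mentioned in the abstract can be sketched as follows. This is a generic illustration of greedy load balancing, not the paper's deterministic particle-swarm method: each link-discovery task, weighted by its estimated cost, is assigned to the currently least-loaded worker.

```python
import heapq

def greedy_balance(task_costs, n_workers):
    """Greedy load balancing: assign tasks (largest cost first)
    to whichever worker currently carries the least load."""
    # Min-heap of (current load, worker id) pairs.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(heap)
        assignment[task] = worker
        heapq.heappush(heap, (load + cost, worker))
    return assignment
```

For example, `greedy_balance({"t1": 5.0, "t2": 3.0, "t3": 2.0}, 2)` puts the largest task on one worker and the two smaller ones on the other, leaving both workers with a load of 5.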

Unsupervised Learning of an Extensive and Usable Taxonomy for DBpedia

In the digital era, Wikipedia represents a comprehensive cross-domain source of knowledge with millions of contributors. The DBpedia project transforms Wikipedia content into RDF and currently plays a crucial role in the Web of Data as a central multilingual interlinking hub. However, its main classification system depends on human curation, which limits its coverage and leaves a large number of resources untyped. We present an unsupervised approach that automatically learns a taxonomy from the Wikipedia category system and assigns types to DBpedia entities at scale, combining several interdisciplinary techniques. The resulting taxonomy provides a robust backbone for DBpedia knowledge and has the benefit of being easy for end users to understand. Crowdsourced online evaluations demonstrate that our strategy outperforms state-of-the-art approaches in both coverage and intuitiveness.
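The type-assignment idea can be illustrated with a minimal sketch. This is an assumption-laden simplification, not the paper's actual pipeline: an entity inherits the class of the nearest ancestor category that has been mapped to a taxonomy class.

```python
def infer_type(entity_categories, parent, class_of):
    """Illustrative sketch: walk up a category hierarchy until a
    category mapped to a taxonomy class is found.

    parent:   dict mapping a category to its parent category
    class_of: dict mapping a category to a taxonomy class
    """
    for cat in entity_categories:
        current = cat
        seen = set()  # guard against cycles in the category graph
        while current is not None and current not in seen:
            if current in class_of:
                return class_of[current]
            seen.add(current)
            current = parent.get(current)
    return None  # entity stays untyped
```

With `parent = {"Italian_physicists": "Physicists", "Physicists": "Scientists"}` and `class_of = {"Scientists": "dbo:Scientist"}`, an entity in `Italian_physicists` would be typed `dbo:Scientist`.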

MEX Vocabulary: A Lightweight Interchange Format for Machine Learning Experiments

Over the last decades, many machine learning experiments have been published, contributing to scientific progress. To compare machine learning experiment results with each other and collaborate effectively, experiments need to be performed on the same computing environment, using the same sample datasets and algorithm configurations. Moreover, practical experience shows that scientists and engineers tend to produce large amounts of output data in their experiments, which is difficult to analyze and archive properly without provenance metadata. However, the Linked Data community still lacks a lightweight specification for interchanging machine learning metadata across different architectures to achieve a higher level of interoperability. In this paper, we address this gap by presenting a novel vocabulary dubbed MEX. We show that MEX provides a straightforward way to describe experiments, with a special focus on data provenance, and fulfills the requirements for long-term maintenance.
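A vocabulary of this kind is used by emitting RDF triples about an experiment run. The sketch below is purely illustrative: the `mex:` term names and the example URIs are assumptions for the sake of the example, not the published MEX vocabulary.

```python
# Hypothetical example of MEX-style provenance metadata for one run.
# All "mex:" predicates here are illustrative placeholders.
RUN = "ex:run1"

def experiment_triples(algorithm, dataset, accuracy):
    """Return (subject, predicate, object) triples describing
    one machine learning execution."""
    return [
        (RUN, "rdf:type", "mex:Execution"),
        (RUN, "mex:usedAlgorithm", algorithm),
        (RUN, "mex:usedDataset", dataset),
        (RUN, "mex:hasAccuracy", str(accuracy)),
    ]

# Print the triples in a Turtle-like form.
for s, p, o in experiment_triples("ex:SVM", "ex:irisSample", 0.97):
    print(f"{s} {p} {o} .")
```

Serializing runs this way attaches provenance (which algorithm, which dataset, which result) directly to each output, which is the interoperability gap the abstract describes.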