Juan Sequeda

Industry

Bootstrapping a Virtual Semantic Data Warehouse

Our customer is one of the fastest growing organizations in the $30B Multi-level Marketing (MLM) industry. For over four months, the customer has been managing their business with a relational database solution that is misaligned with internal data requirements. Among the issues the customer faced with the existing solution:

  • Database was unmanageable with over 2500 tables and poor documentation.
  • Database lacked standard integrity constraints (primary keys, foreign keys).


Because the misaligned solution was poorly documented and poorly understood, the company was not able to generate quarterly business and sales reports. For example, a simple question such as “How many Orders were placed in May 2015?” meant different things to different people and departments within the organization. Is an Order placed:

  • when the user clicked on “Purchase”?
  • when the billing system came back with a successful transaction?


The problem is not limited to the single database underlying the e-commerce solution. The meaning (i.e., the semantics) of an Order affects numerous other systems and relational databases throughout the organization. As a result, the semantics of the word “Order” needed to be resolved, and we normalized these meanings across data residing in disparate relational databases throughout the organization.

Traditional approaches to data integration involve creating a Data Warehouse and Extracting, Transforming and Loading (ETL) the source data into the new target warehouse. The problem with these techniques is that ETL often takes excessive time and resources, and carries hidden costs that are not apparent at the beginning of a project. For this customer, we instead used a NoETL approach to integrating and searching across enterprise data.

The customer acknowledged the need to create a “lingua franca” for their enterprise and was interested in innovative alternatives to traditional ETL data warehouse integration techniques.

The Approach

We approached the problem from two perspectives:

  • Create an Enterprise Ontology that represents the lingua franca of the enterprise by bootstrapping from the relational database schema.
  • Map the Enterprise Ontology to the underlying database so questions/queries can be asked in terms of the Enterprise Ontology instead of the underlying database schema.

To do this, we used the Capsenta Ultrawrap tool, which consists of several parts. The Ultrawrap Compiler produces an OWL ontology, called the putative ontology, from a relational database [1]. The Ultrawrap Mapper semi-automatically maps a relational database to RDF using ontology matching techniques [2]. Finally, the Ultrawrap Server is a wrapper that virtualizes a relational database as a Semantic Web data source: it executes SPARQL queries over the relational database without the need to physically move the data [3].
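The idea behind a putative ontology can be illustrated with a minimal sketch. This is not Ultrawrap's implementation; it is a simplified reading of the direct mapping idea in [1], in which tables become classes, plain columns become datatype properties and foreign-key columns become object properties. The schema, the `BASE` namespace and the function name `putative_ontology` are all hypothetical. The toy schema also declares primary and foreign keys, which the customer's database lacked.

```python
import sqlite3

# Hypothetical two-table schema; the customer's real schema is not shown here.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    placed_on TEXT,
    customer_id INTEGER REFERENCES customer(customer_id)
);
"""

BASE = "http://example.com/putative#"  # placeholder namespace (assumption)


def putative_ontology(conn):
    """Return (subject, predicate, object) triples derived from the schema."""
    triples = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        cls = BASE + table
        triples.append((cls, "rdf:type", "owl:Class"))
        # Map each foreign-key column to the table it references.
        fks = {row[3]: row[2]
               for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
        for col in conn.execute(f"PRAGMA table_info({table})"):
            name = col[1]
            prop = f"{cls}.{name}"
            if name in fks:
                # Foreign key: a relationship between two classes.
                triples.append((prop, "rdf:type", "owl:ObjectProperty"))
                triples.append((prop, "rdfs:range", BASE + fks[name]))
            else:
                # Ordinary column: an attribute of the class.
                triples.append((prop, "rdf:type", "owl:DatatypeProperty"))
            triples.append((prop, "rdfs:domain", cls))
    return triples


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
for triple in putative_ontology(conn):
    print(triple)
```

When a schema declares no foreign keys, a sketch like this yields only classes and datatype properties, which is one reason the putative ontology has to be enriched by hand afterwards.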

Defining a lingua franca implies creating an ontology. An initial step is to look at which existing ontologies could be reused. The two that we referenced were GoodRelations and Schema.org. However, the needs of the customer were more specific, so it was clear that a customized ontology needed to be created. Our methodology for creating it consists of bootstrapping the Enterprise Ontology from the relational database schema.

Due to the size of the database, we focused on the subset of tables concerning Orders and Customers, so the domain of the Enterprise Ontology is Orders and Customers.

Once the putative ontology was derived from the database, the next step was to expand and enrich it with domain semantics that are found within the data and, most importantly, within the knowledge of Subject Matter Experts (SMEs). An interesting observation is that by enriching the putative ontology into the Enterprise Ontology, we were also creating the mappings from the database to the Enterprise Ontology at the same time.
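A small sketch may help show why defining a term and mapping it are the same activity. Here each candidate definition of the enterprise term "Order" is expressed as a mapping onto source tables, and the answer to "How many Orders were placed in May 2015?" changes with the definition chosen. The table names, column names, status values and sample rows are all hypothetical, and plain SQL over SQLite stands in for the Ultrawrap mappings and SPARQL queries used in the actual project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source tables and rows, invented for illustration only.
conn.executescript("""
CREATE TABLE cart_event (event_id INTEGER PRIMARY KEY, clicked_on TEXT);
CREATE TABLE billing_txn (txn_id INTEGER PRIMARY KEY, event_id INTEGER, status TEXT);
INSERT INTO cart_event VALUES (1, '2015-05-03'), (2, '2015-05-17'), (3, '2015-06-01');
INSERT INTO billing_txn VALUES
    (10, 1, 'SUCCESS'), (11, 2, 'DECLINED'), (12, 3, 'SUCCESS');
""")

# Each candidate meaning of "Order" is a mapping from the enterprise term to SQL.
ORDER_DEFINITIONS = {
    # An Order exists as soon as the user clicks "Purchase".
    "clicked_purchase": """
        SELECT event_id AS order_id, clicked_on AS placed_on
        FROM cart_event""",
    # An Order exists only once billing reports a successful transaction.
    "billing_succeeded": """
        SELECT c.event_id AS order_id, c.clicked_on AS placed_on
        FROM cart_event AS c
        JOIN billing_txn AS b ON b.event_id = c.event_id
        WHERE b.status = 'SUCCESS'""",
}


def count_orders(definition, year_month):
    """Count Orders in a 'YYYY-MM' month under the chosen definition."""
    sql = (f"SELECT COUNT(*) FROM ({ORDER_DEFINITIONS[definition]}) AS o "
           "WHERE o.placed_on LIKE ?")
    return conn.execute(sql, (year_month + "-%",)).fetchone()[0]


print(count_orders("clicked_purchase", "2015-05"))   # 2
print(count_orders("billing_succeeded", "2015-05"))  # 1
```

The question text stays the same in both calls; only the mapping behind the term "Order" changes, which is the decision the SMEs have to make.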

Another lesson learned is that there is no easy answer to the meaning of terms. It is important to start with a hypothesis, extract the data, share the results with others in the enterprise to see if they agree, collect feedback and iterate. 

Benefit of the Semantic Solution

The end goal is to answer business questions. We focused on a set of questions related to Orders. These questions were represented as SPARQL queries. 

By representing the enterprise domain as an ontology and using the Ultrawrap tool, we were able to provide answers to the MLM organization’s most pressing business questions. Creating a virtual semantic warehouse enabled us to answer these business questions in two short weeks, compared to the ongoing implementation of their misaligned application solution, which has been in production for four months. This represents a 400% improvement in speed to value, which represents an 8x reduction in resources for the MLM organization.

Next Steps

The next step is for the customer to keep expanding the Enterprise Ontology to cover a larger domain within the e-commerce solution. The goal is that the Enterprise Ontology will cover the domain of the enterprise at large so that the Enterprise Ontology can serve as a mediator over all the databases. The ultimate goal is to virtually integrate the relational databases within the enterprise by using the Ultrawrap Server tool. 
 

References

[1] Juan F. Sequeda, Marcelo Arenas and Daniel P. Miranker. On directly mapping relational databases to RDF and OWL. In WWW 2012.
[2] Aibo Tian, Juan F. Sequeda and Daniel P. Miranker. QODI: Query as Context in Automatic Data Integration. In ISWC 2013.
[3] Juan F. Sequeda and Daniel P. Miranker. Ultrawrap: SPARQL Execution on Relational Data. Journal of Web Semantics, 22:19-39, 2013.

CV

Dr. Sequeda received his Ph.D. from the Department of Computer Sciences at the University of Texas at Austin and was an NSF Graduate Research Fellow. His research interests are at the intersection of relational databases and the Semantic Web. Dr. Sequeda is the inventor of Ultrawrap, a patented system that virtualizes relational databases as Semantic Web data sources and that has been spun off into the company Capsenta. Juan was an invited expert to the World Wide Web Consortium (W3C) Relational Database to RDF Working Group and is the editor of the Direct Mapping standard. Juan has received several awards for his research, such as 2nd place in the 2013 Semantic Web Challenge for ConstituteProject.org and Best Student Research Paper at ISWC 2014.

Senior Vice President of Technical Sales and Research at Capsenta