Raffaele Palmieri + Vincenzo Orabona

Industry

Semantic Web technologies to increase Web Experience Management during Web Information Portal fruition

Co-presented by Vincenzo Orabona, Enterprise Architect, Eustema Spa 

In this demo, we present an application of our semantic CMS to a real case study: the “Intrage” Portal (http://www.intrage.it). Our aim is to show the advantages of applying semantic technologies to content management within a Web Information Portal. We first present the capabilities of our solution. We then show at which stages of the Content Lifecycle Process it adds value with respect to a standard CMS solution, and finally we describe the measures used to evaluate the results. The framework offers the following features:

  1. Entity Extraction: extraction from a text of entities belonging to particular datasets;
  2. Classification: classification of structured (for example, stored in a database) and unstructured information with respect to a reference model, such as a taxonomy, supported also by inference mechanisms and therefore not purely keyword-based;
  3. Relation Extraction: ability to extract relations between concepts mentioned in a text;
  4. Language Detection: ability to identify the language of a text;
  5. Document Similarity: given a document or a portion of text, ability to propose similar documents based on their content;
  6. Query Expansion: expansion of the initial search keyword with a set of related terms extracted from the ontology/taxonomy;
  7. Linked Open Data: linking of extracted entities to public datasets, to make documents easier to find on the Web and to improve SEO ranking.
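Query expansion (feature 6) can be illustrated with a minimal Python sketch. It assumes a toy taxonomy modelled as plain dictionaries that mimic a few SKOS properties (prefLabel, altLabel, narrower, related). The concepts, labels and the helper names `find_concept` and `expand_query` are hypothetical and not taken from the framework or from the Intrage taxonomy; following narrower and related links, but not broader ones, is also an assumption.

```python
from typing import Optional

# Toy taxonomy (hypothetical): each concept mimics a few SKOS properties.
# "narrower" and "related" hold concept IDs; labels are illustrative only.
TAXONOMY = {
    "retirement": {
        "prefLabel": "retirement",
        "altLabel": ["pension"],
        "narrower": ["early_retirement"],
        "related": ["social_welfare"],
    },
    "early_retirement": {
        "prefLabel": "early retirement",
        "altLabel": [],
        "narrower": [],
        "related": [],
    },
    "social_welfare": {
        "prefLabel": "social welfare",
        "altLabel": ["social assistance"],
        "narrower": [],
        "related": ["retirement"],
    },
}


def find_concept(keyword: str) -> Optional[str]:
    """Return the ID of the concept whose preferred or alternative label
    matches the keyword (case-insensitive), or None if there is none."""
    kw = keyword.strip().lower()
    for concept_id, concept in TAXONOMY.items():
        labels = [concept["prefLabel"], *concept["altLabel"]]
        if kw in (label.lower() for label in labels):
            return concept_id
    return None


def expand_query(keyword: str) -> list[str]:
    """Expand a keyword with the matched concept's labels plus the
    preferred labels of its narrower and related concepts."""
    kw = keyword.strip().lower()
    concept_id = find_concept(kw)
    if concept_id is None:
        return [kw]  # term not in the taxonomy: no expansion
    concept = TAXONOMY[concept_id]
    terms = [kw, concept["prefLabel"], *concept["altLabel"]]
    for linked_id in concept["narrower"] + concept["related"]:
        terms.append(TAXONOMY[linked_id]["prefLabel"])
    # Lower-case and deduplicate while preserving first-seen order.
    return list(dict.fromkeys(t.lower() for t in terms))


if __name__ == "__main__":
    print(expand_query("pension"))
```

For example, `expand_query("pension")` returns `["pension", "retirement", "early retirement", "social welfare"]`. Following broader links as well would increase recall at the cost of precision.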

The Intrage Web Portal (http://www.intrage.it/) was created to help people over 50 meet their everyday needs and to offer advice on future issues such as retirement and social assistance. Accordingly, its main topics are jobs, retirement, health, insurance, taxation, social welfare, home life and consumer affairs.

The ontology used to enhance content metadata is represented as a SKOS taxonomy whose top concepts were derived from the thematic channels already provided by the portal (the main topics) and enriched with concepts from the “Nuovo Soggettario” taxonomy, published by the National Central Library of Florence (http://thes.bncf.firenze.sbn.it/). Specific SKOS properties link the concepts to one another, and some concepts also carry references to Wikipedia. Additional resources were prepared by filtering NER data from a DBpedia dump and by collecting geospatial data from GeoNames; these resources enable entity annotation and linking according to the LOD paradigm. Finally, semantic retrieval facilities help users find relevant content both for editing and for browsing and searching. Below, we summarize the main advantages of applying semantic technologies to content management in this case study.

  1. To facilitate content editing by suggesting similar, previously published content during the production step – When new content is created, the system suggests previously published content with “similar” keywords or topics. The editor can thus link to other sources and verify the “originality” of what is being produced.
  2. To improve content annotation by suggesting metadata and tags automatically extracted from the content itself – In addition to Named Entity Recognition (NER) utilities, our system performs a statistical analysis of the text to propose metadata and tags that describe the content. This information is then used in the indexing stage to build an index of terms.
  3. To improve the “visibility” of published pages by injecting search-engine-oriented tags into the HTML code – In the Content Editor tool, the annotations produced by the text analysis are embedded in the HTML page as microdata, using additional span tags. These annotations reuse the standard Schema.org vocabulary. Since search engines can process this additional information and exploit it in the results they produce, they may rank pages containing this data higher, increasing their visibility.
  4. To provide effective search mechanisms for both editors and end users, allowing queries based on the criteria used in the information extraction stage – During the production of new content, a tool lets users find related articles in the Knowledge Base. In the search phase, users can browse content through the index of terms and exploit relations among ontology concepts to improve search results.
  5. To enhance user experience through content recommendation based on the user profile (favourite pages, subscribed channels, etc.) – Recommendation systems exploit user profiles to suggest content related to particular topics, concepts or entities. In the Intrage Portal, content recommendation is implemented in the “My Home” section, where users see targeted recommendation boxes. These are filled by an ad hoc algorithm that scores content of interest based on user feedback about followed channels, favourite tags, etc.
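The microdata injection described in advantage 3 can be sketched in Python as follows. The function wraps the first occurrence of each recognised entity in a `span` carrying Schema.org `itemscope`/`itemtype` attributes, with a nested `itemprop="name"` span, and HTML-escapes all other text. The `annotate` helper, its (surface text, Schema.org type) input format and the exact markup are illustrative assumptions; the source does not specify how the Content Editor builds its spans.

```python
from html import escape


# Hypothetical helper: the input format (surface text, Schema.org type name)
# and the generated markup are assumptions made for illustration only.
def annotate(text: str, entities: list[tuple[str, str]]) -> str:
    """Return HTML in which the first occurrence of each entity is wrapped
    in Schema.org microdata spans; all other text is HTML-escaped."""
    # Locate the first occurrence of each (non-empty) entity in the plain text.
    matches = []
    for surface, schema_type in entities:
        start = text.find(surface) if surface else -1
        if start != -1:
            matches.append((start, start + len(surface), schema_type))
    matches.sort()

    parts = []
    pos = 0
    for start, end, schema_type in matches:
        if start < pos:
            continue  # overlaps an entity that is already wrapped: skip it
        parts.append(escape(text[pos:start]))
        parts.append(
            f'<span itemscope itemtype="https://schema.org/{escape(schema_type)}">'
            f'<span itemprop="name">{escape(text[start:end])}</span></span>'
        )
        pos = end
    parts.append(escape(text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(annotate("Pension rules in Rome changed.", [("Rome", "Place")]))
```

Here `annotate("Pension rules in Rome changed.", [("Rome", "Place")])` returns `Pension rules in <span itemscope itemtype="https://schema.org/Place"><span itemprop="name">Rome</span></span> changed.`. Working on character offsets of the plain text, rather than calling `str.replace` repeatedly on the growing HTML, avoids matching text inside markup that has already been inserted.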

We also tried to “measure” the benefits that applying semantic technologies to a CMS environment brings to end users. In particular, we defined several types of indicators, based on a “five-star” rating model, to “quantify” the actual utility of the advantages discussed above. During the demo, we will show all these advantages together with the indicators, and their values, used to evaluate the results.

CV

Vincenzo Orabona - Enterprise Architect

Vincenzo holds a degree in Electronic Engineering and has worked for Eustema as an Enterprise Architect, within the R&D team, since 2012. Since starting his career in 2003, he has always been interested in research topics. Over the years he gained knowledge of security topics, such as public-key encryption algorithms, as well as GIS systems and their interoperability through OGC standards, mobile computing, operating systems and app development. He learned to work in a team, the importance of knowledge sharing and how to accept defeats, but also to smile at the wins in research challenges. At Eustema, he has spent most of the last three years studying Semantic Web technologies, investigating the W3C standards for data modelling (RDF/OWL) and data querying (SPARQL) and, more generally, best practices and issues for the “Web of Data”, including LOD reuse and foundational-schema ontology learning. He has published a number of scientific papers and holds several ITIL v3 certifications, as well as TOGAF 9.1 Foundation and Certified (levels 1 and 2).

Raffaele Palmieri - IT Systems Architect

He holds a degree in Computer Engineering from the “Università degli Studi di Napoli Federico II”. He has held the Software Architect role in ICT projects and has more than 10 years’ experience in software analysis, design and development. He worked for a company specializing in semantic technologies, where he learned methodologies and techniques for managing complex software development processes and products, applying agile and Scrum methodologies. At Eustema he has been involved in several projects dealing with knowledge management, semantic modelling, and information extraction and retrieval, aimed at various enterprise contexts such as Reputation Management, Social Business, User Centric Process Design, Rich Internet Applications and Semantic Enterprise. He achieved the “Oracle Certified Master, Java EE 6 Enterprise Architect” certification in 2013. He participates in the Apache open source community as a committer and PPMC member of the Apache Marmotta project. He was a speaker at the SEBD 2014 and DATA 2014 conferences, presenting the papers “CMS towards semantic interoperability” and “A Semantic Content Management System for e-Gov Applications”.
 

System Architect