To support marketing strategy and competitive intelligence, companies need to monitor the Web and make sense of the large amount of information it contains. SMILK is a joint laboratory between INRIA WIMMICS and Viseo R&D at the crossroads of three research domains (natural language processing, linked data, and social network analysis) that aims to support decision processes in industry. Our goal is to develop research and technologies to: (a) retrieve and analyze textual content from the Web; (b) represent and link the results with linked data sources and Semantic Web formalisms; (c) include social network analysis results to augment the available analysis criteria; and (d) reason over the integrated data in order to improve the analysis and understanding of the accessed information sources.
To achieve this goal, in SMILK we study ways to tightly couple, at the semantic level, algorithms and linguistic models, knowledge extraction and disambiguation guided by Web resources, and various kinds of reasoning (logical inference, approximation, similarity, etc.) to support users in their exploration and analysis.
We will demonstrate our first prototype, which integrates several research results in a plugin that augments a user's browsing experience. The purpose of this prototype is to show how we can improve users' knowledge and understanding by performing real-time natural language processing on the accessed content and integrating the results with Linked Open Data on the Web and with social network analysis to enrich the displayed page.
First, the demo will show how relevant entities can be identified according to the user's interests and how the related data can be structured using a fine-grained linguistic analysis (information extraction, disambiguation). More precisely, our system uses several rule-based linguistic modules applied over a morphosyntactic representation of the text. We will highlight the difficulty of recognizing some named entities. For instance, names of products and brands are often ambiguous (e.g. "N°5") and syntactically hard to distinguish from normal text (e.g. "La vie est Belle", literally "Life is Beautiful", or "La petite robe noire", literally "the little black dress"). We will explain how we address these problems in real time. The second part of the demo will show how we link the entities recognized in the text. For example, once the previous step has recognized "N°5" as a product name and "Chanel" as a brand name, the system uses linguistic rules to identify the relation "belongs to" between "N°5" and "Chanel".
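The relation step described above can be illustrated with a deliberately simplified sketch. It is not the SMILK implementation: the actual system applies linguistic rules over a morphosyntactic representation, whereas this example assumes small hand-written gazetteers (`PRODUCTS`, `BRANDS`) and a hypothetical list of connector words (`CONNECTORS`), all invented here for illustration.

```python
import re
from typing import List, Tuple

# Assumption: toy gazetteers standing in for the entity recognition step.
PRODUCTS = ["N°5", "La vie est belle", "La petite robe noire"]
BRANDS = ["Chanel"]

# Assumption: connector words that may link a product to its brand.
CONNECTORS = {"de", "by", "from"}

Span = Tuple[int, int, str]  # (start offset, end offset, canonical name)


def find_entities(text: str, names: List[str]) -> List[Span]:
    """Return case-insensitive matches of each gazetteer entry in the text."""
    spans = []
    for name in names:
        for match in re.finditer(re.escape(name), text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), name))
    return spans


def extract_belongs_to(text: str) -> List[Tuple[str, str, str]]:
    """Emit (product, 'belongs to', brand) when only a connector separates them."""
    relations = []
    for _, p_end, product in find_entities(text, PRODUCTS):
        for b_start, _, brand in find_entities(text, BRANDS):
            if b_start < p_end:
                continue  # this toy rule only handles "<product> <connector> <brand>"
            between = text[p_end:b_start].strip().lower()
            if between in CONNECTORS:
                relations.append((product, "belongs to", brand))
    return relations


print(extract_belongs_to("She has worn N°5 by Chanel for years."))
```

A single surface pattern like this misses most real phrasings, which is why the text relies on richer linguistic rules; the sketch only shows the shape of the output, a triple connecting a recognized product to a recognized brand.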
The third part of the demo addresses the entity linking task, which links data appearing in the text with data obtained from Web knowledge bases. An important point here is that when no data is available in public knowledge bases (e.g. for new product names), we support linking with our own structured data acquired while browsing the Web (e.g. from journalistic articles, social networks, etc.). Finally, knowledge from social networks is also imported and linked with the other data to provide users with opinions and key ideas related to a product or a brand name.
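The fallback behaviour described above (public knowledge bases first, then locally acquired data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's code: both knowledge bases are modelled as in-memory dictionaries, and every entry and URI (all under `example.org`) is a placeholder invented for the example.

```python
from typing import Dict, Optional, Tuple

# Assumption: a public knowledge base reduced to a label -> URI lookup table.
PUBLIC_KB: Dict[str, str] = {
    "chanel": "http://example.org/public/Chanel",
    "n°5": "http://example.org/public/N5",
}

# Assumption: structured data acquired while browsing, for entities that
# public knowledge bases do not cover yet (e.g. a newly launched product).
LOCAL_KB: Dict[str, str] = {
    "new fragrance x": "http://example.org/local/NewFragranceX",
}


def link_entity(
    mention: str,
    public_kb: Dict[str, str] = PUBLIC_KB,
    local_kb: Dict[str, str] = LOCAL_KB,
) -> Optional[Tuple[str, str]]:
    """Return (source, uri) for a mention, preferring the public knowledge base."""
    key = mention.strip().lower()
    if key in public_kb:
        return ("public", public_kb[key])
    if key in local_kb:
        return ("local", local_kb[key])
    return None  # unknown entity: nothing to link to yet


for mention in ["Chanel", "New Fragrance X", "Unknown Product"]:
    print(mention, "->", link_entity(mention))
```

Exact string lookup stands in for the disambiguation step here; a real linker would have to choose among several candidate resources for an ambiguous mention such as "N°5".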
The prototype fully integrates the four functions defined above and is applied to the cosmetics domain. Concretely, it takes the form of an easy-to-install plugin that augments the browsing experience, and we will demonstrate each function by browsing pages about cosmetics.
Cédric Lopez is a researcher at Objet Direct (VISEO) and has been a member of its Research and Development Unit since October 2012. His main research interests at Objet Direct are Natural Language Processing, Text Mining, Information Retrieval, Terminology, and Artificial Intelligence. He received his PhD from LIRMM (Laboratory of Informatics, Robotics, and Microelectronics) of the University of Montpellier, France. He is the author of many international publications, has acted as a reviewer for several conferences, and chaired the NLP conference for students and young researchers (RECITAL'2011).
He worked on several collaborative research projects such as LEILAS (FUI), SYNODOS (ANR), SMS4SCIENCE (MSH-M, DGLFLF), TIER (FUI), and SMILK (ANR).
Researcher on subject Data Analysis