As part of its digital-first strategy, the Financial Times decided not to chase its readers onto whatever platform they happen to be using today, or to try to predict what they will be using tomorrow. Instead, it adopted a universal, platform-independent publishing strategy. This meant APIs for delivering services, good tools for creating content, and good semantic metadata for assets, content and users to connect the two. By augmenting user data and content data with semantics, the FT was able to increase reader engagement, a key performance indicator for its 'digital first' strategy.
"Everyone forgets about metadata. They think they can just make stuff and then forget about how it is organised in terms of how you describe your content. But all your assets are useless to you unless you have metadata – your archive is full of stuff that is of no value because you can’t find it and don’t know what it’s about."
- John O'Donovan, CTO of the Financial Times
To help the Financial Times achieve this vision, Ontotext pushed the state of the art in RDF-based graph stores in terms of reliability, scalability and availability. This included support for multiple data centres and an AWS implementation. The experience also taught the company the often undervalued importance of developing the user experience aspects of the technology. The semantic database at the FT now powers its most important B2B solutions. Peak loads reach 50 queries per second for reads and 20 per second for writes, over 184 million statements.
The Financial Times also faced the same problem as every other content creator across news, media and publishing: a vast amount of unstructured text that is expensive and difficult to repurpose for new products and services. Ontotext offers an enterprise-grade semantic repository, GraphDB, that is tightly coupled with Natural Language Processing (NLP) services to enable ontology-aware NLP pipelines and entity disambiguation.
Previously, the Financial Times had basic concept extraction, but the software it was using was expensive, difficult to maintain and poorly integrated with the technical stack. Ontotext's Publishing Platform offered an open, non-proprietary solution that was tightly integrated with the FT's chosen graph database. The FT is now able to identify more than seven million named entities, each of which carries additional metadata, such as ticker price information or affiliations, to enable 'added value' products. In addition, there are twenty million labels for people, companies and organisations. Coupling a semantic database with text analytics pipelines via a specially developed plugin enables a high level of precision and recall, which is essential for a news organisation. The concept extraction service cluster scales horizontally to handle peak load times and can process 10 documents per second at 100% reliability.
These achievements are already impressive, but the Financial Times had even greater ambitions. Besides semantically enriching content, Ontotext developed a recommendation service on top of the platform. This further drove consumer engagement, because the content offered was informed by user behaviour in addition to the semantic profile of the content, which provided a more holistic and natural user experience. The result is a 100% reliable production API handling 1.5 to 3 million requests per day. Since going live last year, the system has indexed half a million documents and made 200 million recommendations. This is achieved without caching, as each request is effectively a personalized search request.
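To make the idea concrete, the following is a minimal sketch of how a recommendation can combine user behaviour with the semantic profile of content. It is not the FT's or Ontotext's implementation: the function names, the concept identifiers and the cosine-similarity scoring are all assumptions chosen for illustration. A reader's profile is built from the concepts tagged on articles they have read, and each candidate article is ranked by how closely its concepts match that profile.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse concept-weight mappings."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def user_profile(read_history):
    """Hypothetical behaviour model: count the concepts on articles a user read."""
    profile = Counter()
    for concepts in read_history:
        profile.update(concepts)
    return profile


def recommend(read_history, candidates, top_n=3):
    """Rank candidate articles by similarity of their concepts to the user profile.

    `candidates` maps an article id to the list of concepts extracted from it.
    Ties are broken by article id so the ordering is deterministic.
    """
    profile = user_profile(read_history)
    scored = [
        (cosine(profile, Counter(concepts)), doc_id)
        for doc_id, concepts in candidates.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]
```

Because the score depends on each reader's own history, every call is computed per user, which is consistent with the observation that such requests behave like personalized searches rather than cacheable lookups.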
In the past, semantic technology has been dismissed as too academic, with little support for enterprise environments and with tools and interfaces that are unfriendly to anyone who is not an expert in the Semantic Web. The successful adoption, implementation and production use of semantic technology at the Financial Times demonstrates that it is becoming an essential part of the technology stack in news and publishing. Jem Rayfield, the Head of Solution Architecture, confirms this observation: “…editorial and business [sides] have seen the benefits of the [semantic] approach and they want to follow it.”
Ilian Uzunov joined Ontotext to strengthen the company's position in the educational and publishing verticals. He is extremely enthusiastic about the ways in which semantic technology can be harnessed for better and more efficient learning and teaching. Ilian has more than 15 years of experience in publishing, education, e-learning, and contemporary approaches to the management and presentation of cultural heritage. Prior to Ontotext, Mr. Uzunov founded Sirma Media in 2002 and spun it off as an independent company in 2008. Ilian positioned Sirma Media among the major players in e-learning and the modern presentation of cultural heritage in Bulgaria, accomplishing some of the most innovative and challenging nationwide projects in these two domains. Mr. Uzunov earned his Master’s degree from Sofia University in “Contemporary methodologies for teaching and learning English language with strong use of ICT” and later undertook an MBA programme in Executive Management from BEIED.
Dr. Georgi Georgiev specializes in advanced text analytics, with a focus on machine-learning-based models and on software architectures and solution methodologies for enterprise-level semantic annotation and search. Georgiev leads text analysis product development and professional services at Ontotext and manages semantic publishing projects for organizations such as the BBC and the Press Association. He plays a leading role in the Semantic Annotation and Search division of Ontotext and is strongly committed to combining scientific and research excellence with productive industrial and government-oriented solutions. His interests include machine learning product development and adaptation, team leadership, enterprise software architectures, and contemporary management techniques.
Sales Director CEMEAA