
Reuters embrace the Semantic Web with Calais

Posted in Uncategorized on Tuesday, February 26th, 2008 by Vincent Maher. Tags: calais, reuters

In what is probably the most strategic move by any media company, more so even than Rupert Murdoch’s purchase of MySpace, Reuters have launched a new project called Calais with the following mission statement:

We want to make all the world’s content more accessible, interoperable and valuable.

What is Calais?

Calais is a web service that converts text objects like news articles into RDF triples. Sound like a load of geek nonsense? RDF triples are essentially statements of subject – predicate – object relations, which are commonly serialized as XML. For example: Vincent Maher (subject) knows (predicate) PHP (object). In short, developers can send a piece of content, such as a news article, to the Calais web service and receive back the same content annotated with triples, which can then be used for other things.
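To make the subject – predicate – object idea concrete, here is a minimal Python sketch of the "Vincent Maher knows PHP" statement. It is only an illustration: the `Triple` type, the `to_ntriples` helper and the `http://example.org/` namespace are made up for this example and are not what Calais returns. The output format is N-Triples, a plain-text way of writing RDF with one statement per line.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject - predicate - object statement."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace for illustration only; not a Calais URI.
EX = "http://example.org/"


def to_ntriples(triple: Triple) -> str:
    """Render a triple as one N-Triples line, treating all three parts as URIs."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


knows_php = Triple(EX + "VincentMaher", EX + "knows", EX + "PHP")
print(to_ntriples(knows_php))
```

The point is that each part of the statement is an identifier rather than free text, which is what lets software compare and combine statements from different sources.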

An example:

By submitting this article using this demo tool, Calais returns the following tags:

  1. City: New York
  2. Company: USA Today, CBS News/New York Times
  3. Country: United States, Islamic Republic of Iran, Cuba, Kenya, North Korea
  4. Person: Howard Wolfson, David Plouffe, Barack Obama, Hillary Clinton, John McCain
  5. ProvinceOrState: Illinois, Texas, Ohio

There is immediate value in being able to tag content automatically and to distinguish between different types of tags. Calais also returns RDF that embeds semantic information in the body of the article, which browser plugins like Piggy Bank and Semantic Radar can then store or otherwise act on.
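As a rough sketch of why typed tags matter, the tags returned for the example article can be held in a simple Python dictionary keyed by type. The dictionary layout and the two helper functions are assumptions for illustration, not the actual Calais response format; only the tag values come from the list above.

```python
# Tags from the example article, grouped by type.
# The dict layout is an assumption for illustration, not the Calais response format.
tags = {
    "City": ["New York"],
    "Company": ["USA Today", "CBS News/New York Times"],
    "Country": ["United States", "Islamic Republic of Iran", "Cuba", "Kenya", "North Korea"],
    "Person": ["Howard Wolfson", "David Plouffe", "Barack Obama", "Hillary Clinton", "John McCain"],
    "ProvinceOrState": ["Illinois", "Texas", "Ohio"],
}


def tags_of_type(tags, entity_type):
    """Return every tag of one type, e.g. all people mentioned in the article."""
    return tags.get(entity_type, [])


def type_of(tags, name):
    """Return the type a tag belongs to, or None if the article has no such tag."""
    for entity_type, names in tags.items():
        if name in names:
            return entity_type
    return None


print(tags_of_type(tags, "Person"))
print(type_of(tags, "Texas"))
```

With untyped tags, "Texas" and "Barack Obama" would just be two keywords; with types, a site could build a "people in this story" box or a map without any manual editing.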

Imagine the following scenario: you are a researcher and you have a plugin for your browser that allows you to store knowledge units like addresses, people, things the people have done, places they have been to, values they have acted on, people they know, relationships between them and all sorts of other stuff. In other words, you have a blank encyclopedia on your desktop that you can add to at any point. As you surf around, looking things up on Google or reading the news, you find bits of information that you want to keep. Instead of simply copying the paragraph into a document and keeping a link, with this type of semantic data you can save facts like: person x is the CEO of company y, lives at address a and works at address b. As you go on, you build a knowledge base of semantic information that can be used contextually.
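The "blank encyclopedia" in this scenario could be sketched as a small in-memory store of triples that you add to as you browse and query later. This is a toy under stated assumptions: the `KnowledgeBase` class, the predicate names (`isCEOOf`, `livesAt`, `worksAt`) and the identifiers are all hypothetical, and a real plugin would use proper URIs and persistent storage.

```python
class KnowledgeBase:
    """A toy store of (subject, predicate, object) facts collected while browsing."""

    def __init__(self):
        self.facts = set()

    def add(self, subject, predicate, obj):
        """Save one fact; adding the same fact twice has no effect."""
        self.facts.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all facts matching the given parts; None acts as a wildcard."""
        return sorted(
            fact
            for fact in self.facts
            if (subject is None or fact[0] == subject)
            and (predicate is None or fact[1] == predicate)
            and (obj is None or fact[2] == obj)
        )


kb = KnowledgeBase()
# Hypothetical identifiers and predicate names, for illustration only.
kb.add("person_x", "isCEOOf", "company_y")
kb.add("person_x", "livesAt", "address_a")
kb.add("person_x", "worksAt", "address_b")

print(kb.match(subject="person_x"))      # everything known about person x
print(kb.match(predicate="worksAt"))     # who works where
```

Because every fact has the same three-part shape, facts saved from different pages can be combined and queried together, which a folder of copied paragraphs cannot do.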

One of the other key differences is that documents with semantic data can be indexed semantically. This means you could search for web pages that contain information about person x, but specifically the address he or she works at. Obviously Google can come close to giving you this information today; the difference is that Google has to deduce it by doing syntactic analysis of the text, rather than receiving explicit instructions on how to understand the information and its meaning.
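The difference between the two kinds of search can be sketched in a few lines of Python. Everything here is invented for illustration: the two documents, their URLs, the `worksAt` and `spokeAt` predicates and both search functions. It is not how Google or Calais work internally, only a picture of matching words versus matching stated facts.

```python
# Two made-up documents, each with its text and the triples embedded in it.
documents = [
    {
        "url": "http://example.org/a",
        "text": "Person X spoke at the office of Company Y.",
        "triples": [("person_x", "spokeAt", "address_c")],
    },
    {
        "url": "http://example.org/b",
        "text": "Person X has a new job.",
        "triples": [("person_x", "worksAt", "address_b")],
    },
]


def keyword_search(docs, *words):
    """Return URLs of documents whose text contains every word (case-insensitive)."""
    return [
        doc["url"]
        for doc in docs
        if all(word.lower() in doc["text"].lower() for word in words)
    ]


def semantic_search(docs, subject, predicate):
    """Return URLs of documents that explicitly state a fact with this subject and predicate."""
    return [
        doc["url"]
        for doc in docs
        if any(s == subject and p == predicate for s, p, _ in doc["triples"])
    ]


print(keyword_search(documents, "person x", "office"))
print(semantic_search(documents, "person_x", "worksAt"))
```

The keyword search finds the first page, which mentions an office but says nothing about where person x works; the semantic search finds the second page, which never uses the word "office" but states the fact explicitly.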

So why is this good for Reuters?

One can argue that the news media should be a major part of the Semantic knowledge backbone, along with government and academic institutions, because of the sheer volume of information they collect and disseminate every day.  The problem for news organisations in general is that this type of service and its creation is way out of their usual areas of expertise and core business.  To do this you need very specialist skills and a very long-term view of the future because there are no obvious revenue models for the service except on a subscription basis.

What Reuters get out of this project is very important for their future, however. Firstly, they keep a copy of whatever is submitted to them, for further semantic analysis. Secondly, they ask you to use their GUIDs (Globally Unique Identifiers) so that when your users look up more information based on the story or the embedded triples, Calais provides the information. Thirdly, and most importantly, Calais will quickly become a massive data-store for semantic knowledge that can then be redeployed within Reuters, either to support journalists or to act as a repository of knowledge.

As Google have discovered, owning the source of knowledge in our current society has certain advantages, among which is the ability to launch virtually anything else off the back of it. This is a big-time investment in a future that is not at all clear, but I suspect it will pay off more than anyone in Reuters currently imagines.


Vincent Maher

  • the short bio
    Vincent Maher is the portfolio manager for social media at Vodacom, South Africa's largest mobile telecommunications company. His flagship product is The Grid, a fast-growing location-based social network and instant messaging platform. Previously he was the strategist at the Mail & Guardian Online and co-founder of Amatomu.com, the South African blog aggregator and analytics system. Before that he was Director of the New Media Lab at the Rhodes University School of Journalism & Media Studies, the managing director of Digital Commerce and a multimedia director at VWV Interactive.

    He has worked in the online media industry since 1996, has presented papers at many international conferences and specializes in profitable innovation in emerging markets.


© Copyright Vincent Maher. All rights reserved.