Semantic Search

Xerox’s FactSpotter: Mining Concepts in Text

Xerox’s FactSpotter isn’t a mainstream search tool that will debut as a challenge to present-day search engines. But the features packed into it may well pave the way for an ideal search tool built on the concept of connecting data rather than collecting data.

Collecting data is what mainstream engines do today – look for links ranked via statistical or mathematical models and push them to users.

The folks at Xerox intend to launch FactSpotter as a text-mining tool that captures the concept, context and keywords in a query in order to return relevant results.

For example, searching for “Steve Jobs speech yesterday” will return the speech he gave yesterday as the top result, rather than links to audio recordings, images or other data from any other time. FactSpotter captures the concept behind the keywords (in this case, the concept of time) and matches it with the underlying grammar of the query to return near-perfect results; that technology is what makes it a relevant contender in the semantic space.
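To make the idea concrete, here is a toy sketch in Python of treating a time word in a query as a concept rather than as just another keyword. It is not how FactSpotter works internally (Xerox has not published that here); the `TIME_WORDS` table, the `parse_query` and `search` functions, and the document fields `text` and `date` are all assumptions made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical table mapping time words to day offsets (illustrative only).
TIME_WORDS = {"today": 0, "yesterday": -1, "tomorrow": 1}


def parse_query(query, today):
    """Split a query into plain keywords and an optional target date."""
    keywords = []
    target = None
    for word in query.lower().split():
        if word in TIME_WORDS:
            # The time word becomes a date constraint, not a keyword.
            target = today + timedelta(days=TIME_WORDS[word])
        else:
            keywords.append(word)
    return keywords, target


def search(docs, query, today):
    """Return documents containing every keyword, date matches first."""
    keywords, target = parse_query(query, today)
    hits = [
        doc for doc in docs
        if all(k in doc["text"].lower() for k in keywords)
    ]
    # Stable sort: documents dated on the target day move to the top.
    hits.sort(key=lambda doc: doc["date"] != target)
    return hits
```

A plain keyword engine would look for the literal word “yesterday” in each document; in this sketch the word is converted into a date and used only to rank the keyword matches.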

Other salient features are:

  • Recognition of concepts such as ‘buildings’ and ‘people’.
  • Searching documents in multiple languages.

Prospects

The makers have mentioned that the engine is based on connecting data, which suggests that it relies on a semantic framework for the web to provide that data.

This is one reason why I feel FactSpotter may not only be a very relevant solution for document management (Xerox plans to launch it first for litigation firms, to help them manage their swaths of documents), but may also pioneer technologies that inherently use the semantic framework that will power the web of the future.

The Semantic Web is the concept of a web in which data is described in a manner that machines can understand. The result will be the ability to design web software that mixes data in various formats, streamed from various sources, without impediment. FactSpotter is one of the next-generation technologies to look out for.
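As a rough illustration of what “data described in a manner that machines can understand” can mean, Semantic Web data is commonly modelled as subject, predicate, object statements (triples). The Python sketch below merges triples from two made-up sources and follows a link between them. The predicate names, the sample facts and the `match` helper are assumptions for illustration only, and none of this is specific to FactSpotter.

```python
# Hypothetical triples from two independent sources (illustrative data).
source_a = [
    ("FactSpotter", "madeBy", "Xerox"),
    ("FactSpotter", "type", "TextMiningTool"),
]
source_b = [
    ("Xerox", "type", "Company"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Because both sources use the same shape, mixing them is just concatenation.
merged = source_a + source_b

# Follow the connection: who made FactSpotter, and what kind of thing is that?
makers = [o for (_, _, o) in match(merged, subject="FactSpotter", predicate="madeBy")]
maker_types = [
    o for maker in makers
    for (_, _, o) in match(merged, subject=maker, predicate="type")
]
```

The point of the sketch is that the second lookup works only because the data from the two sources is connected through a shared name, which is the “connecting data” idea described above.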
