The ideal search engine would match each search query to its exact context and return results within that context. While Google, Yahoo and Live continue to hold sway in search, the engines below take a semantics-based (meaning-based) approach. The aim is more relevant results that depend on the meaning of the query rather than on preset keyword groupings or inbound-link measurement algorithms, which make the more traditional search engines easier to game and therefore more prone to spam-oriented results.
Here is a wrap-up of some of the top semantic search engines that we’ve covered previously, along with some updates on their research.
Hakia, the brainchild of Dr. Riza C. Berkan, tries to anticipate the questions that could be asked about a document and uses them as gateways to the content.
The search queries are mapped to the results and ranked using an algorithm that scores them on sentence analysis and how closely they match the concept related to the query.
Hakia semantic search is essentially built around three evolving technologies:
- OntoSem (sense repository)
- QDEX (Query indexing technique)
- SemanticRank algorithm
- OntoSem is Hakia’s repository of concept relations, in other words, a linguistic database where words are categorized into the various “senses” they convey.
- QDEX is Hakia’s replacement for the inverted index that most engines use to store web content. QDEX extracts all possible queries relating to the content (leveraging OntoSem for meaning), and these become the gateways to the original document. This process greatly reduces the data set that the indexer has to deal with when querying data on the fly, an advantage when you consider the wide swath of data the engine would have to search with an inverted index.
- Finally, the SemanticRank algorithm independently ranks content on the basis of further sentence analysis. The credibility and age of the content are also used to determine relevancy.
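To make the idea of indexing by anticipated queries more concrete, here is a minimal Python sketch. It is not Hakia’s implementation: the `SENSES` table is a made-up stand-in for the role a sense repository plays, the anticipated questions are supplied by hand rather than extracted from content, and the overlap score is a simple set similarity chosen for illustration, not the SemanticRank algorithm.

```python
from collections import defaultdict

# Hypothetical sense repository: maps surface words to a shared concept label.
# The entries are invented for this example.
SENSES = {
    "car": "vehicle", "automobile": "vehicle", "auto": "vehicle",
    "price": "cost", "cost": "cost",
    "buy": "purchase", "purchase": "purchase",
}

STOPWORDS = {"a", "an", "the", "of", "is", "to", "what", "how", "does", "much"}


def to_concepts(text):
    """Normalize text into a frozenset of concept labels."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return frozenset(SENSES.get(w, w) for w in words if w and w not in STOPWORDS)


def build_query_index(anticipated):
    """Index each document under the questions it is expected to answer.

    `anticipated` maps a document id to a list of questions (assumed given).
    """
    index = defaultdict(set)
    for doc_id, questions in anticipated.items():
        for question in questions:
            index[to_concepts(question)].add(doc_id)
    return index


def search(index, query):
    """Rank documents by concept overlap between the query and indexed questions."""
    q = to_concepts(query)
    scores = defaultdict(float)
    for key, doc_ids in index.items():
        union = q | key
        overlap = len(q & key) / len(union) if union else 0.0
        for doc_id in doc_ids:
            scores[doc_id] = max(scores[doc_id], overlap)
    hits = [(d, s) for d, s in scores.items() if s > 0]
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

In this toy version, a query such as "automobile cost" can reach a document indexed under "What is the price of a car?" even though the two share no keywords, because both reduce to the same concept set.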
Hakia performs pure analysis of content irrespective of links or clickthroughs among the documents (the company is opposed to statistical models for determining relevance).
The engine has also started using the Yahoo BOSS service and presents results in a “gallery” with categories for different content matching the query. Users can also request to try out the incremental changes that are being tested at Hakia’s Lab.
Kosmix takes its categorization concept further by providing users with a dashboard of content, aptly called “Your guide to the Web”. The company’s focus on informational search makes it suitable when you want information on a topic rather than a particular answer or URL. For example, the search for Credit Default Swap provided a great mix of links, videos and tweets to get me started. Kosmix received $20 million of funding from Time Warner in late 2008. Its content-aggregating technology will become more important as content on the web grows.
The image search engine was unique for its host of options to narrow down a search by image size, color and content. Many of these features have since appeared across other image search engines. Exalead is a must-try for image search. The company has been focusing on the enterprise search market, essentially attempting to solve the problem of searching content where link analysis is of little help.
The technology powering this engine creates a summary of the top results that are returned for a user query, often negating the need to drill down into the URLs to get the information that one is seeking. Semantic Engines LLC, the company behind the engine, provides a variety of products around this technology.
There is Link Sensor, a tool that can be used on major blogging platforms (WordPress, Blogger, etc.) to automatically pick up key concepts from a post and link them to related articles from the same blog or publisher. It is possible to point to other venues as well, e.g. to another blog from the same publisher, perhaps one with a higher CPM. The tool is meant to increase user engagement. The company has also started providing APIs that return summaries of results for a query from a set of URLs passed in as parameters. This is an interesting approach that helps save time when an exact answer is what one is looking for.
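As a rough illustration of summarizing across a set of results, the sketch below picks the sentences that best match a query from several already-fetched page texts. It is an assumption-laden toy, not the company’s API: the function names are hypothetical, the input is plain text rather than URLs, and the scoring (query-term hits, then average term frequency across the texts) is a basic extractive heuristic.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(query, documents, max_sentences=2):
    """Return up to max_sentences sentences that best match the query."""
    query_terms = set(tokenize(query))
    corpus_freq = Counter(t for doc in documents for t in tokenize(doc))
    scored = []
    for doc in documents:
        for sentence in split_sentences(doc):
            tokens = tokenize(sentence)
            if not tokens:
                continue
            query_hits = len(query_terms & set(tokens))
            if query_hits == 0:
                continue  # ignore sentences unrelated to the query
            salience = sum(corpus_freq[t] for t in tokens) / len(tokens)
            scored.append((query_hits, salience, sentence))
    # Most query-term hits first, then higher average term frequency.
    scored.sort(key=lambda item: (-item[0], -item[1]))
    unique = list(dict.fromkeys(sentence for _, _, sentence in scored))
    return unique[:max_sentences]
```

A real service would also need to fetch and clean the pages, and would likely merge overlapping sentences instead of returning them verbatim.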
5. Cognition Search
The Cognition Search NLP product is a solution companies can use to extract relevant results from their content. Applications of this technology could range from better search across the enterprise to fetching more relevant ads. The company provides APIs for access to these technologies. I could not locate the free search, but given the showdown featured on GigaOM, the product clearly has its utility. It also gives the company a definite business model.
The question-and-answer search engine uses linguistics to answer the questions that are posed as queries.
It’s a good site for egosurfing (checking how popular you are on the web). The engine also provides keywords that represent categories for the results; clicking one takes you to more relevant topics for the query.
Swoogle is strictly for the semantic web. The engine indexes documents built on semantic web concepts and standards (such as the RDF format).
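For readers unfamiliar with RDF, its data model is a set of subject-predicate-object statements ("triples"). The short Python sketch below shows that model and a wildcard lookup over it. The `ex:` names are invented for illustration, and this says nothing about how Swoogle itself stores or queries documents.

```python
# Each statement is a (subject, predicate, object) triple.
# The "ex:" identifiers are made up for this example.
TRIPLES = [
    ("ex:Swoogle", "rdf:type", "ex:SearchEngine"),
    ("ex:Swoogle", "ex:indexes", "ex:RDFDocument"),
    ("ex:Hakia", "rdf:type", "ex:SearchEngine"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]
```

Calling `match(TRIPLES, predicate="rdf:type", obj="ex:SearchEngine")` returns every statement that declares something to be a search engine, which is the kind of structured question that keyword matching alone cannot express.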
The aim of the engine is to return meaningful sentences for the search query. It’s a technique that lies midway between a site summary and a summary of all results. For example, the results for a sample query give enough to answer what the Semantic Web is.
After the company was acquired by Microsoft, the changes to Live Search were noticeable in the related searches and in the content returned from Wikipedia. Of course, most of the changes would be transparent to users, but in the longer run we can definitely expect more additions to Live Search.
Overall, any search engine faces certain major challenges in making a serious dent in the search space. These include returning results faster, giving more accurate results for fewer keywords (three to four words at most) and raising awareness among users.
Most of the problems facing upcoming search engines are not even related to the relevancy of results. The appeal of semantic search engines is that the content of a page alone decides its utility. This means less spam and, of course, more relevant ads. It would also be harder to game a semantic web engine. Whether a search engine can meet all these criteria remains an open question.