Why Search Is Broken, and How Semantic Search Fixes It

People get information by asking a question and receiving an answer, but online, we’ve been forced to learn “keyword” searches.

The thinking was that a search engine could extract meaning from a handful of abstract words (aka keywords) most closely related to what we were seeking. The problem is that this approach does not produce a real answer. As we’ve started to see with new search techniques from EyePlorer and Wolfram Alpha, and even with the use of Twitter as a search tool, the market is aggressively looking for a more meaningful approach. To boil it down, the major issues with the keyword search model are:

  1. Results are not completely relevant to the original query
  2. Lack of accuracy leads to an overabundance of results
  3. Too time consuming to comb through that much information

Let’s look at an example of how search works today. Search for “best cat food,” and you’ll get more than 93 million pages including these keywords, prioritized using the secret sauce of the search engine.

While looking over the universe of information around best cat food, I wonder:

  1. Have I figured out what information is out there? Am I able to match the resulting content to a user’s request?
  2. Did I really need 93 million results, or just the right information? Is there an accuracy issue?
  3. If I then have to analyze even 1% of that information, does the engine lack an understanding of what I’m looking for?

We should not have to do a “search”… on our search results.

To understand what a user is really looking for, we need a system that lets people ask real questions and get real answers. That means applying commonsense reasoning and natural language processing, paying attention to inflection and semantics. If a system looks at all of the information available and digests its full meaning, it can take that semantic understanding and match it to the user’s meaning to produce results that make sense – the first time.
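
To make the contrast concrete, here is a minimal Python sketch comparing literal keyword matching with matching on meaning. Everything in it is an assumption made for illustration: the three page snippets are invented, and the hand-written `CONCEPTS` table is only a stand-in for real semantic understanding. It is not how any search engine mentioned here actually works.

```python
import re

# Hypothetical page snippets, invented for this example.
PAGES = {
    "coupons": "best cat food coupons best deals on cat food best prices",
    "review": "vets rank these kitten and adult feline diets by nutrition",
    "spam": "cat food cat food cat food buy now",
}

# Toy concept table (an assumption): maps surface words to a shared meaning.
CONCEPTS = {
    "cat": "cat", "cats": "cat", "kitten": "cat", "feline": "cat",
    "food": "food", "diet": "food", "diets": "food", "nutrition": "food",
    "best": "quality", "rank": "quality", "top": "quality",
}


def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(query, page):
    """Count how often the query's literal words occur in the page."""
    query_words = set(tokens(query))
    return sum(1 for word in tokens(page) if word in query_words)


def concept_score(query, page):
    """Count the distinct query concepts that the page also expresses."""
    query_concepts = {CONCEPTS[w] for w in tokens(query) if w in CONCEPTS}
    page_concepts = {CONCEPTS[w] for w in tokens(page) if w in CONCEPTS}
    return len(query_concepts & page_concepts)


if __name__ == "__main__":
    query = "best cat food"
    for name, text in PAGES.items():
        print(name, keyword_score(query, text), concept_score(query, text))
```

In this toy setup the vet review never uses the words “best,” “cat,” or “food,” so keyword matching gives it a score of zero, while the concept comparison sees that it covers all three ideas in the query. The keyword-stuffed page does the opposite: it scores high on keywords but says nothing about quality.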

In addition to new companies, several university projects are taking on this concept as well, such as the University of Maryland’s SHOE Search Engine and the University of Maryland, Baltimore County’s Swoogle. However, both are still a long way from producing the truly meaningful experience that users will come to trust.

Until we fix the core – the true understanding behind why and how we search – we’re left with 93 million links about cat food.

As the Chief Technology Officer for Dorthy.com, produced by Saber Seven, Jim Anderson leads the company’s search, content delivery, social and mobile strategies, as well as its technological vision. His experience leading advances in Artificial Intelligence, Natural Language Processing, and Machine Learning technologies, along with his hundreds of patents, has been fertile training for his new challenge – creating Dorthy.com as a premier web destination that helps users find real answers to their searches.

  • As someone who knows how easily the system is gamed, I’m not sure I want search engines to be much smarter. Think about it.

    Every time you search for a product, e.g. “best cat food”, you let a machine make a decision for you. That machine doesn’t know anything about cat food, nor has it sampled even 1 brand of cat food.

    Yet, most people trust Google implicitly (roughly 43% of searchers automatically click the first result). Should we give that machine more power over our decision making processes?

  • I understand what you mean, Josh, but wouldn’t you rather the results be accurate enough that the “best cat food” might actually be the best food, or at least a food considered to be in the top 5 or something?

  • alienbinary

    I think this is very validating for developers who are frustrated with explaining why there is no magic solution for getting to the top of search rankings. While we all clearly feel that our content is the most relevant, hundreds of other people are probably thinking at the exact same moment that they know beyond a shadow of a doubt that their content is the absolute most relevant and important page on the web. Meanwhile, a thousand other site owners, who acknowledge the keywords they’ve injected into their sites through meta tags, index spamming, alt tag tweaking or what have you, are comfortable so long as people visit their sites.

    What we should take away from this as users, however, might not be to make search engines more discriminating, but to encourage users to be more active in their quest for whatever it is that they’re looking for. No one walks into a library and reasonably expects that by saying a handful of words to the librarian, he or she can produce the most relevant text from the vast collection; otherwise there would be no need to peruse the stacks. Ultimately, the librarian’s job is to point the reader in the right direction and help them along the way. As a medical researcher, I don’t expect information to reveal itself to me unless I can find the best way to conjure it up. Simply put, there is no shortage of ways to tweak a Google search. With a built-in language that allows savvy searchers to restrict results to specific filetypes and specific domains, even going so far as to search a specified region of the document only, there are many ways to get what you need, and fast.

  • PDB

    One semantic search startup to watch is Swingly. There’s an interesting blog post on the Ft. Worth Startup Blog that talks about their goals — very similar to what is espoused here.


  • First, we forget that the human is an analog being. This means that we “slide” into meaning, rather than obey a bifurcational calculus.

    In addition, much of the good information is in what we DON’T expect. This is where the creativity is–new connections. Perhaps we could spend some time developing our vocabulary and concept formation faculties, like flexing muscles, rather than expecting a binary machine to “help” us.

    Google ‘orthomentoring’ for example. Is there meaning in the word? What is its uniqueness score and frequency? And what other words triangulate on it or connect?

    Moral: The purpose of the mind is to grasp and make meaning. For this, we need richness and a little work, not predetermined and value-corrupted answers. When we mix in the analog machines, there will be improvement.

    At least I think so!

  • gdog


    what is the best cat food. who can answer. NoW!

  • Twitter is now officially overhyped on the Webmaster T scale of hype! Twitter if used as search would not work quite simply because it would be choked with spam. Realtime spam but… spam nonetheless! You’re talking about scaling MickeyMouse services into Mighty Mouse services! Ummm gonna almost surely be degradations in the services…1 good result is all I need… Google has not failed me in literally years.

  • I agree that 93 million results are not needed. The engines need to do a better job of understanding what is being written on the web and then translating that into more targeted and customized results for the user’s search.

    But when you think about the query “best cat food,” is there really an authoritative or correct answer? No. There could be thousands of relevant opinions from cat owners around the world on the subject, and are thousands of results any better than 93 million? It would still take too much time to sort through all of that information. It comes down to allowing the user to filter and manipulate the results more easily to find what they need.

  • Thanks a lot for mentioning eyePlorer.com in this interesting article. By the way, this is what eyePlorer.com knows about “cat food”: