Understanding Semantic Search and SEO

A framework for semantic analysis-centric content

Regardless of what I’ve said about the whole ‘LSI’ and Google crap in the past, one thing worth bearing in mind is that all modern search engines do semantic analysis to one extent or another. It may be phrase-based, use PLSA or HTMM, or be a hybrid. That part is really inconsequential. What is important is that we can take heart in the fact that content that is semantically flexible will do a better job of targeting the page in question.

Understanding semantic search and SEO

First off, some common concepts worth looking at: semantic search is NOT the semantic web. The two get conflated all too often. We’re not talking about tagging. We’re talking about the probabilistic/statistical approach to understanding the concepts/meanings of a web page/document.

The next thing to get away from is the idea that only synonyms play a role within these concepts.

Building out concepts

All too often I see people talking only about stemming and synonyms. That’s only part of the picture. We also want to use terms that build out the theme/concept, which we might call ‘supporting terms’. So yes, we can consider synonyms and variants such as:

  • Automobiles
  • Cars
  • Autos
  • Vehicle
  • Auto
  • Car

But don’t limit yourself to delivering only those signals. We want to go further and create a deeper theme for that space, including supporting terms such as:

  • Engine
  • Garage
  • Tires
  • Hood
  • Spark plug
  • Keys
  • High Performance

And phrases related to or containing them.

As we can see, those aren’t synonyms but supporting words or phrases that further establish the semantic concepts on the page. But we’d likely be more specific in our targeting with additional elements such as:

  • Reviews
  • Sales
  • Rental
  • Insurance
  • Prices
  • Specifications

We can look at transactional and informational modifiers as well. These help define the type of page we have and the type of queries we are targeting. For another example, here are some possible terms for ‘space shuttle’:

  • space
  • shuttle
  • mission
  • astronauts
  • launch
  • station
  • crew
  • nasa
  • satellite
  • earth

Getting the picture here?

What we’re looking to do is create a strong semantic theme of what the page is about through the words we’re using to frame it. If one searches for ‘Jaguar’, there are a few possible meanings to choose from:

  • A Car
  • An Animal
  • Football team (US)
  • Computer Application

By using semantic themes you will enable the search engine to better understand the concepts on your page. Remember, search engines have about a 6th grade reading/understanding level. We need to play nice with them.
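
To make the ‘Jaguar’ problem concrete, here is a minimal sketch of how supporting terms could tip the balance between meanings. Everything in it is an assumption for illustration: the term baskets in THEMES are made up, and simple counting stands in for whatever statistical model a real search engine uses.

```python
import re
from collections import Counter

# Hypothetical supporting-term baskets for each sense of "jaguar".
# These lists are illustrative only; no search engine publishes such data.
THEMES = {
    "car": {"engine", "tires", "hood", "garage", "dealer", "horsepower"},
    "animal": {"rainforest", "prey", "habitat", "cat", "spots", "predator"},
    "football team": {"jacksonville", "nfl", "quarterback", "touchdown", "season"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def theme_scores(text, themes=THEMES):
    """Count how many tokens in the text fall into each theme's basket."""
    counts = Counter(tokenize(text))
    return {name: sum(counts[t] for t in terms) for name, terms in themes.items()}

def best_theme(text, themes=THEMES):
    """Return the theme with the most supporting-term hits, or None if nothing matches."""
    scores = theme_scores(text, themes)
    name, score = max(scores.items(), key=lambda kv: kv[1])
    return name if score > 0 else None

page = "The new Jaguar has a bigger engine under the hood and wider tires."
print(best_theme(page))
```

Note that the word ‘Jaguar’ itself contributes nothing here. It is the supporting terms around it that decide the theme, which is the whole point.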

Building out concepts around keywords

Elements search engines may look at

The interesting part about using semantic signals/approaches in search is that they can yield a wealth of information through analysis of such elements as:

  • TITLE of page
  • Content of page (phrase ratios)
  • Prominence factors (Headings, italics, lists)
  • Anchor of inbound links
  • TITLE and content of pages linking in
  • Spam detection
  • Duplicate content detection
  • Personalization

Each of these can be weighted/dampened to give an overall page relevance score, which can then be sent to the rest of the processing system. This scoring is based on the current seed set of documents in the system, which has a learning mechanism to continually refine the algorithms.
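
As a rough illustration of ‘weighted/dampened’, the sketch below combines a few of the signals listed above into one score. The weights and the log dampening are my own assumptions for the example, not anything a search engine has disclosed, and signals such as spam detection or personalization are left out entirely.

```python
import math

# Hypothetical weight per signal; real engines do not publish theirs.
WEIGHTS = {
    "title": 3.0,
    "content": 1.0,
    "prominence": 1.5,
    "inbound_anchor": 2.0,
    "linking_page": 0.5,
}

def dampen(count):
    """Log dampening: each additional occurrence adds less than the one before."""
    return math.log1p(count)

def relevance_score(signal_counts, weights=WEIGHTS):
    """Combine per-signal term counts into one weighted, dampened score."""
    return sum(weights[name] * dampen(signal_counts.get(name, 0)) for name in weights)

# Page A uses the term across several signals; page B just repeats it in the body.
page_a = {"title": 1, "content": 5, "prominence": 2, "inbound_anchor": 3}
page_b = {"title": 0, "content": 50}
print(relevance_score(page_a) > relevance_score(page_b))
```

With dampening in place, fifty repetitions in the body copy do not beat a handful of occurrences spread across title, headings and inbound anchors.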

Ranking the pages

Of course, the obvious question remains: how are these signals used? In the more common implementations out there, machine learning is the order of the day. The search engine would start with a seed set of documents that satisfy a given term/phrase ratio or similarity measure, and compare other documents to those for future scoring. Then, using various signals such as query and click data, it can further refine the seed set on the fly.
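
The seed set idea can be sketched in a few lines. This is a toy version under stated assumptions: plain word counts as the document representation and cosine similarity as the ‘similarity measure’, both chosen by me for simplicity. The on-the-fly refinement from query and click data is not modeled at all.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Bag-of-words term counts for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def seed_profile(seed_docs):
    """Sum the term vectors of the seed documents into one profile vector."""
    total = Counter()
    for doc in seed_docs:
        total.update(term_vector(doc))
    return total

def score_against_seed(candidate, seed_docs):
    """Score a candidate document by its similarity to the seed profile."""
    return cosine(term_vector(candidate), seed_profile(seed_docs))

seed = [
    "space shuttle launch crew nasa mission",
    "nasa astronauts board the shuttle for the station mission",
]
on_topic = "the shuttle crew prepared for launch at nasa"
off_topic = "cheap car insurance prices and rental reviews"
print(score_against_seed(on_topic, seed) > score_against_seed(off_topic, seed))
```

A page that shares vocabulary with the seed set scores well; one that shares none scores zero, no matter how well written it is.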

This would ultimately be combined with other relevance scoring mechanisms and core rankings, set to whatever threshold they deem fit, to deliver the end results. While these signals may not be enough to garner great rankings on their own, they are likely useful to those playing grab and hold via QDF (query deserves freshness). Any signal not related to link velocity would be at a premium in such cases.


Putting it to use

The first thing we want to do is expand on our keyword research to provide not only primary and secondary targets, but also get into semantic support terms and even semantic baskets. This will be endlessly useful for content development, site audits, link building and more. Given the many signals that can be had, having these concepts integrated into the entire SEO program can be invaluable.

When you do this at the beginning (during the KW research) it can be easily fed into every other aspect of the SEO program.

There really are no tools for this, nor can I imagine one that would work (although I did talk to the WordStream gang about it recently). It is still more an art than a science. You see, we don’t know the relevance scoring for the seed set, and the SERPs are inclusive of other ranking factors. I have found it an interesting exercise to measure occurrences on pages ranking in the top 10 with the least amount of link juice/authority. While not perfect, it often turns up concept-rich pages.
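
For anyone who wants to try that exercise, here is one hedged way to do the counting. It assumes you have already copied the text of the ranking pages by hand; the stopword list and the sample pages are placeholders of mine, and it makes no attempt to fetch SERPs or judge link authority.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on", "with"}

def page_terms(text):
    """Return the set of distinct non-stopword terms on one page."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}

def shared_terms(pages, min_pages=2):
    """List (term, page_count) for terms appearing on at least min_pages pages."""
    doc_freq = Counter()
    for text in pages:
        doc_freq.update(page_terms(text))
    hits = [(term, n) for term, n in doc_freq.items() if n >= min_pages]
    # Most widely shared terms first, alphabetical within ties.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

# Placeholder text standing in for pages copied from a top-10 result set.
pages = [
    "Space shuttle launch and the crew of the mission",
    "NASA shuttle mission to the station",
    "The shuttle crew returns to earth",
]
print(shared_terms(pages))
```

Terms that show up across many of the low-authority ranking pages are decent candidates for your list of supporting terms.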

Getting into the mindset

As with many things in this thing of ours, it is something you need to get a feel for in the query space in question. What is important is getting into the habit of watching how you’re framing the content. Build around the core term with not only modifiers (geo-local, informational, transactional, plurals) but also with related terms that expand on the concepts.
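
If it helps to see the habit written down, the sketch below builds phrase variants from core terms and modifier groups. The function name, the modifier groups and the ‘core term first’ word order are all assumptions made for the example; the modifier words themselves are borrowed from the lists earlier in this post.

```python
from itertools import product

def expand_keywords(core_terms, modifiers):
    """Pair every core term with every modifier, keeping the modifier type as a label."""
    phrases = []
    for core, (kind, words) in product(core_terms, modifiers.items()):
        for word in words:
            phrases.append((kind, f"{core} {word}"))
    return phrases

# Hypothetical grouping of modifiers by intent.
modifiers = {
    "transactional": ["prices", "rental"],
    "informational": ["reviews", "specifications"],
}

for kind, phrase in expand_keywords(["car", "auto"], modifiers):
    print(f"{kind:15} {phrase}")
```

The output is only a starting grid. The related, concept-expanding terms still have to come from getting a feel for the query space.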

Now, before I leave you, I dug up a ton of tools, posts and even seminars to get you into the groove. Get a feel for how search engineers think and you will find getting actionable ideas all the more efficient. I hope you got something from all this; it is an area not discussed often enough. Enjoy!

/end adventure


Tools to play with

  • Aaron’s tool has some interesting ‘Phrase Match’ data, but it is marginally effective for this exercise and would need sorting.
  • KW Map is interesting, but is also marginally effective and has no export option to speak of. Close, but no cigar.
  • Vseo Tool – also not the greatest, but it certainly presents some reasonable semantic concepts and can be exported.
  • WordStream – also comes close (I am helping develop a tool, though), but has nothing by default to really group deeper semantic relations for our purposes. It emails the list to you for sorting purposes.
  • Nichebot – these guys almost have it with the poorly named ‘LSI’ tool. This produces probably some of the best lists for our purposes. Fully exportable for sorting.


Googly Tools

  • Keyword Tool – about as use(less?) as the others. It has some insights, but not deep enough for this exercise, although it is easier to sort and does support downloads.
  • Search-based Keyword Tool – not as good as the above KW tool in the testing I did recently for this. It does support exporting though.
  • Google Sets – this one isn’t obvious right away, but it is handy. If you look at the ‘description’ element, you can start to see some supporting terms that might come in handy (since Googly is recommending them). The problem is that it doesn’t give results for granular/obscure terms (also try Google Squared).


Semantic relations

  • Onelook reverse dictionary – returns a list of related terms, each word linked to its definition (more tricks from Ann here) – does a reasonable job but doesn’t have an export function.
  • reverse dictionary – clusters related terms into groups by their meaning and gives the actual definition for each cluster: barely usable.
  • Rhyme Zone – define your term and find rhymes, synonyms and antonyms. Using the ‘Find related terms’ option you can get some pretty usable lists; unfortunately they are not exportable.


Good Geeky Reading


Google Patents


Microsoft Patents

Videos for Geeks

  • Extracting Semantic Relations from Query Logs Ricardo Baeza-Yates, Yahoo! Research
    In this paper we study a large query log of more than twenty million queries with the goal of extracting the semantic relations that are implicitly captured in the actions of users submitting queries and clicking answers. Previous query log analyses were mostly done with just the queries and not the actions that followed after them.
  • Machine learning and translation – Google tech talks
    This is an interesting presentation on probabilistic learning and dealing with better understandings of user intent. Kind of heavy lifting for the search geeks, but still worth watching for any SEO.
  • Machine Learning, Probability and Graphical Models Sam Roweis, Department of Computer Science, University of Toronto
  • What’s the future of semantic search? – Matt Cutts video discussing the differences and his take on where it’s going

David Harry

Lead SEO Consultant at Verve Developments

David Harry is the lead SEO at Verve Developments and specializes in SEO audits and forensic analysis. He's been in ... [Read full bio]
