What exactly is relevance? Google finds the following explanation of the term:
Relevance describes how closely the contents of an information source match the topics. [source]
Obviously, the definition is rather vague, but I wasn’t able to find anything more exact or specific (if a better one even exists).
Ironically, relevance has become the most important term in search engine marketing: it determines both site quality and search quality. Each time a search marketer wants to know if his tactic is legit, Google says, “Yes, as long as you keep things relevant.” So where is the line? And, even more interesting, how can a machine define it if even we humans fail to?
No wonder that the main factors taken into consideration when developing Google Sets were:
- punctuation (words separated by commas are likely related, i.e. they form a list);
- HTML tags (words within the same HTML tags, e.g. <li>, <h6>, etc., might be related).
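To make the two signals above concrete, here is a toy sketch of how a machine might collect candidate sets of related words from punctuation and HTML markup. This is purely my own illustration; Google has not published how Google Sets actually works, and real extraction would use a proper HTML parser rather than regular expressions.

```python
import re

def candidate_sets(html: str) -> list[list[str]]:
    """Collect groups of words that co-occur in list-like structures."""
    sets = []
    # Signal 1: punctuation -- a comma-separated run of words forms a list.
    for run in re.findall(r"(?:\w[\w-]*,\s*){2,}\w[\w-]*", html):
        sets.append([w.strip() for w in run.split(",")])
    # Signal 2: HTML tags -- items sharing the same tag (e.g. <li>) may be related.
    items = re.findall(r"<li>(.*?)</li>", html, re.S)
    if len(items) > 1:
        sets.append([re.sub(r"<[^>]+>", "", item).strip() for item in items])
    return sets
```

For example, `candidate_sets("<ul><li>red</li><li>green</li><li>blue</li></ul>")` groups the three list items together, and a sentence like “I like apples, oranges, pears” yields the comma-separated triple as one candidate set.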
Indeed, punctuation and HTML markup are much easier for a machine to understand than actual words. The big issue is that a structure as complex and organic as language is very difficult to reduce to a definitive set of rules a machine can be taught.
Historically, the main factor a search engine has considered when ranking search results is the keyword. Recently, search engines (read: Google) have become really sophisticated at understanding what’s relevant. A recent discussion at Google Groups prompted me to try to understand what the machine might evaluate when deciding whether content is relevant, natural and useful. A member named MrGamma expressed a really interesting point of view:
I guess Google is discerning quality content by what variations of the search term are present on a page… Maybe they will extract the quality of content based on the unique meaning of the words when compared against the search term?
So which semantic criteria might a search engine theoretically be using to determine a text’s relevance?
- Keyword density and prominence: I am not talking about keyword density in its traditional sense (a mere percentage); I myself encourage people not to focus on it any more. My point is that a search engine might use these metrics to determine what the text is about, but it must have recently become much cleverer at telling whether the keyword density was created artificially (for example, by spotting artificial keyword clusters, keywords used too close to each other, etc.).
- The usage of keyword equivalents, i.e. pronouns: words used to avoid unnecessary keyword stuffing and to make the text easier to read.
- The usage of synonyms, which makes the content more informative, rich and natural while keeping it on topic.
- The usage of several words with the same root as the keyword, e.g. [optimize], [optimization], [optimizer], etc.
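Some of the criteria above are easy to express as simple calculations. The sketch below is my own toy illustration (not any published ranking formula) of three of them: raw keyword density, prominence (how early the keyword appears), and counting words that share the keyword's root, as in [optimize], [optimization], [optimizer]. A real engine would use proper stemming rather than a prefix match.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_density(text: str, keyword: str) -> float:
    """Share of tokens that are exactly the keyword (the 'mere percentage')."""
    words = tokenize(text)
    return words.count(keyword.lower()) / len(words) if words else 0.0

def keyword_prominence(text: str, keyword: str) -> float:
    """1.0 when the keyword is the very first word, approaching 0.0 near the end."""
    words = tokenize(text)
    if keyword.lower() not in words:
        return 0.0
    return 1.0 - words.index(keyword.lower()) / len(words)

def same_root_count(text: str, root: str) -> int:
    """Count words sharing the keyword's root, e.g. optimize/optimization/optimizer.
    A crude prefix match stands in for real stemming here."""
    return sum(1 for w in tokenize(text) if w.startswith(root.lower()))
```

On a sentence like “Optimization matters: optimize early, and your optimizer will thank you,” the exact keyword [optimize] appears once in ten words (density 0.1), while the root “optimiz” matches three different words, which is exactly the kind of same-root variety the last bullet describes.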