The Google Hilltop Algorithm
Prompted by Atul Gupta’s recent article on the Hilltop algorithm, I did some research of my own, and this article is the result. Atul Gupta is the founder of SEO Rank Ltd. As he explained in his article, the Hilltop algorithm played a fairly large role in Google’s November 16 update, dubbed “Update Florida”.
In the previous article of my continuing series on the effects of Google’s “Florida Update”, I discussed how the OOP (Over Optimization Penalty) may have been applied to sites that were overly optimized for some of their main keywords. While reading up on the Hilltop algorithm, I found that it isn’t even new: it dates back to early 2001.
As you might expect, and as is always the case, Google remains very silent on any of this, so my analysis is based on many observations and some testing, using the Google.com search engine. But before delving into how all of this may affect your positioning in Google, let me explain what the “Hilltop” algorithm is all about and how it works in Google.
For those of you who may be new to search engine algorithms, I suggest you read up on Google’s PageRank™ algorithm as a primer, and also “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, written by Sergey Brin and Larry Page, the co-founders of Google.
In its most basic form, the Google PageRank™ algorithm determines the importance and the relevance of a website by the number of links pointing to it. Following this principle, as an example, Google would rank a page higher if it has 100 links pointing to it, when compared to another page with only 10 links. So far, so good and this principle makes a lot of sense when you think of it.
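To make the principle concrete, here is a minimal Python sketch of the simplified "count the links" idea described above. It is only an illustration of that simplification, not Google's actual PageRank™ computation, and the function name and the sample pages are made up for the example.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank pages by how many distinct pages link to them.

    `links` is an iterable of (source_page, target_page) pairs.
    Self-links and duplicate links are ignored (an assumption made
    for this sketch).
    """
    counts = Counter()
    seen = set()
    for source, target in links:
        if source == target or (source, target) in seen:
            continue
        seen.add((source, target))
        counts[target] += 1
    # Most-linked pages first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample_links = [
    ("a", "x"), ("b", "x"), ("a", "y"),
    ("a", "x"),  # duplicate link, ignored
    ("x", "x"),  # self-link, ignored
]
ranking = rank_by_inbound_links(sample_links)
print(ranking)
```

In this toy data set, page "x" has two distinct pages linking to it and page "y" has one, so "x" is listed first.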
Definition of the Hilltop algorithm
In contrast to PageRank™, Google’s Hilltop algorithm determines the relevance and importance of a specific web page in relation to the search query or keyword typed into the search box.
In its simplest form, the idea is that instead of relying only on the PageRank™ value to find “authoritative pages”, that “PR value” would be more useful if it were weighted by the topic or subject of the page.
In such a way, counting links from documents that are relevant to the specific topic of a web page would be of greater value to a searcher. In 1999 and 2000, when the Hilltop algorithm was being developed by engineer Krishna Bharat and others at Google, they called such relevant documents “expert documents”, and links from these expert documents to the target documents determined the targets’ “score of authority”. Again, it makes a lot of sense.
For more in-depth information on this important topic, read the Hilltop Paper that was written by Krishna Bharat himself and is available from the University of Toronto’s computer science department.
Using the Hilltop algorithm to define related sites
Google also uses the Hilltop algorithm to better define how a site is related to another, such as in the case of affiliate sites or similar properties. The Hilltop algorithm is in fact Google’s technology and ‘ammunition’ in detecting sites that use heavy cross-linking or similar strategies!
As a side note, Google’s Hilltop algorithm bases most of its computations on “expert documents”, as noted above.
Hilltop also requires that it can locate at least 2 expert documents voting for the same Web page. If Hilltop cannot find a minimum of 2 such “expert documents”, it returns a score of zero. What this really means is that Hilltop then refuses to pass on any value to the rest of Google’s ranking formula, and so plays no part in ranking results for the search term or keyword the user typed into the search box.
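The two-expert rule can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the actual Hilltop scoring formula: the function name, the data structures and the idea of summing each expert's relevance to the query are all hypothetical choices made for this example.

```python
MIN_EXPERTS = 2  # minimum number of expert documents that must vote for a page


def hilltop_score(target, expert_links, expert_relevance):
    """Return a toy authority score for `target`.

    expert_links:     dict mapping an expert document to the set of pages it links to
    expert_relevance: dict mapping an expert document to its relevance to the
                      query (0.0 means the expert is off-topic for this query)
    """
    voters = [
        expert
        for expert, targets in expert_links.items()
        if target in targets and expert_relevance.get(expert, 0.0) > 0.0
    ]
    if len(voters) < MIN_EXPERTS:
        # Too few on-topic experts: contribute nothing to the final ranking.
        return 0.0
    return sum(expert_relevance[expert] for expert in voters)


expert_links = {"e1": {"p", "q"}, "e2": {"p"}, "e3": {"q"}}
expert_relevance = {"e1": 0.5, "e2": 0.25, "e3": 0.0}

score_p = hilltop_score("p", expert_links, expert_relevance)
score_q = hilltop_score("q", expert_links, expert_relevance)
print(score_p, score_q)
```

Page "p" is endorsed by two on-topic experts and receives a score, while page "q" has only one on-topic expert behind it and therefore gets zero.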
So, what’s in store for Hilltop in 2004?
Since we are only at the beginning of the year, some of you may ask: “That’s all really cool, but what will happen to websites in 2004, in the aftermath of ‘Hurricane Florida’?” That’s a great question, and many articles have been written on this topic in the last six to seven weeks.
Over the years, many search engines have stopped valuing ranking factors that were abused by certain webmasters or site owners, such as the keywords meta tag. For that reason alone, Google has ignored meta tags since its very beginnings.
In contrast, the visible sections of a website are less subject to “spam-dexing” (search engine spam), since these visible pages need to make sense to the average human visitor.
The reasons behind a new algorithm at Google
Since the inception of the Google search engine in 1998, the PageRank™ algorithm has been pretty much the benchmark used at Google to determine search relevance and importance. However, the PageRank™ system has a fundamental design weakness and certain limitations, and Google has known about them for quite some time now.
PageRank’s ‘intrinsic value’ is simply not tied to search terms or specific keywords. As a result, a relatively high-PR web page that contained only a passing reference to an off-topic search term or keyword phrase often got a high ranking for that phrase. This is exactly what Google is trying to eliminate with its Hilltop algorithm. Google always tries as best as it can to make its search engine as relevant as possible.
Coming back to Krishna Bharat, he filed for the Hilltop patent in January of 2001, with Google as an assignee. Thus, Google recognized the important improvements this new algorithm could bring to their search ranking features when combined with their existing PageRank™ algorithm.
Google’s Hilltop algorithm could now work in conjunction with its older technology (PR). It is my observation that Hilltop could have gone through many improvements from its original year 2000 design before the current implementation, notably the one that Google started to deploy on or around November 16, 2003, at the very beginning of its November update (Florida update).
In the past two years, I think that Hilltop has been “fine-tuned” by Google and now represents a serious contender to the PageRank™ algorithm, originally developed by Google co-founders Sergey Brin and Larry Page, back in early 1998.
Hilltop and Google’s massive index of over 3.3 billion pages
Since its very beginning, Google has been running most of its search engine on about ten thousand Pentium servers (some call them inexpensive personal computers), distributed across major data centers around the world. That is basically how Google has built its hardware technology, from the ground up.
Coming back to the Hilltop algorithm, consider how about 10,000 servers could have the dynamic processing ‘intelligence’ to rapidly locate “expert documents” among hundreds of thousands of different ‘topical’ Web pages. It is a formidable task, and it appears to be how Google’s Hilltop algorithm is put to work.
From what I can see and from what I know of search engines, since November 16 Google has been running a form of batch processing on frequent keywords, key phrases and search terms. This is similar to the mid-seventies days of computing, with their bulky mainframe computers the size of large refrigerators, except that today those 10,000 servers replace the mainframes. Google then stores these results in its massive database, ready to be used as soon as a searcher makes a query using those search terms.
How Google does this is very simple: from its large database, it has immediate, real-time access to the most popular and frequent keywords and search terms used daily, collected from actual searches made by everyday users, as well as the keywords and key phrases used in its AdWords PPC (pay-per-click) ad program.
It is my observation that Google has apparently set an arbitrary threshold on the number of searches a real-life keyword needs to receive before it triggers the Hilltop algorithm. Once that limit is reached, the keyword is sent to a temporary buffer for later batch processing.
Looking back to the ‘old days of the monthly dances’, it would appear that, prior to November 16, 2003, Google’s Hilltop algorithm ran once a month on the combined total of the most popular search terms, hence the old “Google dance effect”.
Additionally, and this is something I noticed even before the Florida update, smaller, incremental rounds of batch processing are likely being done more frequently on search terms that increase in popularity much faster, such as those tied to a major news event, for example when the US captured Saddam Hussein in December 2003. Such short-term events or news would qualify for the short-term “buffer” and would be processed as such by Hilltop.
More ‘standard’ and ordinary results for the longer term would be scheduled on the 10,000 servers about once a month, which, again, would make perfect sense. Search terms that do not qualify to kick in the Hilltop algo continue to show you the old Google ranking.
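The behaviour I am describing can be sketched as follows. This Python snippet is purely a model of my own observations, not a description of Google's real infrastructure: the class name, the threshold value and the two ranking functions are hypothetical placeholders.

```python
from collections import Counter


class QueryRouter:
    """Toy model: popular queries are buffered, batch processed, then served
    from stored results; all other queries fall back to the old ranking."""

    def __init__(self, threshold):
        self.threshold = threshold  # searches needed before a query qualifies
        self.counts = Counter()     # how often each query has been searched
        self.buffer = set()         # queries waiting for the next batch run
        self.precomputed = {}       # query -> results stored by a batch run

    def record_search(self, query):
        self.counts[query] += 1
        if self.counts[query] >= self.threshold and query not in self.precomputed:
            self.buffer.add(query)

    def run_batch(self, expert_ranker):
        """Process every buffered query (e.g. monthly, or sooner for news)."""
        for query in sorted(self.buffer):
            self.precomputed[query] = expert_ranker(query)
        self.buffer.clear()

    def results(self, query, fallback_ranker):
        if query in self.precomputed:
            return self.precomputed[query]
        return fallback_ranker(query)  # the "old" ranking


router = QueryRouter(threshold=3)
for _ in range(3):
    router.record_search("hotels")
router.record_search("rare widget")

router.run_batch(lambda q: ["expert:" + q])
popular = router.results("hotels", lambda q: ["old:" + q])
unpopular = router.results("rare widget", lambda q: ["old:" + q])
print(popular, unpopular)
```

In this model, "hotels" crosses the threshold and is served from the batch-processed results, while "rare widget" never qualifies and keeps getting the old ranking.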
In concluding this topic, as Atul Gupta and I have written in some of our previous articles, webmasters and site owners need to think “out of the box” if they want to thrive and continue to have sites that return a favourable ROI. Link popularity is now more important than ever.
Additionally, try to get a listing in as many directories as possible, beginning with DMOZ (the Open Directory Project). Avoid FFA (Free for All) or link farms in every respect. Those are a thing of the past and might even get you penalized.
If your budget allows it, get into a good PPC ad program, such as AdWords or Overture. You might also want to consider some good paid inclusion search engines that deliver real value for your investment.
Note that since January 15 (and as expected), Yahoo has completely dropped its listings with Google, so you may also want to look at the possibility of a paid listing in Yahoo as a safety measure. Yahoo now takes its results from Inktomi, which has been part of the Yahoo family of search properties since Yahoo bought it last year.
Copyright (c) 2004. Serge Thibodeau. All rights reserved.
Permission granted to the Search Engine Journal to republish this article
in accordance to previously agreed rules and guidelines. You can link
to this article as much as you like, on the condition that full credit be
given to Serge Thibodeau, with a link back to the Rank for $ales website,