Yahoo’s Index Claim Suspicious According To Study
Matthew Cheney, Mike Perry, and Dr. Orville Vernon Burton of the National Center for Supercomputing Applications have compared the sizes of Yahoo’s and Google’s indexes. The researchers based their study on the assumption that the search engine returning more documents for obscure search terms would be the engine with the larger index.
Since the size of an area crawled can be measured by its perimeter, we feel that small, randomly selected search queries gives us the best chance to locate some of the most obscure web documents. By counting the presence of these obscure documents in either search engine, we can measure the comprehensiveness of each search engine to determine the relative size of each search engine’s index.
They concluded that a user could expect to find 166.9% more results in the Google index than in Yahoo’s.
Based on the data created from our sample searches, this study concludes that a user can expect, on average, to receive 166.9% more results using the Google search engine than the Yahoo! search engine. In fact, in the 10,012 test cases we ran, only in 3% of the cases (307) did Yahoo! return more results. In 96.6% of the cases (9676) Google returned more results. In less than 1% of the cases (29) both search engines returned the same number of results.
It is the opinion of this study that Yahoo!’s claim to have a web index of over twice as many documents as Google’s index is suspicious. Unless a large number of the documents Yahoo! has indexed are not yet available to its search engine, we find it puzzling that Yahoo!’s search engine consistently returned fewer results than Google.
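The case counts quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are made up for this example, and it assumes that "166.9% more results" is meant literally, i.e. that Google returned roughly 2.669 times as many results as Yahoo! on average.

```python
# Case counts as quoted from the study.
total_cases = 10_012
yahoo_more = 307      # cases where Yahoo! returned more results
google_more = 9_676   # cases where Google returned more results
tied = 29             # cases where both returned the same number

# The three outcomes should account for every test case.
assert yahoo_more + google_more + tied == total_cases


def share(count, total=total_cases):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {
    "yahoo_more": share(yahoo_more),
    "google_more": share(google_more),
    "tied": share(tied),
}

# Assumption: "166.9% more" read literally means a ratio of 1 + 1.669.
google_to_yahoo_ratio = round(1 + 166.9 / 100, 3)

print(shares)
print(google_to_yahoo_ratio)
```

The computed shares line up with the rounded figures in the quote (about 3%, 96.6%, and under 1%), and the three counts do sum to the 10,012 test cases.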
A study like this can never really establish how big Yahoo’s and Google’s indexes are, since there are limits on how many results the search engines display. Both search engines return at most 1,000 results, even when there may be many more matching documents that are not displayed. Even then, a search engine can simply claim that it has indexed more pages than it chooses to include in its display index.
Michael Nguyen blogs regularly at Social Patterns and is a search engine optimization expert for SEO Inc, a full-service search engine optimization firm.
Comments
2 responses so far ↓
Pierre on Aug 17, 2005 at 2:15 am
Well, it’s not a very scientific study: what if Yahoo retrieves fewer pages but on many *more* keywords? To me, “hidden pages” are more likely to talk about specific subjects. Also, just because a search engine retrieves 100 pages for a keyword does not mean it has exactly 100 matching pages in its index: the result of a query is the product of pages x relevancy.
I.e., here they simply say that for a set number of keywords, Google returns more results. But the size of the “area” crawled has two dimensions: web pages x keywords.
Anyway, all this does not tell us which search engine is the most relevant, and that’s what we’re really interested in…
(EMP) E-Marketing Performance » Blog Archive » Yahoo and Google “Duke It Out” on Jun 4, 2008 at 6:01 pm
[…] Remember not too long ago when Yahoo announced that it had twice as many documents indexed as Google? Well, there has been a recent study done by Matthew Cheney, Mike Perry, and Dr. Orville Vernon Burton of the National Center for Supercomputing Applications in which they have concluded that this claim is somewhat suspicious. […]