Yahoo’s Index Claim Suspicious According To Study
Matthew Cheney, Mike Perry, and Dr. Orville Vernon Burton of the National Center for Supercomputing Applications have compared the sizes of Yahoo's and Google's indexes. The researchers based their study on the assumption that the search engine returning more documents for obscure search terms would be the one with the larger index.
Since the size of an area crawled can be measured by its perimeter, we feel that small, randomly selected search queries gives us the best chance to locate some of the most obscure web documents. By counting the presence of these obscure documents in either search engine, we can measure the comprehensiveness of each search engine to determine the relative size of each search engine’s index.
They concluded that, on average, a user could expect to find 166.9% more results in Google's index than in Yahoo's.
Based on the data created from our sample searches, this study concludes that a user can expect, on average, to receive 166.9% more results using the Google search engine than the Yahoo! search engine. In fact, in the 10,012 test cases we ran, only in 3% of the cases (307) did Yahoo! return more results. In 96.6% of the cases (9676) Google returned more results. In less than 1% of the cases (29) both search engines returned the same number of results.
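To make the arithmetic behind these figures concrete, here is a small Python sketch. It classifies per-query outcomes and converts the reported counts into shares of the 10,012 test cases. The helper names (`Tally`, `tally`, `percent_more`) are hypothetical, and the study does not say exactly how it computed its 166.9% average, so `percent_more`, which compares summed result counts, is only one assumed definition.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Hypothetical container for per-query outcomes of a two-engine comparison."""
    google_more: int
    yahoo_more: int
    ties: int

    @property
    def total(self) -> int:
        return self.google_more + self.yahoo_more + self.ties

    def share(self, count: int) -> float:
        """Percentage of all test queries represented by `count`."""
        return 100.0 * count / self.total


def tally(pairs):
    """Classify (google_count, yahoo_count) pairs by which engine returned more results."""
    google_more = sum(1 for g, y in pairs if g > y)
    yahoo_more = sum(1 for g, y in pairs if y > g)
    ties = sum(1 for g, y in pairs if g == y)
    return Tally(google_more, yahoo_more, ties)


def percent_more(google_total: float, yahoo_total: float) -> float:
    """Assumed definition of 'X% more results': compare summed result counts."""
    return 100.0 * (google_total - yahoo_total) / yahoo_total


# Outcome counts as reported in the study.
reported = Tally(google_more=9676, yahoo_more=307, ties=29)
print(reported.total)                                  # 10012
print(round(reported.share(reported.google_more), 1))  # 96.6
print(round(reported.share(reported.yahoo_more), 1))   # 3.1 (the study rounds to 3%)
print(round(reported.share(reported.ties), 1))         # 0.3
```

The three reported outcomes add up exactly to the 10,012 test cases. Whatever averaging method the study used, "166.9% more" means roughly 2.67 times as many results, not 1.67 times.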
It is the opinion of this study that Yahoo!’s claim to have a web index of over twice as many documents as Google’s index is suspicious. Unless a large number of the documents Yahoo! has indexed are not yet available to its search engine, we find it puzzling that Yahoo!’s search engine consistently returned fewer results than Google.
A study like this can never really establish how big Yahoo's and Google's indexes are, since there are limits to how many results the search engines display. Both search engines return at most 1,000 results, even when many more matching documents may exist. Beyond that, a search engine can simply claim that it has indexed more pages than it chooses to use for its display index.