A study conducted by researchers at Penn State University revealed that webmasters who use robots.txt files to determine which parts of their sites are open or closed to web crawlers tend to favor Google’s spiders and bots over other search engines. Using a new search engine the researchers built specifically for the study, they found that some webmasters write robots.txt files that do not uniformly block or accept all search engines, but instead allow Google, and sometimes Yahoo and MSN, to crawl almost all of their pages.
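To see what this kind of bias looks like in practice, here is a minimal sketch using Python’s standard-library robots.txt parser. The robots.txt content below is a hypothetical example of my own, not one from the study: it gives Googlebot free rein while blocking every other crawler from part of the site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that favors Google's crawler over all others
robots_txt = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Googlebot may fetch the restricted area; an arbitrary other bot may not
print(parser.can_fetch("Googlebot", "/private/page.html"))      # True
print(parser.can_fetch("SomeOtherBot", "/private/page.html"))   # False
```

An empty `Disallow:` line means "nothing is disallowed," so the Googlebot section opens the whole site to Google while the wildcard section closes `/private/` to everyone else.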
The study, entitled “Determining Bias to Search Engines from Robots.txt,” was presented at the recent 2007 IEEE/WIC/ACM International Conference on Web Intelligence in Silicon Valley. The paper’s authors are C. Lee Giles, Yang Sun and Ziming Zhuang of Penn State’s College of Information Sciences and Technology (IST).
Does this study explain why Google returns more results than the other search engines? More to the point, does it explain why Google is the top search engine today? I don’t think so.
What interests me, though, and what the study did not explain, is why those webmasters would consciously write robots.txt files that favor Google’s web crawlers. To gain higher page rank for their sites? To drive more traffic?