A study by Penn State University researchers found that webmasters who use robots.txt files to determine which parts of their sites are open or closed to web crawlers favor Google’s spiders and bots over those of other search engines. Using a new search engine that the researchers built specifically for the study, they found that some webmasters write robots.txt files that do not block or accept search engines uniformly, but instead allow Google, and in some cases Yahoo and MSN, to crawl almost all of their pages.
The study, entitled “Determining Bias to Search Engines from Robots.txt”, was presented at the recent 2007 IEEE/WIC/ACM International Conference on Web Intelligence in Silicon Valley. The paper’s authors were named as C. Lee Giles, Yang Sun and Ziming Zhuang, all from Penn State’s IST Department.
Does this study explain why Google returns more results than other search engines? More to the point, does it explain why Google is the top search engine today? I don’t think so.
What interests me, though, and what the study did not explain, is why those webmasters would consciously write robots.txt files that favor Google’s web crawlers. To gain higher page rank for their sites? To drive more traffic?





I’m wondering: suppose I use the following code in my robots.txt:
User-agent: *
Disallow:
Am I still favoring Google’s spiders, even though I uniformly accept all search engines?
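One way to check a question like this is to parse the rules and ask which crawlers may fetch a page. The sketch below uses Python's standard `urllib.robotparser`. The `BIASED` file, the `allowed_agents` helper, the test path and the `ExampleBot` name are illustrative assumptions, not rules or tools taken from the study.

```python
from urllib.robotparser import RobotFileParser

# The rules from the comment above: every crawler, nothing disallowed.
UNIFORM = """\
User-agent: *
Disallow:
"""

# Hypothetical biased file: Googlebot may crawl everything, all others nothing.
BIASED = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

# Googlebot, Slurp (Yahoo) and msnbot are real crawler names;
# ExampleBot stands in for an unknown crawler.
AGENTS = ["Googlebot", "Slurp", "msnbot", "ExampleBot"]


def allowed_agents(robots_txt, agents, path="/page.html"):
    """Return {agent: True/False} for whether each agent may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


if __name__ == "__main__":
    print("uniform:", allowed_agents(UNIFORM, AGENTS))
    print("biased: ", allowed_agents(BIASED, AGENTS))
```

Under these assumptions the wildcard file treats every crawler the same way, so it shows no favoritism; only a file that names specific crawlers, like the hypothetical second one, would.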
There is a myth, sometimes published in computer magazines, that using a robots.txt file and explicitly allowing Googlebot access to the site will have a positive effect on Google rankings.
If I find an unknown bot hitting my site hard, I will frequently block that specific bot. I even have automated routines to detect what I call “bad bots”. If the researchers’ bot hit my site and set off one of my bad bot “sensors”, they could easily have found themselves banned from my site. This isn’t favoritism toward Google; it is about conserving the resources that unknown bots waste.
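As a rough illustration of what such a “sensor” might look like, here is a minimal sketch that bans a user agent once it exceeds a request budget within a sliding time window. The class name, the thresholds and the allowlist of known crawlers are all assumptions made for the example; the comment above does not describe how the actual routines work.

```python
from collections import defaultdict, deque


class BadBotDetector:
    """Ban user agents that exceed `max_hits` requests per time window."""

    def __init__(self, max_hits=100, window_seconds=60.0, allowlist=()):
        self.max_hits = max_hits
        self.window = window_seconds
        # Assumed: known crawlers are exempt from rate-based banning.
        self.allowlist = {agent.lower() for agent in allowlist}
        self.hits = defaultdict(deque)  # agent -> timestamps of recent hits
        self.banned = set()

    def record_hit(self, user_agent, timestamp):
        """Record one request; return True if the agent is (now) banned."""
        agent = user_agent.lower()
        if agent in self.allowlist:
            return False
        if agent in self.banned:
            return True
        recent = self.hits[agent]
        recent.append(timestamp)
        # Drop hits that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        if len(recent) > self.max_hits:
            self.banned.add(agent)
            return True
        return False


if __name__ == "__main__":
    detector = BadBotDetector(max_hits=3, window_seconds=10, allowlist=["Googlebot"])
    for t in range(5):
        print(t, detector.record_hit("ResearchBot", float(t)))
```

A detector like this keys only on request rate, so a research crawler that fetches pages quickly would be banned for its behavior, regardless of which search engine it belongs to.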
I want to do.