Arnold Zafra

Webmasters’ Robots.txt Files Favor Google Over Other Search Engines

November 16th, 2007 by Arnold Zafra | 5 Comments

A study conducted by researchers at Penn State University found that webmasters who use robots.txt files to control which parts of their sites are open or closed to web crawlers tend to favor Google’s spiders and bots over those of other search engines. Using a new search engine that the researchers built specifically for the study, they found that some webmasters write robots.txt files that do not block or admit all search engines uniformly. Instead, the files allow Google, and to some extent Yahoo and MSN, to crawl almost all of their pages.
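To make the idea of a "biased" robots.txt concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. Both robots.txt files, the `example.com` URL and the helper name `crawl_verdicts` are my own illustrations, not taken from the study, and this is not the researchers' method. `Slurp` and `msnbot` are used here as the user-agent names for Yahoo's and MSN's crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file that favors Googlebot: it may crawl everything,
# while every other crawler is kept out of /archive/.
BIASED_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /archive/
"""

# Hypothetical file that treats every crawler alike (an empty Disallow
# under "User-agent: *" blocks nothing for anyone).
UNIFORM_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""

# Illustrative crawler names and URL; none of these come from the study.
AGENTS = ["Googlebot", "Slurp", "msnbot"]
URL = "http://example.com/archive/page.html"


def crawl_verdicts(robots_txt, agents, url):
    """Return {agent: True/False} saying whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


biased = crawl_verdicts(BIASED_ROBOTS_TXT, AGENTS, URL)
uniform = crawl_verdicts(UNIFORM_ROBOTS_TXT, AGENTS, URL)

print("biased: ", biased)
print("uniform:", uniform)
```

With the first file only Googlebot is allowed to fetch the archive page, while the second file gives every crawler the same answer. A file counts as favoring one engine only when the verdicts differ by user agent.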

The study, entitled “Determining Bias to Search Engines from Robots.txt,” was presented at the recent 2007 IEEE/WIC/ACM International Conference on Web Intelligence in Silicon Valley. The paper’s authors are listed as C. Lee Giles, Yang Sun and Ziming Zhuang, all of Penn State’s IST Department.

Does this study explain why Google returns more results than the other search engines? More to the point, does it explain why Google is the top search engine today? I don’t think so.

What interests me, though, and what the study did not explain, is why those webmasters would consciously write robots.txt files that favor Google’s web crawlers. To gain higher page rank for their sites? To drive more traffic?




Comments

5 responses so far ↓

  • Referatele on Nov 16, 2007 at 2:00 am

    I wonder: what if I use the following code in my robots.txt?

    User-agent: *
    Disallow:

    Am I still favoring Google’s spiders, even though I uniformly accept all search engines?

  • Alphane Moon on Nov 16, 2007 at 6:59 am

    There is a myth that is sometimes published in computer magazines that states that using a robots.txt and explicitly allowing Googlebot access to the site will have a positive effect on Google rankings.

  • Ken on Nov 16, 2007 at 9:39 am

    If I find an unknown bot hitting my site hard, I will frequently block that specific bot. I even have automated routines to detect what I call “bad bots”. If the researchers hit my site with their bot and it set off one of my bad bot “sensors”, they could easily have found themselves banned from my site. This isn’t favoritism toward Google; it is about conserving the resources that unknown bots waste.

