Craigslist Blocks Search Engine Spiders and Listings
According to a thread at the SERoundtable Forums, Craigslist has blocked search engine robots from spidering and indexing its classifieds sites. These robots crawl websites and store what they find in the search engines' indexes. That information is later served to users who search on Google, Yahoo, Ask, MSN or other engines.
Because the spiders are blocked, Craigslist pages no longer show up in search engine results. A member of the SER forum writes: “So the answer is clear. Craig blocked the bulk of his content from being crawled. A query in Google or Yahoo for an item in Craig’s ‘jobs’ or ‘for sale’ section will confirm that his content has been removed entirely. To my knowledge, this is the largest deindexing ever. Tens of million pages vanished.”
In short, pages within Craigslist are no longer listed in search engine results. Why would Craigslist delist its pages in bulk when webmasters around the globe are scrambling for exactly this kind of valuable search traffic? One possibility is that it wants to deter spammers who post bogus links on Craigslist in the hope that search engines will follow them. The delisting may also be an attempt to stop content scraping by bots that pull Craigslist's cached pages from search engines and then illegally republish the content on other sites.
Remember last year, when Craigslist blocked aggregation sites and niche engines like Oodle from indexing its classified ads? This latest block of the major search engine bots may extend that effort, establishing Craigslist as more of a destination rather than content fuel for the local efforts of Yahoo, Google (and Google Base), Ask and MSN.