I have already looked into ways to increase the crawl rate for those who suspect their site deserves more frequent bot visits. However, from time to time I come across forum threads where webmasters complain that a bot (usually Googlebot) is too active on their pages and eats too much bandwidth.
So here I am listing links to those complaints and possible solutions that might help to handle the situation:
- Verify Googlebot: make sure it is really Google that is so interested in your site;
- Change Googlebot’s crawl rate via Google Webmaster Tools (while the “higher rate” option is available only to a few lucky people, the “slower rate” can be set by anyone);
- Make sure Googlebot is not crawling any extra URLs, i.e. duplicate URLs or URLs containing session IDs or other (multiple) parameters that are not needed for indexing, such as:
  - additive filtering of a set of items;
  - sorting parameters; etc.
- Make sure your pages don’t carry too many unique URLs that keep the bot busy: the most widely known rule of thumb is no more than 100 links per page (that is not a hard limit, of course; Google can cope with more, just keep the number reasonable);
- Make sure Googlebot is not busy crawling huge images or PDFs, i.e. mind the size of your files and the overall load time of your pages.
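
For the first point, the usual check is a reverse DNS lookup on the visiting IP followed by a forward lookup on the returned host name. Below is a minimal sketch in Python, assuming the crawler's host name ends in googlebot.com or google.com. The function name and the injectable lookup parameters are my own, added only so the logic can be tried without network access, and the forward lookup used here covers IPv4 only.

```python
import socket

# Assumption: genuine Googlebot hosts resolve to names under these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip):
    # Reverse DNS: IP address -> primary host name.
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    # Forward DNS: host name -> list of IPv4 addresses.
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(ip, reverse_lookup=_reverse, forward_lookup=_forward):
    """Return True only if the IP maps to a Google host name that maps back to it."""
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

A visitor that claims to be Googlebot in its user agent but fails this check is probably an impostor, and slowing it down is a matter for your own server rules rather than for Google's settings.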
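
For the extra-URLs point, it helps to see how many of your crawled URLs collapse into one once the unnecessary parameters are dropped. Here is a rough sketch, assuming hypothetical parameter names (`sid`, `sessionid`, `sort`, `order`); replace them with whatever your own site actually uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names that do not change the page content.
IGNORED_PARAMS = frozenset({"sid", "sessionid", "sort", "order"})


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Drop ignored query parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    kept.sort()
    # The fragment is dropped too: it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Running your access log through a function like this and counting how many distinct raw URLs map to each result gives a quick picture of how much of the bot's activity goes to duplicates.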
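
To check a page against the 100-links rule of thumb, you can count the unique link targets in its HTML. This is only a sketch using Python's standard HTML parser; the class and function names are mine, and it counts `href` values exactly as written, without resolving relative URLs.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collects the distinct href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.add(href)


def count_unique_links(html):
    """Return the number of distinct link targets found in the HTML string."""
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return len(parser.hrefs)
```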

Comments
2 responses so far ↓
The Financial Blog on Aug 15, 2008 at 10:46 am
Ann, thanks a million. Great stuff
John S. Britsios on Aug 16, 2008 at 7:45 pm
Great points Ann.
I would like to add here that you should also make sure your web site supports If-Modified-Since requests, i.e. answers with 304 (Not Modified) when a page has not changed.
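
The If-Modified-Since suggestion can be sketched as follows: the server compares the date sent by the bot with the page's last modification time and answers 304 with no body when nothing has changed. This is a minimal illustration, not tied to any web framework; the function name is hypothetical, and `last_modified` is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return (status, headers) for a page last changed at `last_modified` (UTC)."""
    # HTTP dates have one-second resolution.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # Unparseable header: ignore it and send the page.
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        if since is not None and last_modified <= since:
            return 304, headers  # Not modified: no body needs to be sent.
    return 200, headers
```

Every 304 answer saves the bandwidth of a full page, which is exactly what the complaints above are about.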