Google’s Gary Illyes has updated his original writeup on crawl budget with a clarification about disallowed URLs.
The document now includes the following information:
“Q: Do URLs I disallowed through robots.txt affect my crawl budget in any way?
A: No, disallowed URLs do not affect the crawl budget.”
The question refers to the Disallow directive in robots.txt, which blocks web crawlers. With “Disallow: /” it blocks an entire site from being crawled, or with a more specific path it blocks only certain URLs from being crawled.
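As a quick illustration (the paths here are hypothetical examples, not from Google’s post), a robots.txt file that blocks one directory for all crawlers might look like this:

```
# Applies to all crawlers
User-agent: *
# Block a specific directory from being crawled
Disallow: /private/

# To block the entire site instead, the rule would be:
# Disallow: /
```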
According to the update from Illyes, blocking specific URLs will not affect the crawl budget for the rest of the site.
Pages will not get crawled more frequently as a result of other pages on the site being disallowed from crawling.
Conversely, there’s no crawl budget disadvantage to disallowing URLs either.
The updated information appears at the bottom of this article, which is a Webmaster Central blog post from 2017.
Illyes said on Twitter that there are plans to turn the blog post into an official help center article.
- Best Practices for Setting Up Meta Robots Tags and Robots.txt
- Googlebot Crawl Budget Explained by Google’s Gary Illyes
- 9 Tips to Optimize Crawl Budget for SEO
- Google Can Index Blocked URLs Without Crawling