“Crawl budget” is a term that gets thrown around a lot by SEOs without a clear definition of what it means. In fact, Google itself doesn’t have a single term that covers everything crawl budget stands for.
It’s a loaded term made up of a lot of moving parts. That’s why Gary Illyes, a Google Webmaster Trends Analyst, put together a fairly lengthy explainer on what crawl budget is and what it means for Googlebot.
The following is a summary of the key points from Illyes’ article.
Crawl Budget Explained
Crawl rate limit
When Googlebot crawls a site, there’s a set number of simultaneous connections it can make and a set length of time it must wait between fetches. This is called “crawl rate limit”, and every site’s limit is unique.
Crawl rate limit is defined by two factors. The first is crawl health, meaning if the site responds quickly Googlebot can use more connections. If the site begins to slow down from too much crawling, then Googlebot will use fewer connections so it doesn’t degrade the user experience.
The second factor is Search Console: site owners can use it to set a crawl rate limit manually within the Site Settings section.
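To make the crawl health idea concrete, here is a minimal sketch of a crawler that opens more connections while a site responds quickly and backs off when it slows down. This is a toy model, not Googlebot's actual logic: the class name `CrawlRateLimiter`, the starting values, the one-second threshold, and the halving rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CrawlRateLimiter:
    """Toy model of a crawler adapting its parallelism to site health."""

    connections: int = 4           # current simultaneous connections (assumed start)
    min_connections: int = 1
    max_connections: int = 16      # stands in for a cap such as an owner-set limit
    slow_threshold_s: float = 1.0  # responses slower than this count as "slow" (assumed)

    def record_response(self, seconds: float, server_error: bool = False) -> int:
        """Update the connection count after one fetch and return the new value."""
        if server_error or seconds > self.slow_threshold_s:
            # The site is struggling: halve the connections, but keep at least one.
            self.connections = max(self.min_connections, self.connections // 2)
        else:
            # The site is healthy: add one connection, up to the cap.
            self.connections = min(self.max_connections, self.connections + 1)
        return self.connections


limiter = CrawlRateLimiter()
print(limiter.record_response(0.2))  # fast response, connections go up
print(limiter.record_response(2.0))  # slow response, connections are halved
```

The `max_connections` cap plays the role of a manually set limit: however healthy the site looks, the toy crawler never goes above it.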
Crawl demand
Crawl rate limit matters little if there’s no demand from indexing in the first place. Low demand equals low activity from Googlebot. Crawl demand is influenced by two seemingly opposite factors, popularity and staleness. Google wants to keep popular content fresh in its index, while also preventing older content from becoming stale.
Crawl demand can also be influenced by site-wide events like site moves, which trigger an increase in demand since Googlebot has to reindex the new URLs.
The combination of crawl rate and crawl demand creates a clearer definition of what crawl budget is, which Illyes explains is “the number of URLs Googlebot can and wants to crawl.”
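One way to picture “can and wants to crawl” is as a two-step filter: demand decides which URLs are worth fetching, and the rate limit decides how many of them fit. The sketch below is purely illustrative. The `Page` fields, the scoring formula, the 30-day staleness window, and the 0.5 cutoff are hypothetical and do not come from Google.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    popularity: float      # hypothetical score between 0 and 1
    days_since_crawl: int  # how long ago the page was last fetched


def demand_score(page: Page, stale_after_days: int = 30) -> float:
    """A page is in demand if it is popular OR if its copy is getting stale."""
    staleness = min(page.days_since_crawl / stale_after_days, 1.0)
    return max(page.popularity, staleness)


def plan_crawl(pages: list[Page], capacity: int, min_demand: float = 0.5) -> list[str]:
    """Return the URLs the crawler wants (demand) and can fetch (capacity)."""
    wanted = [p for p in pages if demand_score(p) >= min_demand]
    wanted.sort(key=demand_score, reverse=True)
    return [p.url for p in wanted[:capacity]]


pages = [
    Page("/home", popularity=0.9, days_since_crawl=1),
    Page("/old-post", popularity=0.1, days_since_crawl=60),
    Page("/rarely-read", popularity=0.2, days_since_crawl=3),
]
print(plan_crawl(pages, capacity=5))
```

With plenty of capacity, only the two in-demand pages are planned, which mirrors the point that a generous rate limit does nothing without demand. With `capacity=1`, the rate limit becomes the bottleneck instead.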
Factors affecting crawl budget
To maintain an optimal crawl budget for your site, Illyes recommends not wasting resources on low-value-add URLs, which can steal crawl activity away from your high-quality content.
Illyes defines low-value-add URLs as:
- Faceted navigation and session identifiers
- On-site duplicate content
- Soft error pages
- Hacked pages
- Infinite spaces and proxies
- Low quality and spam content
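The first item on the list, faceted navigation and session identifiers, is often visible in the URL itself. As a rough sketch, the following function flags URLs whose query strings contain such parameters. The parameter names in `SESSION_PARAMS` and `FACET_PARAMS` are assumptions for illustration; a real site would substitute the names its own platform uses.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical parameter names; adjust to match the site being audited.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}
FACET_PARAMS = {"color", "size", "sort", "filter"}


def low_value_reasons(url: str) -> list[str]:
    """Return the reasons a URL looks like a low-value-add URL, if any."""
    query = urlsplit(url).query
    params = {key.lower() for key, _ in parse_qsl(query, keep_blank_values=True)}

    reasons = []
    if params & SESSION_PARAMS:
        reasons.append("session identifier")
    if params & FACET_PARAMS:
        reasons.append("faceted navigation")
    return reasons


print(low_value_reasons("https://example.com/shoes?color=red&size=9&sid=abc"))
print(low_value_reasons("https://example.com/shoes"))
```

Running a check like this over server logs or a crawl export gives a quick estimate of how much crawl activity is going to parameterized variants rather than canonical pages.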
Other notes about crawl budget
- The faster the site, the higher the crawl rate.
- Monitor the Crawl Errors report in Search Console and keep server errors to a minimum.
- Crawling is not a ranking factor.
- Alternate URLs, AMP URLs, and embedded content all count toward crawl budget, and long redirect chains can negatively impact crawling.
- URLs marked as nofollow can still be crawled if another page links to them without nofollow, so nofollow does not necessarily keep them from using crawl budget.
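Since long redirect chains are called out above, it can help to measure them. The sketch below follows a chain through a plain dictionary of redirects, so it runs without network access. The `redirects` mapping and the `max_hops` default are assumptions; in practice the mapping might be built from a site crawl or server configuration.

```python
def redirect_chain(url: str, redirects: dict[str, str], max_hops: int = 10) -> list[str]:
    """Follow redirects starting at url and return every URL visited in order."""
    chain = [url]
    seen = {url}
    while chain[-1] in redirects and len(chain) - 1 < max_hops:
        next_url = redirects[chain[-1]]
        chain.append(next_url)
        if next_url in seen:
            break  # redirect loop detected, stop following
        seen.add(next_url)
    return chain


# Hypothetical redirect map: /a -> /b -> /c -> /d
redirects = {"/a": "/b", "/b": "/c", "/c": "/d"}
chain = redirect_chain("/a", redirects)
print(chain, "hops:", len(chain) - 1)
```

Every hop in the chain is an extra fetch before the crawler reaches the final page, so pointing `/a` directly at `/d` would cut three fetches down to one.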