In the latest episode of the Search Off the Record podcast, Google’s Search Relations team says most sites don’t need to worry about crawl budget.
Gary Illyes of Google discussed the topic at length, saying the team has historically pushed back on crawl budget concerns, while acknowledging that a ‘substantial segment’ of the ecosystem does have to care about it.
However, crawl budget shouldn’t be a concern for a majority of sites, Illyes explains:
“We’ve been pushing back on the crawl budget, historically, typically telling people that you don’t have to care about it.
And I stand my ground and I still say that most people don’t have to care about it. We do think that there is a substantial segment of the ecosystem that has to care about it.
…but I still believe that – I’m trying to reinforce this here – that the vast majority of the people don’t have to care about it.”
In an effort to clarify previous messaging, Google has recently been publishing more information about crawl budget.
For example, just last month Google dedicated a whole episode of its SEO Mythbusting YouTube series to the topic of crawl budget.
So who should care about crawl budget and who should not?
When to Care About Crawl Budget (and When Not To)
SEOs typically want to hear a hard number when it comes to crawl budget – for example, that a site must have X number of pages before crawl budget becomes a concern.
But it doesn’t work like that, Illyes says:
“… well, it’s not quite like that. It’s like you can do stupid stuff on your site, and then Googlebot will start crawling like crazy.
Or you can do other kinds of stupid stuff, and then Googlebot will just stop crawling altogether.”
If forced to give a number, Illyes says roughly a million URLs is the baseline before a site owner really needs to care about crawl budget.
Sites with fewer than a million URLs do not have to care about crawl budget.
Factors Affecting Crawl Budget
For sites with over a million URLs, these are some of the factors that could lead to, or indicate, crawl budget issues.
Factor 1: Pages that were never crawled
“What would I look at? Probably URLs that were never crawled. That’s a good indicator for how well discovered, how well crawled a site is…
So I would look at pages that were never crawled. For this you probably want to look at your server logs because that can give you the absolute truth.”
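Illyes doesn’t prescribe a specific method, but as a rough illustration, here is a minimal Python sketch of the log check he describes: pull every URL Googlebot has requested out of your server’s access log, then compare that against a list of the URLs you expect to be crawled. The file paths, the Combined Log Format, and the plain-text sitemap export are all assumptions for the example.

```python
import re

# Hypothetical paths; point these at your own access log and URL list.
ACCESS_LOG = "/var/log/nginx/access.log"
SITEMAP_URLS = "sitemap_urls.txt"  # one URL path per line, exported from your sitemap

# Combined Log Format: IP - - [time] "METHOD /path HTTP/x" status size "referer" "agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

crawled = set()
with open(ACCESS_LOG) as log:
    for line in log:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            crawled.add(match.group("path"))

with open(SITEMAP_URLS) as sitemap:
    site_paths = {line.strip() for line in sitemap if line.strip()}

# URLs in the sitemap that never show up in Googlebot's requests.
never_crawled = site_paths - crawled
print(f"{len(never_crawled)} of {len(site_paths)} URLs never crawled by Googlebot")
for path in sorted(never_crawled)[:20]:
    print(path)
```

Keep in mind that user agent strings can be spoofed, so a production version of this check would also verify Googlebot requests via reverse DNS lookup.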
Factor 2: Changed pages not recrawled for months
“Then I would also look at the refresh rates. Like if you see that certain parts of the site were not refreshed for a long period of time, say months, and you did make changes to pages in that section, then you probably want to start thinking about crawl budget.”
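The same access logs can surface refresh rates. As a hedged sketch (again assuming a Combined Log Format log at a hypothetical path), this groups Googlebot requests by top-level site section and reports when each section was last crawled, so stale sections float to the top:

```python
import re
from datetime import datetime

ACCESS_LOG = "/var/log/nginx/access.log"  # hypothetical path

# Extract timestamp, path, and user agent from each Combined Log Format line.
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

last_crawl = {}  # top-level section -> most recent Googlebot visit
with open(ACCESS_LOG) as log:
    for line in log:
        m = LOG_LINE.search(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        section = "/" + m.group("path").lstrip("/").split("/", 1)[0]
        if section not in last_crawl or when > last_crawl[section]:
            last_crawl[section] = when

# Oldest last-crawl dates first: these sections may signal a crawl budget issue.
for section, when in sorted(last_crawl.items(), key=lambda kv: kv[1]):
    print(f"{when:%Y-%m-%d}  {section}")
```

Note this only works across however much log history you retain, so sections that look stale in a recently rotated log may simply predate the log window.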
Fixing Crawl Budget Issues
Illyes offers two suggestions for fixing crawl budget issues.
First, try removing non-essential pages. Every page Googlebot has to crawl reduces the crawl budget for other pages.
So an excessive amount of “gibberish” content could lead to important content not getting crawled.
“Like if you remove, if you chop, if you prune from your site stuff that is perhaps less useful for users in general, then Googlebot will have time to focus on higher quality pages that are actually good for users.”
Illyes’ second suggestion is to avoid sending “back off” signals to Googlebot.
Back off signals are certain server status codes, such as 429 and 50X, that tell Googlebot to slow down or stop crawling a site.
“If you send us back off signals, then that will influence Googlebot crawl. So if your servers can handle it, then you want to make sure that you don’t send us like 429, 50X status codes and that your server responds snappy, fast.”
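To see whether your server is sending these signals, you can count the status codes returned to Googlebot in your access logs. This is a minimal sketch under the same assumptions as above (Combined Log Format, hypothetical log path, user-agent matching):

```python
import re
from collections import Counter

ACCESS_LOG = "/var/log/nginx/access.log"  # hypothetical path

# Pull the status code and user agent out of each Combined Log Format line.
LOG_LINE = re.compile(r'HTTP/[\d.]+" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"')

statuses = Counter()
with open(ACCESS_LOG) as log:
    for line in log:
        m = LOG_LINE.search(line)
        if m and "Googlebot" in m.group("agent"):
            statuses[m.group("status")] += 1

total = sum(statuses.values())
# 429 and 5xx are the "back off" responses Illyes warns about.
backoff = sum(n for code, n in statuses.items() if code == "429" or code.startswith("5"))
print(f"{backoff} of {total} Googlebot requests received 429/5xx responses")
for code, n in statuses.most_common():
    print(f"  {code}: {n}")
```

A rising share of 429 or 5xx responses suggests your server is effectively asking Googlebot to crawl less.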
To hear more about the intricacies of crawl budget, listen to the full podcast episode below.