Google may continue trying to crawl URLs years after they’ve been deleted from a website, and there’s nothing site owners can do to prevent that from happening.
That was stated by Google’s John Mueller during the Google Search Central SEO hangout from February 25, where we also learned it’s impossible to stop Google from trying to crawl URLs that no longer exist.
Therefore 404s are unavoidable, even for the most diligent of SEOs.
An SEO named Robb Young asked the series of questions that drew this information out of Mueller this week.
Young has a site which is returning 404s in Search Console for URLs that haven’t been live for 8 years. The URLs were previously 410’d and have no links pointing to them.
He wants to know if this is normal or not. Here’s Mueller’s response.
John Mueller on Googlebot Crawling Old URLs
Mueller says 8 years is a long time to still be crawling nonexistent URLs, but it’s not out of the realm of possibility.
If Google saw that a URL was live in the past then it may try to crawl the URL again from time to time.
If you know the URL doesn’t exist then you can simply ignore it in the Search Console report.
“Seven or eight years sounds like a really long time… if it was something that we saw in the past then we’ll try to recrawl it every now and then.
We’ll tell you: ‘oh this URL didn’t work.’ And if you’re like: ‘well it’s not supposed to work,’ then that’s perfectly fine.”
In a follow-up question, Young asks if there’s any way he could send a stronger signal to Google that those URLs no longer exist.
Will Google ever stop trying to crawl the removed URLs?
“I don’t think you could guarantee that we won’t at least try [to crawl] those URLs. It’s one of those things where we have them in our system, and we know at some point they were kind of useful, so when we have time we’ll just re-try them.
It doesn’t cause any problems. It’s just, we re-try them and we show you a report and tell you, ‘oh we re-tried this and it didn’t work.’”
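For context, the 410 status Young already used is the strongest standard signal a site can send: it tells crawlers a URL is intentionally and permanently gone, rather than simply missing. A minimal sketch of the distinction, assuming a hypothetical set of removed paths (the path names and handler are illustrative, not anything from the hangout):

```python
# Sketch: answer 410 Gone for URLs that were deliberately removed,
# and 404 Not Found for anything else that doesn't exist.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical list of permanently removed URLs (e.g. expired listings)
REMOVED_PATHS = {"/old-listing-123", "/old-listing-456"}

def status_for(path: str) -> int:
    """410 signals intentional, permanent removal; 404 is a generic miss."""
    return 410 if path in REMOVED_PATHS else 404

class GoneHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(status_for(self.path))
        self.end_headers()

if __name__ == "__main__":
    # Serve locally for testing; port is arbitrary
    HTTPServer(("localhost", 8080), GoneHandler).serve_forever()
```

As Mueller explains above, even a correct 410 won’t stop Google from re-trying the URL occasionally; it only ensures those re-tries are labeled as intentional removals in reports.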
Concerned about the volume of 404s in his Search Console report, Young asks one more follow-up question to Mueller.
He clarifies it’s not just a handful of URLs returning 404 errors, it’s around 30-40% of URLs in the report that have a 404 error.
Is that normal?
“That’s perfectly fine. That’s completely natural especially for a site that has a lot of churn. If it’s like a classifieds site where you have classified listings that are valid for a month, then you expect those listings to drop out. And then we, like, over the years, we collect a ton of those URLs and try them again. And if they return 404s or 410s, like, whatever. Perfectly fine.
I don’t think that would look unusual to us. It’s not like we would see that as a quality signal or anything. The only time where I think 404s would start to look like something problematic for us is when the home page starts returning 404s. Then that might be a situation where we go: ‘oh, I don’t know if this site is actually still up.’
But if parts of the site are 404, like, whatever. It’s like a technical thing, like, it doesn’t matter.”
Google can remember URLs long after they’ve been removed, and may try to re-crawl them at any time. However, there’s no need to stress when you see 404 errors in Search Console for URLs that aren’t supposed to be there anyway.
Hear Mueller’s full response in the video below: