Google’s John Mueller answered a question about indexing and explained how overall site quality influences indexing patterns. He also said that it’s within the bounds of normal for 20% of a site’s content to not be indexed.
Pages Discovered But Not Crawled
The person asking the question offered background information about their site.
Of particular concern was whether the server being overloaded might affect how many pages Google indexes.
When a server is overloaded, a request for a web page may result in a 5xx server error response. A 500 Internal Server Error is the generic response when a server cannot serve a page, and a 503 Service Unavailable is the status code defined for a server that is temporarily overloaded.
The person asking the question did not mention whether Google Search Console was reporting that Googlebot was receiving server error response codes.
So if Googlebot did not receive server error responses, then the server overload issue is probably not the reason why 20% of the pages are not getting indexed.
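One way to check whether Googlebot actually received server error (5xx) responses, such as a 500, is to look at the server’s own access logs. The following is a minimal sketch, not a definitive tool. It assumes the logs use the common “combined” format, the function names and sample log lines are hypothetical, and matching on the user agent string does not verify that a request really came from Google.

```python
import re
from collections import Counter

# Assumption: the "combined" access-log format. Adjust the pattern for other formats.
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_status_counts(lines):
    """Count response status codes for requests whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("status")] += 1
    return counts


def server_error_share(counts):
    """Return the fraction of counted responses that were 5xx server errors."""
    total = sum(counts.values())
    errors = sum(n for status, n in counts.items() if status.startswith("5"))
    return errors / total if total else 0.0


# Hypothetical sample lines for illustration only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_lines = [
    f'192.0.2.10 - - [01/Jan/2021:10:00:00 +0000] "GET /page-a HTTP/1.1" 200 5123 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.10 - - [01/Jan/2021:10:00:05 +0000] "GET /page-b HTTP/1.1" 503 0 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [01/Jan/2021:10:01:00 +0000] "GET /page-a HTTP/1.1" 500 0 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

counts = googlebot_status_counts(sample_lines)
print(dict(counts))
print(f"Share of 5xx responses: {server_error_share(counts):.0%}")
```

If the share of 5xx responses served to Googlebot is close to zero, server overload is an unlikely explanation for the non-indexed pages.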
The person asked the following question:
“20% of my pages are not getting indexed.
It says they’re discovered but not crawled.
Does this have anything to do with the fact that it’s not crawled because of potential overload of my server?
Or does it have to do with the quality of the page?”
Crawl Budget Not Generally Why Small Sites Have Non-indexed Pages
Google’s John Mueller offered an interesting explanation of how overall site quality is an important factor that determines whether Googlebot will index more web pages.
But first he discussed how the crawl budget isn’t usually a reason why pages remain non-indexed for a small site.
John Mueller answered:
“Probably a little of both.
So usually if we’re talking about a smaller site then it’s mostly not a case that we’re limited by the crawling capacity, which is the crawl budget side of things.
If we’re talking about a site that has millions of pages, then that’s something where I would consider looking at the crawl budget side of things.
But smaller sites probably less so.”
Overall Site Quality Determines Indexing
John next went into detail about how overall site quality can affect how much of a website is crawled and indexed.
This part is especially interesting because it gives a peek at how Google evaluates a site in terms of quality and how the overall impression influences indexing.
Mueller continued his answer:
“With regards to the quality, when it comes to understanding the quality of the website, that is something that we take into account quite strongly with regards to crawling and indexing of the rest of the website.
But that’s not something that’s necessarily related to the individual URL.
So if you have five pages that are not indexed at the moment, it’s not that those five pages are the ones we would consider low quality.
It’s more that …overall, we consider this website maybe to be a little bit lower quality. And therefore we won’t go off and index everything on this site.
Because if we don’t have that page indexed, then we’re not really going to know if that’s high quality or low quality.
So that’s the direction I would head there …if you have a smaller site and you’re seeing a significant part of your pages are not being indexed, then I would take a step back and try to reconsider the overall quality of the website and not focus so much on technical issues for those pages.”
Technical Factors and Indexing
Mueller next mentions technical factors and how easy it is for modern sites to get that part right so that it doesn’t get in the way of indexing.
“Because I think, for the most part, sites nowadays are technically reasonable.
If you’re using a common CMS then it’s really hard to do something really wrong.
And it’s often more a matter of the overall quality.”
It’s Normal for 20% of a Site to Not Be Indexed
This next part is also interesting in that Mueller describes having 20% of a site not indexed as within the bounds of normal.
Mueller has more access to information about how much of a site typically goes unindexed, so I take him at his word because he is speaking from the perspective of Google.
Mueller explains why it’s normal for pages to not be indexed:
“The other thing to keep in mind with regards to indexing, is it’s completely normal that we don’t index everything off of the website.
So if you look at any larger website or any even midsize or smaller website, you’ll see fluctuations in indexing.
It’ll go up and down and it’s never going to be the case that we index 100% of everything that’s on a website.
So if you have a hundred pages and (I don’t know) 80 of them are being indexed, then I wouldn’t see that as being a problem that you need to fix.
That’s sometimes just how it is for the moment.
And over time, when you get to like 200 pages on your website and we index 180 of them, then that percentage gets a little bit smaller.
But it’s always going to be the case that we don’t index 100% of everything that we know about.”
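The arithmetic behind Mueller’s two examples can be sketched in a few lines. The function name is hypothetical and the page counts are simply the ones he mentioned.

```python
def non_indexed_share(total_pages, indexed_pages):
    """Return the percentage of known pages that are not indexed."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100 * (total_pages - indexed_pages) / total_pages


# The two examples from the quote: 80 of 100 pages, then 180 of 200 pages.
smaller_site = non_indexed_share(100, 80)
larger_site = non_indexed_share(200, 180)
print(smaller_site, larger_site)
```

In both cases 20 pages are left out of the index, but as a share of the site that drops from 20% to 10% as the site grows.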
Don’t Panic if Pages Aren’t Indexed
Mueller shared quite a lot of information about indexing. The key takeaways:
- It’s within the bounds of normal for 20% of a site to not be indexed.
- Technical issues probably won’t impede indexing.
- Overall site quality can determine how much of a site gets indexed.
- How much of a site gets indexed fluctuates.
- Small sites generally don’t have to worry about crawl budget.
It’s Normal for 20% of a Site to be Non-indexed
Watch Mueller discuss what is normal indexing, starting at about the 27:26 mark.