Google’s John Mueller answered a question in a Google Office Hours Hangout about a Search Console bug. URLs were listed as excluded, but when each URL was inspected individually, the page was reported as indexed.
Mueller said that he had seen reports of this anomaly and had an idea of what might be causing it.
Why Was the Page Crawled But Not Indexed?
A person asked about an issue where Search Console reports that pages are not indexed, yet when those pages are inspected individually, another report says they are indexed.
This made it difficult for the person to accurately track crawling and indexing statistics for the site.
The person asking the question explained the problem:
“We have like a very large number of Crawled Not Indexed Pages listed under Excluded.
But then when we click into them most of these seem to have been converted into indexed pages.
So we’re really unable to accurately track how improvements to our site are impacting which pages are being indexed.
And I was curious I guess about the timeline in that.
We’re concerned it’s impacting our crawling budget.”
Crawl Budget Impact
The person asking the question was concerned that the "crawled but not indexed" status was causing a problem with their crawl budget.
Crawl budget is the number of URLs that Googlebot can and wants to crawl on a site.
Crawl budget is determined partly by the server’s ability to serve pages, which Google calls the crawl capacity limit.
If a server struggles to serve pages, Google may crawl less so that it does not overload the server.
But if a server responds quickly and can easily handle Googlebot’s requests, Google may raise that limit and crawl more pages.
Crawl budget is also influenced by crawl demand, which depends in part on how often a site is updated.
A site that is rarely updated may be crawled less often than a site that is constantly updated.
As the person revealed later, the site has hundreds of thousands of pages.
But Google was only indexing around 2,000 of them per day, meaning that many pages were not being crawled at all.
The underlying concern, not yet raised at this point, was why those other pages weren’t getting indexed and whether the crawled-but-not-indexed issue had something to do with the crawling problem.
So Mueller only answered the question actually posed to him: whether the crawled-but-not-indexed problem was affecting the site’s crawl budget.
John Mueller addressed the crawl budget issue:
“I doubt it would be affecting your crawling budget… as a side note.”
Google Crawled – Currently Not Indexed
Mueller next explained why Search Console might report a page as crawled but not indexed when it is actually indexed.
“It’s something where I’ve recently seen some threads like this on Twitter as well where people saw URLs that were flagged as not being indexed in Search Console.
And then when you check them individually they are actually indexed.
I don’t know exactly what is happening there yet.
My suspicion is it’s more a matter of timing in that we show them in the Search Console report and then they get indexed over time.
…Then at some point they would drop out of the report again.
And for whatever reason kind of that dropping out is taking a little bit longer than it should.
That’s kind of my guess there.”
Verify Index Coverage Issue
Mueller then suggested a way to verify whether what Google Search Console reports is a real index coverage problem or just a lag in reporting.
John Mueller suggested:
“One way to kind of verify that is to see if these pages actually show up for normal searches.
So take some words from the page, search for that.
And if they do show up then I think there’s nothing you really need to do.
It’s just a report that’s kind of lagging behind.”
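Mueller's check can be made a little more systematic. The sketch below only builds a search URL for a manual check; it does not fetch or scrape results. It assumes you already have the page's visible body text. The helper names `pick_phrase` and `build_check_url` are hypothetical. Two further choices are assumptions rather than part of Mueller's advice: using the first eight words of the text as the phrase, and restricting results with the `site:` operator.

```python
from urllib.parse import quote_plus, urlsplit


def pick_phrase(page_text: str, length: int = 8) -> str:
    """Return the first `length` consecutive words of the page text.

    Hypothetical heuristic: a run of consecutive words from body copy is
    usually specific enough to identify a single page in search results.
    Pass body text, not navigation or footer boilerplate.
    """
    words = page_text.split()
    if len(words) < length:
        raise ValueError("page text is shorter than the requested phrase length")
    return " ".join(words[:length])


def build_check_url(page_url: str, page_text: str, length: int = 8) -> str:
    """Build a Google search URL for an exact-phrase query on the page's host."""
    host = urlsplit(page_url).netloc
    phrase = pick_phrase(page_text, length)
    # Quotes ask for an exact-phrase match; site: (an assumption added here)
    # limits results to the page's own host.
    query = f'"{phrase}" site:{host}'
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    # Hypothetical page and text, for illustration only.
    print(build_check_url(
        "https://example.com/widgets/blue",
        "Our blue widget is made from recycled aluminium and ships worldwide.",
    ))
```

If the page appears when you open the printed URL, the "not indexed" entry in Search Console is likely just the reporting lag Mueller describes.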
Lag in Index Coverage Reporting
There appears to be a lag in the index coverage report. Hopefully Google will look into it soon, because a report that shows inaccurate information makes for a poor user experience.
Read Google’s developer documentation explaining Googlebot crawl budget:
Watch John Mueller answer the question about Google Search Console indexing report lagging behind.
View it at the 22:43 minute mark: