Google’s John Mueller offers more detail about new data in Search Console’s updated Crawl Stats report – the ‘discovery’ and ‘refresh’ metrics.
The Crawl Stats report in Google Search Console was updated several weeks ago and offers data that wasn’t being reported on previously.
A specific section of data, Crawl Purpose, came up in the November 27 edition of the Google Search Central live stream.
Mueller was asked to provide more context on the two metrics included within Crawl Purpose – percentage of ‘discovered’ URLs and percentage of ‘refreshed’ URLs.
Specifically, the following question was submitted:
“What’s the difference between discovery and refresh? In our case it’s showing 84% refresh.
Does that mean 84% of the time Google is crawling known URLs from their database, and only 16% of the time they crawl our site, sitemaps, and links from other URLs from the known URL database?”
Google’s official Search Console help document offers brief descriptions of discovery and refresh:
- Discovery: The URL requested was never crawled by Google before.
- Refresh: A recrawl of a known page.
Mueller expands on that information in his response to the above question.
Mueller on ‘Crawl Purpose’ Data
Mueller prefaces his answer by disclosing that he’s not 100% sure which URLs will be grouped into discovery and refresh, but he provides his own understanding of the two.
Refreshed URLs are previously crawled pages that were crawled again to update the information in Google’s search index.
Discovered URLs are pages on a site that Google crawled for the first time.
Here’s how Mueller puts it:
“I’m not 100% sure what exactly we would put into each of those buckets, but generally we do split things up into refresh crawling where we try to update the information that we have on a site, and discovery crawling where we try to find new URLs that we’ve heard about from the website. Which could be things like from new internal links or from external links pointing to your website.”
Mueller adds that a refresh crawl involves updating content while also looking for newly added links.
“Refresh crawl doesn’t mean that we’re just updating the page’s content, we’re also looking for new links which we can then use for discovering new content.”
When reading the Crawl Stats report, site owners should generally see a higher percentage of refreshed URLs than discovered URLs.
Exceptions that come to mind include launching a new site, migrating one site to another, uploading a new sitemap, and other such actions.
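To make the refresh versus discovery split concrete, here is a minimal Python sketch that applies the help document's definitions to a list of crawled URLs. It is only an illustration: the function name `crawl_purpose_split`, its inputs, and the sample paths are hypothetical, and this is not how Google computes the figures in the report.

```python
from collections import Counter


def crawl_purpose_split(requests, known_urls):
    """Estimate the refresh/discovery split for a sequence of crawl requests.

    requests   -- URLs in the order they were crawled (hypothetical input)
    known_urls -- URLs assumed to have been crawled before this period
    """
    seen = set(known_urls)
    counts = Counter()
    for url in requests:
        if url in seen:
            # A recrawl of a known page counts as refresh.
            counts["refresh"] += 1
        else:
            # A URL never crawled before counts as discovery.
            counts["discovery"] += 1
            seen.add(url)
    total = sum(counts.values())
    if total == 0:
        return {"refresh": 0.0, "discovery": 0.0}
    return {
        key: round(100 * counts[key] / total, 1)
        for key in ("refresh", "discovery")
    }


# Placeholder data: two known pages, one new page crawled twice.
known = {"/", "/about"}
crawled = ["/", "/about", "/new-post", "/", "/new-post"]
print(crawl_purpose_split(crawled, known))
```

In this toy data, only the first request for `/new-post` counts as discovery. The second request for the same URL is already a refresh, which is one reason an established site tends to skew heavily toward refresh.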
If the report shows that rapidly changing pages are not being crawled often enough, ensure they are included in a sitemap.
Pages that update less frequently will be crawled less often, though site owners can request a recrawl by manually pinging Google.
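For site owners who want to list rapidly changing pages in a sitemap, the following Python sketch builds a small sitemap with `lastmod` dates using the standard sitemaps.org namespace. The helper name `build_sitemap` and the example.com URLs are placeholders, and a real site would more likely generate this file from its CMS or build pipeline.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, lastmod) pairs.

    lastmod is expected as a YYYY-MM-DD string.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs and dates for illustration only.
pages = [
    ("https://example.com/news", "2020-11-27"),
    ("https://example.com/prices", "2020-11-26"),
]
print(build_sitemap(pages))
```

The output can be saved as `sitemap.xml` and submitted through Search Console. Whether listing a page changes how often it is crawled remains up to Google.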
For the full question and answer from the Search Central stream, refer to the video below. Full details about Google’s updated Crawl Stats report can be found here: Google Updates Search Console Crawl Stats Report.