Google updated its Search Central documentation on verifying Googlebot, adding information about user-triggered bot visits. That information was missing from previous Googlebot documentation, an omission that caused confusion for many years and led some publishers to block the IP ranges of legitimate visits.
Newly Updated Bot Documentation
Google added new documentation that sorts the bots publishers should expect to see into three categories.
These are the three categories of Google bots:
- Googlebot – Search crawler
- Special-case crawlers
- User-triggered fetchers (GoogleUserContent)
That last one, GoogleUserContent, has confused publishers for a long time because Google didn’t have any explicit documentation about it.
This is what Google says about GoogleUserContent:
“Tools and product functions where the end user triggers a fetch.
For example, Google Site Verifier acts on the request of a user.
Because the fetch was requested by a user, these fetchers ignore robots.txt rules.”
The documentation states that the reverse DNS mask will show the following domain:
Google also updated its Google Crawlers page, creating a section specifically for user-triggered fetchers.
The page lists the same crawlers as before, but it has been reorganized so that user-triggered fetchers sit in their own group.
The following crawlers are now designated as user-triggered fetchers:
“Feedfetcher is used for crawling RSS or Atom feeds for Google Podcasts, Google News, and PubSubHubbub.
Google Publisher Center
Fetches and processes feeds that publishers explicitly supplied through the Google Publisher Center to be used in Google News landing pages.
Google Read Aloud
Upon user request, Google Read Aloud fetches and reads out web pages using text-to-speech (TTS).
Google Site Verifier
Google Site Verifier fetches upon user request Search Console verification tokens.”
In the past, some in the SEO community told me that bot activity from IP addresses associated with GoogleUserContent.com was triggered when a user viewed a website through a translate function that used to appear in the search results, a feature that no longer exists in Google’s SERPs.
I don’t know whether that was ever true.
What we do have now is the new information above about user-triggered fetchers.
Additionally, Google added the following information about user-triggered fetchers:
“User-triggered fetchers are triggered by users to perform a product specific function. For example, Google Site Verifier acts on a user’s request.
Because the fetch was requested by a user, these fetchers generally ignore robots.txt rules. The IP ranges the user-triggered fetchers use are published in the user-triggered-fetchers.json object.”
Google’s new documentation explains that bot activity from IP addresses associated with GoogleUserContent.com can be triggered by the Google Site Verifier tool.
Another change in the documentation is that googleusercontent.com now appears alongside googlebot.com and google.com as a domain name that the IP addresses of Google’s crawlers can resolve to.
Google also retired its Mobile Apps Android crawler.
Both its user agent token and its full user agent string were AdsBot-Google-Mobile-Apps.
This was the purpose of the now-retired crawler:
“Checks Android app page ad quality. Obeys AdsBot-Google robots rules, but ignores the global user agent (*) in robots.txt.”
Returning to the googleusercontent.com change, this is the new verification text:
“Verify that the domain name is either googlebot.com, google.com, or googleusercontent.com.”
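The check this text describes is usually done as a forward-confirmed reverse DNS lookup: resolve the visiting IP address to a hostname, confirm the hostname ends in one of the three domains, then resolve that hostname back and confirm it returns the same IP. The Python sketch below assumes the standard library resolver is acceptable; the function names and the injectable `reverse`/`forward` parameters are hypothetical, added so the logic can be exercised without network access. One caution: googleusercontent.com hostnames may also be assigned to Google Cloud customer machines, so pairing this check with Google’s published IP ranges is a sensible extra step.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List

# Domains named in Google's updated verification text.
GOOGLE_DOMAINS = ("googlebot.com", "google.com", "googleusercontent.com")


def reverse_lookup(ip: str) -> str:
    # gethostbyaddr returns (hostname, aliaslist, ipaddrlist); keep the hostname.
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    # Every IPv4/IPv6 address the hostname resolves to.
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_google_host(hostname: str, domains: Iterable[str] = GOOGLE_DOMAINS) -> bool:
    # Match the domain itself or a subdomain, not look-alikes such as "evilgoogle.com".
    host = hostname.rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def verify_google_ip(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a visiting IP address (sketch)."""
    try:
        hostname = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False
    if not is_google_host(hostname):
        return False
    try:
        addresses = forward(hostname)
    except OSError:
        return False
    target = ipaddress.ip_address(ip)
    # The hostname must resolve back to the very address that made the request.
    return any(ipaddress.ip_address(a) == target for a in addresses)
```

Called as `verify_google_ip(client_ip)`, it performs live DNS lookups, so caching the result per IP is worth considering on busy sites.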
Another addition is the following text, expanded from the old page:
“Alternatively, you can identify Googlebot by IP address by matching the crawler’s IP address to the lists of Google crawlers’ and fetchers’ IP ranges:
- Special crawlers like AdsBot
- User triggered fetches”
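Matching an address against the published lists comes down to CIDR membership tests. The Python sketch below assumes each list is a JSON document shaped like `{"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}`, the layout Google’s range files appear to use, but it is worth checking the current files before relying on that. The category labels, the helper names, and the sample ranges (reserved documentation blocks such as 192.0.2.0/24 and 2001:db8::/32) are hypothetical.

```python
import ipaddress
import json
from typing import Dict, List, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_networks(doc: dict) -> List[Network]:
    """Convert a ranges document into ip_network objects.

    Assumed shape: {"prefixes": [{"ipv4Prefix": "a.b.c.d/n"}, {"ipv6Prefix": "x::/n"}]}
    """
    networks: List[Network] = []
    for entry in doc.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def load_ranges_file(path: str) -> List[Network]:
    # For a locally saved copy, e.g. of user-triggered-fetchers.json.
    with open(path, encoding="utf-8") as fh:
        return load_networks(json.load(fh))


def classify_ip(ip: str, ranges: Dict[str, List[Network]]) -> Optional[str]:
    """Return the label of the first list that contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for label, networks in ranges.items():
        # Membership is simply False when IPv4 and IPv6 versions differ.
        if any(addr in net for net in networks):
            return label
    return None
```

A typical call would be `classify_ip(client_ip, {"user-triggered-fetchers": load_ranges_file("user-triggered-fetchers.json")})`, with the saved file refreshed periodically since published ranges can change.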
Google Bot Identification Documentation
The new documentation finally addresses bots that use IP addresses associated with GoogleUserContent.
Search marketers were confused by those IP addresses and assumed the bots behind them were spam.
A Google Search Console Help discussion from 2020 shows how confused people were about activity associated with GoogleUserContent.
Many in that discussion rightly concluded that it was not Googlebot but then mistakenly concluded that it was a fake bot pretending to be Google.
A user posted:
“The behaviour I see coming from these addresses is very close (if not identical) to legitimate Googlebot behaviour, and it hits multiple sites of ours.
…If it isn’t – then this seems to indicate there is widespread malicious bot activity by someone trying quite hard to look like Google on our sites which is concerning.”
After several responses, the person who started the discussion concluded that the GoogleUserContent activity was spam.
“…The Googlebots in question do mimic the official User-Agents, but as it stands the evidence seems to point to them being fake.
I’ll block them for now.”
Now we know that bot activity from IPs associated with GoogleUserContent does not come from spam or hacker bots.
It really is from Google. Publishers who are currently blocking IP addresses associated with GoogleUserContent should probably unblock them.
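For publishers acting on that advice, one quick audit is to compare an existing blocklist with the published user-triggered fetcher ranges and flag any blocked entry that overlaps them. The Python sketch below assumes the blocklist is a list of IP address or CIDR strings and that the JSON uses the `ipv4Prefix`/`ipv6Prefix` layout; the function name and the sample data (reserved documentation ranges) are hypothetical. Flagged entries are candidates for review, not proof that every address in them belongs to Google.

```python
import ipaddress
from typing import Iterable, List


def find_conflicting_blocks(blocklist: Iterable[str], published: dict) -> List[str]:
    """Return blocklist entries that overlap any published fetcher prefix.

    Assumes published looks like {"prefixes": [{"ipv4Prefix": ...}, {"ipv6Prefix": ...}]}.
    """
    fetcher_nets = []
    for entry in published.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            fetcher_nets.append(ipaddress.ip_network(prefix))

    conflicts = []
    for item in blocklist:
        # strict=False accepts single IPs and CIDRs written with host bits set.
        blocked = ipaddress.ip_network(item.strip(), strict=False)
        if any(
            blocked.version == net.version and blocked.overlaps(net)
            for net in fetcher_nets
        ):
            conflicts.append(item)
    return conflicts
```

Checking overlap rather than exact equality matters here, because a broad block such as a /24 can quietly cover a narrower fetcher prefix.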
The current list of user-triggered fetcher IP addresses is published in Google’s user-triggered-fetchers.json file.
Read Google’s updated documentation:
Verifying Googlebot and other Google crawlers
Featured image by Shutterstock/Asier Romero