Google Says Robots.txt Blocking Certain External Resources is Okay

Screenshot from a Google JavaScript SEO Office Hours with Google's Martin Splitt

In a recent JavaScript SEO Office Hours, Google’s Martin Splitt answered a question about blocking external JS and CSS resources. The question was whether blocking the resources would cause a site to lose rankings.

There was, however, a wrinkle in the question that was asked…

Blocked JavaScript and CSS Can Affect Rankings

Blocking JavaScript and CSS files can cause ranking issues in certain situations. One reason is that Google needs some of those files in order to render the web page and determine whether it is mobile friendly.

An official Google developer page says this:

“For optimal rendering and indexing, always allow Googlebot access to the JavaScript, CSS, and image files used by your website so that Googlebot can see your site like an average user.

If your site’s robots.txt file disallows crawling of these assets, it directly harms how well our algorithms render and index your content. This can result in suboptimal rankings.”
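To illustrate the kind of blocking Google is warning against, here is a minimal robots.txt sketch (the directory names are hypothetical) that would hide a site's own scripts and stylesheets from Googlebot:

User-agent: Googlebot
# Blocks files Googlebot may need in order to render the page
Disallow: /js/
Disallow: /css/

Removing rules like these, or narrowing them so rendering-critical files stay crawlable, keeps Googlebot's view of the page consistent with what users see.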

Blocking External JavaScript and CSS

The person asking the question has a valid reason to be concerned about how Google might react to blocking external resources.

The question:

“If you use robots.txt to block JS or CSS on external JS files/CSS files in other domain or if other domain blocks them, so the user will see different things than Googlebot, right?

Would Google distrust this kind of page and downrank them?”

Google’s Martin Splitt answered confidently:

“No, we won’t downrank anything. It’s not cloaking. Cloaking very specifically means misleading the user.

Just because we can’t see content doesn’t necessarily mean that you’re misleading the user.”

Cloaking is a trick spammers use to show one set of content to Google in order to trick it into ranking the page, while showing users a completely different web page, such as a virus- or spam-laden page.

Cloaking is also a way to keep Google from crawling URLs publishers don’t want Google to see, like affiliate links.

Martin's answer approaches the question from the angle of whether blocking external resources will be seen as cloaking, and his answer is no.

How Blocking External Resources Can Be Problematic

Martin then goes on to describe how blocking external resources can become an issue:

“It is still potentially problematic if your content only shows up when we can fetch these resources and we don’t see the content in the rendered HTML because it’s blocked by robots.txt.

Then we can’t index it. If there’s content missing, we can’t index that.”

Google’s Testing Tools Will Reveal Problems

Martin then goes on to show how a publisher can diagnose whether blocking resources is problematic.

“So it’s definitely worth trying out our testing tools to see if the content that you want us to see on the page is actually visible on the page even though some JavaScript or CSS resources might be robotted.

But generally speaking, robotting JavaScript or CSS resources isn’t per se a problem. It can be a problem if we can’t see the content but it is fine from the standpoint of cloaking, it’s not cloaking.”

He further clarified:

“If the content is loaded by JavaScript and we can’t load that JavaScript because it’s robotted, we’re not going to see it and that’s potentially problematic. But if it’s an enhancement like a chat box or a comment widget… then that isn’t an issue.”
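As a rough sketch of the distinction Martin is drawing, imagine a publisher's own robots.txt with these rules (the file names are invented for illustration):

User-agent: *
# Potentially problematic: this script injects the page's main content
Disallow: /scripts/article-loader.js
# Generally fine: this script only adds an enhancement
Disallow: /scripts/chat-widget.js

In the first case Googlebot may render the page with its main content missing; in the second, the rendered page simply lacks the chat box.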

The Publisher Asked a Trick Question

It's an interesting answer: it's okay to block external resources associated with a chat box or a comment widget. Blocking those resources may even be useful, for example if it helps speed up rendering for Google, but…

But there’s a slight wrinkle to the question that was asked:  You can’t block external resources (on another domain) using robots.txt.

The original question was a two-parter.

This is the problematic first part:

“If you use robots.txt to block JS or CSS on external JS files/CSS files in other domain…”

What that part of the question describes is impossible to accomplish with robots.txt.

Google's developer page says this about robots.txt:

“It is valid for all files in all subdirectories on the same host, protocol and port number.”

What was overlooked about that question is that robots.txt only uses relative URLs, not absolute URLs (except for the location of a sitemap).

A relative URL is one that omits the domain; in a robots.txt file, the path is relative to the root of the site the file is on.

In a robots.txt file, a relative URL looks like this:

/file-1/example

And this is what an absolute URL looks like:

https://www.example.com

So, if you can't use an absolute URL in robots.txt, then you can't block an external resource with robots.txt.
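To make that concrete, here are two rules side by side (the domain and paths are hypothetical):

User-agent: *
# Valid: a relative path on the same host that serves the robots.txt file
Disallow: /widgets/loader.js
# Effectively does nothing: robots.txt rules can't reference another domain
Disallow: https://cdn.example.com/widget.js

A robots.txt file only governs crawling on the host it is served from, so only the other domain's own robots.txt can block that external file.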

The second part of the question is technically correct:

“…or if other domain blocks them, so the user will see different things than Googlebot, right? Would Google distrust this kind of page and down rank them?”

External resources are often blocked by the other sites themselves. So the question and answer make more sense from that direction.

Martin Splitt said that blocking those external resources is not cloaking. That statement is true if you don't use robots.txt to do the blocking.

That's probably what Martin was referring to, but…

But the question was specifically about robots.txt.

In the real world, a publisher who wishes to block external resources can't do it with robots.txt, so many turn to cloaking instead.

Cloaking has a bad rap, and for good reason. But the truth is that not all cloaking is bad. Yoast, for example, has a tutorial about cloaking affiliate links.

Some forms of cloaking can be a way to block resources that have nothing to do with how the page renders, and that fits with what Google recommends.
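One common version of that approach, loosely based on the kind of affiliate-link setup Yoast describes (the directory name here is hypothetical), routes affiliate links through a local redirect folder and then blocks that folder:

User-agent: *
# Affiliate links point to /recommends/... which redirects to the merchant
Disallow: /recommends/

The redirect folder has nothing to do with how the page renders, so blocking it doesn't interfere with Google's ability to render and index the content.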

Watch the Google JavaScript SEO Office Hours here: