
Google Reminds Websites To Use Robots.txt To Block Action URLs

Google's Gary Illyes recommends using robots.txt to block crawlers from "add to cart" URLs, preventing wasted server resources.

  • Use robots.txt to block crawlers from "action URLs."
  • This prevents wasted server resources from useless crawler hits.
  • It's an age-old best practice that remains relevant today.

In a LinkedIn post, Gary Illyes, an Analyst at Google, reiterated long-standing guidance for website owners: Use the robots.txt file to prevent web crawlers from accessing URLs that trigger actions like adding items to carts or wishlists.

Illyes highlighted a common complaint: unnecessary crawler traffic overloading servers, often caused by search engine bots crawling URLs intended for user actions.

He wrote:

“Looking at what we’re crawling from the sites in the complaints, way too often it’s action URLs such as ‘add to cart’ and ‘add to wishlist.’ These are useless for crawlers, and you likely don’t want them crawled.”

To avoid this wasted server load, Illyes advised blocking access in the robots.txt file for URLs with parameters like “?add_to_cart” or “?add_to_wishlist.”

As an example, he suggests:

“If you have URLs like:

You should probably add a disallow rule for them in your robots.txt file.”
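As a sketch, such disallow rules might look like the following. The parameter names come from Illyes' post; the `*` wildcard syntax is an assumption based on what Google and most major crawlers support, and your site's actual URL patterns may differ:

```
User-agent: *
Disallow: /*?add_to_cart
Disallow: /*?add_to_wishlist
Disallow: /*&add_to_cart
Disallow: /*&add_to_wishlist
```

The `&`-prefixed variants cover cases where the action parameter appears after other query parameters rather than first.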

While using the HTTP POST method can also prevent the crawling of such URLs, Illyes noted crawlers can still make POST requests, so robots.txt remains advisable.

Related: 8 Common Robots.txt Issues And How To Fix Them

Reinforcing Decades-Old Best Practices

Alan Perkins, who commented in the thread, pointed out that this guidance echoes web standards introduced in the 1990s for the same reasons.

Quoting from a 1993 document titled “A Standard for Robot Exclusion”:

“In 1993 and 1994 there have been occasions where robots have visited WWW servers where they weren’t welcome for various reasons…robots traversed parts of WWW servers that weren’t suitable, e.g. very deep virtual trees, duplicated information, temporary information, or cgi-scripts with side-effects (such as voting).”

The robots.txt standard, which proposes rules to restrict access by well-behaved crawlers, emerged as a “consensus” solution among web stakeholders back in 1994.

Related: 6 Old School SEO Habits That Never Grow Old

Obedience & Exceptions

Illyes affirmed that Google’s crawlers fully obey robots.txt rules, with rare exceptions thoroughly documented for scenarios involving “user-triggered or contractual fetches.”

This adherence to the robots.txt protocol has been a pillar of Google’s web crawling policies.

Why SEJ Cares

While the advice may seem rudimentary, the re-emergence of this decades-old best practice underscores its relevance.

By leveraging the robots.txt standard, sites can keep overzealous crawlers from hogging bandwidth with unproductive requests.

See also: How to Address Security Risks with Robots.txt Files

How This Can Help You

Whether you run a small blog or a major e-commerce platform, following Google’s advice to leverage robots.txt for blocking crawler access to action URLs can help in several ways:

  • Reduced Server Load: You can reduce needless server requests and bandwidth usage by preventing crawlers from hitting URLs that invoke actions like adding items to carts or wishlists.
  • Improved Crawler Efficiency: Giving more explicit rules in your robots.txt file about which URLs crawlers should avoid can lead to more efficient crawling of the pages/content you want to be indexed and ranked.
  • Better User Experience: With server resources focused on actual user actions rather than wasted crawler hits, end-users will likely experience faster load times and smoother functionality.
  • Stay Aligned with Standards: Implementing the guidance puts your site in compliance with the widely adopted robots.txt protocol standards, which have been industry best practices for decades.
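One way to verify that a disallow rule behaves as intended is Python's standard-library robots.txt parser. The sketch below uses hypothetical rules and URLs, not ones from the article, and uses path-prefix rules because Python's parser does simple prefix matching rather than the `*` wildcard matching Google's crawler supports:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking "add to cart" style action URLs.
# Note: RobotFileParser does prefix matching only; it does not
# understand the "*" wildcard that Googlebot does.
rules = """\
User-agent: *
Disallow: /cart/add
Disallow: /wishlist/add
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Action URLs are blocked for a generic crawler...
print(parser.can_fetch("*", "https://example.com/cart/add?item=123"))   # False
# ...while regular product pages remain crawlable.
print(parser.can_fetch("*", "https://example.com/product/123"))         # True
```

In production you would point `set_url()` at your live `/robots.txt` and call `read()` instead of parsing an inline string.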

Revisiting robots.txt directives could be a simple but impactful step for websites looking to exert more control over crawler activity.

Illyes’ messaging indicates that the long-standing robots.txt rules remain relevant in the modern web environment.

Featured Image: BestForBest/Shutterstock

SEJ STAFF Matt G. Southern Senior News Writer at Search Engine Journal

Matt G. Southern, Senior News Writer, has been with Search Engine Journal since 2013. With a bachelor’s degree in communications, ...

