Should You Noindex Category & Archive Pages?

Category and archive pages can become great assets or cause major issues when it comes to SEO. Learn how to approach these types of pages.

VIP CONTRIBUTOR Dan Taylor

July 22, 2020

VIP CONTRIBUTOR Dan Taylor Agency Partner & Head of Innovation (Organic & AI) at Dan Taylor SEO


Category and archive pages can be either a great asset or a major headache when it comes to organic search.

For example, on a travel blog, a category page could be a well-structured landing page for information around a specific topic.

On an ecommerce website, it could be a landing page for a specific group of products.

By the same logic, they can potentially cause conflicts on websites that aren’t category-focused, such as marketing agencies.

In this post, I’m going to detail how to identify whether you have an issue and if you should noindex your category and archive pages.

Defining Category & Archive Pages

Depending on your platform, development team, and even personal preference, category pages can go by many names, including:

Category pages.
Collections (Shopify).
PLPs (Salesforce Commerce Cloud).

Custom definitions also exist.

For example, the Cloudflare blog (utilizing Ghost) uses the term “tags,” which appear to perform the same function as categories.

At the time of writing this post, the blog has 1,760 tags visible in a site: search.

It’s also worth highlighting that some websites, especially when content is the primary product, can have different types and hierarchies of categories.

For most ecommerce websites that have blogs, categories can exist and have different functions across different parts of the website.

This is important, as when explaining to developers that they need to take certain actions on category page templates, you need to be specific as to which ones.

For the purpose of this article, my definition of category pages is any page that contains, and links to, other pages on a website, whether they be products, sub-categories, or articles, based on a defined classification.

Similarly, archive pages are often associated with blog content and are auto-generated by some platforms, again based on a defined classification.

Identifying an Issue

Before taking any action, it’s important that you first ascertain if you do in fact have an issue relating to your category pages.

From experience, the majority of concerns around category-style pages and their impact on SEO performance fall into one of two categories: ranking conflicts and crawl/index bloat issues.

Crawl Bloat & Index Bloat

Generally speaking, crawl budget isn’t an issue for the majority of websites, and it is oftentimes one of the more misunderstood aspects of SEO.

There is no 1:1 relationship between your content being “indexable” and Google investing resources in indexing it.

Google often crawls pages (with varying levels of frequency) and chooses not to index them for a range of reasons, including:

Technical issues.
Not finding enough value in that specific HTML document to invest storage resources in it.

Just because a page isn’t indexed doesn’t mean that Google isn’t crawling it (and the internal links it finds there).

If you have a large website with thousands of product SKUs, you may want to encourage Google to spend more time crawling the commercial content rather than non-commercial content (e.g., blog category pages).

But then you also need to consider and weigh the value of search engines being able to discover supporting content through category crawl paths.

Are These Pages Causing Internal Cannibalization?

Category pages can become an issue (and an opportunity) if they are causing conflict and ranking for terms you’d prefer other pages to be ranking for.

You’ll be able to identify this by monitoring the URLs that search engines are returning for specific queries through tools like Google Search Console and general rank tracking.
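As a rough illustration of that monitoring, the sketch below takes query/URL/click rows (the kind of data you could export from a Search Console performance report) and flags queries where the top-clicked URL isn't the page you want ranking. The function name, the row layout, and the sample URLs are all assumptions made up for this example, not a Search Console API.

```python
from collections import defaultdict


def find_conflicts(rows, preferred):
    """Flag queries whose top-clicked URL is not the preferred one.

    rows:      iterable of (query, url, clicks) tuples (hypothetical export format)
    preferred: dict mapping query -> the URL you want to rank for it
    Returns a dict of {query: url_currently_winning}.
    """
    clicks = defaultdict(lambda: defaultdict(int))
    for query, url, count in rows:
        clicks[query][url] += count

    conflicts = {}
    for query, wanted_url in preferred.items():
        by_url = clicks.get(query)
        if not by_url:
            continue  # no data for this query, nothing to compare
        top_url = max(by_url, key=by_url.get)
        if top_url != wanted_url:
            conflicts[query] = top_url
    return conflicts


# Made-up sample data for a window cleaning lead generation site.
SERVICE_PAGE = "/services/industrial-window-cleaning/"
rows = [
    ("industrial window cleaning", "/blog/category/window-cleaning/", 40),
    ("industrial window cleaning", SERVICE_PAGE, 12),
    ("window cleaning for offices", SERVICE_PAGE, 30),
]
preferred = {
    "industrial window cleaning": SERVICE_PAGE,
    "window cleaning for offices": SERVICE_PAGE,
}
conflicts = find_conflicts(rows, preferred)
print(conflicts)
```

Clicks are used as the ranking signal here purely for simplicity; average position or impressions would work just as well if that is what your export contains.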

Say for example you’re a lead generation website and your service is industrial window cleaning.

You would want your commercial-focused page with the big lead generation form to rank for a number of queries, including:

“industrial window cleaning”
“window cleaning for offices”
“window cleaning for businesses”

It’s fair to say that users performing those searches are highly likely to be looking for the service (and a quote), and not information on how it works or how to grow a window cleaning business.

So what do you do if your window cleaning blog category page is the one Google is choosing to return for these queries?

The immediate thought might be to prevent the category page from being ranked or indexed, but this is the wrong first thought to have.

I would first look at the commercial lead gen page you want to rank for these queries and compare it to the results Google is choosing to rank. Is your content on par (if not better) in terms of user value?

I’d then also rule out any other potential technical reasons, especially if Google isn’t ranking or indexing these pages at all.

Noindexing Your Category Pages

If you have identified that you have an issue with these pages and they aren’t providing vital internal crawl paths to older pieces of content, then noindexing these pages can make sense.

Since Google dropped support for noindex rules in the robots.txt file back in September 2019, your options for noindexing now sit solely at the document level, these being:

Noindex via a page-level meta robots tag.
Noindex via an X-Robots-Tag HTTP response header.
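To check which of the two mechanisms (if any) a category page is currently using, something like the following sketch can help. It parses a page's HTML for a meta robots tag and inspects the response headers for X-Robots-Tag. The helper names are my own, and it deliberately ignores edge cases such as user-agent-prefixed header values (for example "googlebot: noindex").

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            content = attr_map.get("content", "")
            self.directives.extend(part.strip().lower() for part in content.split(","))


def is_noindexed(html, headers):
    """Return True if the meta robots tag or the X-Robots-Tag header contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)

    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_directives = [part.strip().lower() for part in header_value.split(",")]

    return "noindex" in parser.directives or "noindex" in header_directives


meta_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
plain_page = "<html><head><title>Category</title></head></html>"

print(is_noindexed(meta_page, {}))                             # meta tag route
print(is_noindexed(plain_page, {"X-Robots-Tag": "noindex"}))   # header route
print(is_noindexed(plain_page, {}))                            # neither
```

The header route is the one to reach for when you can't edit the page template, or when the resource isn't HTML at all.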

A less technical approach can also be to de-optimize your category pages by:

Removing unique content.
Reducing blog excerpt/snippet length.
Blocking them in the robots.txt file.
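For the robots.txt option, it's worth confirming that a rule blocks the category URLs you intend and nothing else. Below is a minimal sketch using Python's standard-library robots.txt parser; the rule, the /blog/category/ path, and the example.com URLs are hypothetical, and Python's matching is a simple prefix match that may not mirror every nuance of how Google interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule blocking all blog category pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/category/
"""


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt content disallows the URL for this user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


category_url = "https://example.com/blog/category/window-cleaning/"
post_url = "https://example.com/blog/how-to-clean-office-windows/"

print(is_blocked(ROBOTS_TXT, category_url))  # category page should be blocked
print(is_blocked(ROBOTS_TXT, post_url))      # individual post should stay crawlable
```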

Google may still crawl them when you’re linking to them internally and presumably from multiple pieces of content.

But from experience, the search engine will crawl them less frequently and, more often than not, respect the robots.txt directive.

Noindex can, however, come with longer-term (potential) issues.

Google confirmed in January 2018 that if they see a persistent noindex, they will begin to treat the page as a soft 404.

This likely won’t have any “real-world” implications, but if you check Google Search Console religiously, you will probably see some more errors in the Console that don’t matter and can’t be removed.

If your concern is index and/or crawl bloat and you have a setup similar to the Cloudflare example in this post, you may want to noindex some category pages but keep others indexed.

You could have a rule where if a category has fewer than five posts, it inherits a noindex tag.

This way you can keep your more prominent categories indexed, and remove smaller ones from the index.
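A rule like that could be sketched as follows. The function name, the default threshold parameter, and the sample tag counts are assumptions for illustration; in practice this logic would live in your platform's category template.

```python
def robots_meta_for_category(post_count, min_posts=5):
    """Return the meta robots tag a category page should output.

    Categories with fewer than `min_posts` posts get noindex;
    everything else stays indexable.
    """
    if post_count < min_posts:
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'


# Made-up tag names and post counts.
categories = {"security": 42, "product-news": 5, "hackathon-2019": 2}

for name, count in categories.items():
    print(name, robots_meta_for_category(count))
```

Exposing the threshold as a parameter makes it easy to adjust once you can see how many categories fall on either side of the line.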