In a Google SEO office-hours hangout, Google’s John Mueller was asked whether rel canonical or the noindex tag is the better approach for dealing with duplicate and thin content on an ecommerce site. Mueller discussed both options and then suggested a third way to handle it.
The noindex meta tag is a directive, which means that Google obeys it and drops the web page from the search results.
That is all the noindex tag does: it keeps that page from showing up in Google’s search results.
Google’s official documentation states:
“You can prevent a page or other resource from appearing in Google Search by including a noindex meta tag or header in the HTTP response. When Googlebot next crawls that page and sees the tag or header, Googlebot will drop that page entirely from Google Search results, regardless of whether other sites link to it.”
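As a rough illustration of the two places the documentation mentions (a meta tag in the page or a header in the HTTP response), the following Python sketch checks a response for a noindex signal. The class and function names are hypothetical, and the header handling is deliberately simplified: it assumes a plain comma-separated value and does not handle user-agent prefixes such as "googlebot: noindex".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (for example, a bare attribute), so normalize them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers, html):
    """Return True if the X-Robots-Tag header or a robots meta tag contains noindex.

    Simplified sketch: header values with a user-agent prefix are not parsed.
    """
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives
```

A page that should stay out of search would carry something like `<meta name="robots" content="noindex">` in its head, or be served with an `X-Robots-Tag: noindex` response header, and either one would make `has_noindex` return `True`.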
A rel=canonical tag is a hint, not a directive. It gives Google a suggestion for which URL you want shown in the search results.
This is useful when multiple pages are similar, especially when a shopping CMS generates several pages for the same product and the only difference is usually something trivial, like the color of the item.
Google’s official rel canonical documentation explains the problem like this:
“A canonical URL is the URL of the page that Google thinks is most representative from a set of duplicate pages on your site. For example, if you have URLs for the same page (example.com?dress=1234 and example.com/dresses/1234), Google chooses one as canonical.”
The rel canonical is a useful solution because it can consolidate all of the link and relevance signals back to the main page that a publisher wants in the search results.
But because Google treats the rel canonical tag as a hint, there’s no guarantee that Google will obey it, and the Google algorithm may decide to show some other page in the search results.
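To make the canonical hint concrete, here is a minimal Python sketch that reads the canonical URL a page declares. The names `CanonicalParser` and `declared_canonical` are hypothetical, and the sketch only reports what the page asks for. Since the tag is a hint, it says nothing about which URL Google actually selects.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values, so split before comparing.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def declared_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical
```

For example, a color variant of a product page could include `<link rel="canonical" href="https://example.com/dresses/1234">` in its head, and `declared_canonical` would return that URL for the variant page.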
Rel Canonical Versus Noindex
The person asking the question wanted clarification about whether it was best to use noindex or canonicalization.
It’s not an unreasonable thing to be confused about because a case could be made for either solution.
Here’s the question:
“We have a website… an ecommerce store with a lot of product variations that have thin content or duplicate content even sometimes.
So …I made a list of all the URLs we want to keep or we want to have indexed… and then I made a list of all the URLs that we don’t want to have indexed.
The more I worked on it the more I asked this question to myself, canonicalization or noindexing?
I don’t know what the better of those would be.”
John Mueller answered:
“…I think the general question of should I use noindex or rel canonical for another page is something where there probably isn’t an absolute answer.
So that’s kind of just offhand. It’s like if you’re struggling with that you’re not the only person who’s like, oh which one should I use?
That also usually means that both of these options can be okay.
So usually what I would look at there is what your really strong preference there is.
And if the strong preference is you really don’t want this content to be shown at all in search, then I would use noindex.
If your preference is, I really want everything combined in one page and if individual ones show up, like whatever, but most of them should be combined, then I would use a rel canonical.
And ultimately the effect is similar in that, well, it’s likely the page that you’re looking at won’t be shown in search.
But with a noindex it’s definitely not shown.
And with a rel canonical it’s more likely not shown.”
A Third Way to Deal with Duplicate and Thin Pages
Mueller next suggested that a publisher can use both noindex and rel canonical in order to benefit from both.
“…you can also do both of them.
And it’s something… if external links, for example, are pointing at this page then having both of them there kind of helps us to figure out well, you don’t want this page indexed but you also specified another one.
So maybe some of the signals we can just forward along.”
Combining rel canonical and noindex is not a commonly discussed solution. But according to John Mueller, it’s a valid way to deal with duplicate and thin content.
Ultimately, it’s up to the publisher to decide based on the desired outcome: whether consolidating link and relevance signals is what matters most, or whether making sure the page does not appear in search is paramount.
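The three options Mueller described can be sketched as a small Python helper that picks the head tags for a variant page. This is only an illustration of the decision he outlined, not something Google provides. The function name `head_tags` and its parameters are assumptions made for the example.

```python
from html import escape


def head_tags(keep_out_of_search, canonical_url=None):
    """Return the head tags for a duplicate or thin page.

    keep_out_of_search: True when the page must not appear in search (noindex).
    canonical_url: the preferred URL to consolidate signals into, if any.
    Passing both produces the combined approach Mueller mentioned.
    """
    tags = []
    if keep_out_of_search:
        tags.append('<meta name="robots" content="noindex">')
    if canonical_url:
        # Escape the URL so characters like & and " are safe inside the attribute.
        href = escape(canonical_url, quote=True)
        tags.append('<link rel="canonical" href="%s">' % href)
    return tags
```

Calling `head_tags(True)` yields only the noindex tag, `head_tags(False, "https://example.com/dresses/1234")` yields only the canonical link, and `head_tags(True, "https://example.com/dresses/1234")` yields both.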
Google’s Official Documentation of Noindex
Google’s Official Documentation of Rel Canonical
Which is Best: NoIndex or Rel Canonical?
Watch at the 16:49 minute mark