Editor’s note: “Ask an SEO” is a weekly column by technical SEO expert Jenny Halasz. Come up with your hardest SEO question and fill out our form. You might see your answer in the next #AskanSEO post!
Today’s Ask an SEO question is from Ashok S. He asks:
I am running a WordPress website. To avoid the risk of duplicate content, should I use a noindex tag on Categories and Archives Pages? Will this impact my overall traffic?
This is a great question, thank you. It’s important for every SEO professional to fundamentally understand how Google works.
So to begin with, the answer is probably not. Most websites don’t need to be concerned about Google crawling some pages that they find no value in.
Pages like tag pages, category pages, and search results that are included “out of the box” in popular CMS platforms like Drupal and WordPress are generally not prevalent enough to matter. If Google sees value in them, they will crawl and index them. If they don’t, they won’t.
If you have a large ecommerce site, with hundreds of thousands of products, this may become a bigger issue, because you want to focus Google’s crawlers on the pages that matter, and remove things that don’t have any value.
To fully answer this question, you also must understand the difference between robots.txt blocking and a meta noindex tag, as well as how 404s and soft 404s work.
Robots.txt Blocking
If you place a command in robots.txt to block Google (and other crawlers) from accessing pages, you actually prevent them from getting to those pages at all.
If Google comes across a page that is blocked in robots.txt, they will not issue a “Fetch” or GET request for it at all – not even to retrieve the page’s headers. This means that if you later decide you want that page noindexed, or want to serve another status (like a redirect or a 404), Google will not be able to see that change.
Robots.txt commands should be limited to pages you know Google is not going to see another way (i.e., people will not link to them, you will not link to them within your site, and they are probably password protected).
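The robots.txt behavior described above is easy to see with Python’s standard-library robots.txt parser. This is a minimal sketch; the rule and the URLs are hypothetical examples, not from any real site.

```python
# Minimal sketch: how a robots.txt Disallow rule is evaluated, using
# Python's standard-library parser. The rule and URLs are made up.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

# A blocked page is never fetched, so a crawler that obeys robots.txt
# cannot see its meta tags or status codes -- it simply skips the URL.
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post.html"))     # True
```

This is exactly why a robots.txt block and a noindex tag cannot be combined: if the crawler never fetches the page, it never sees the tag.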
Meta Robots NoIndex Tag
The meta robots=”noindex” tag is different from robots.txt, yet many SEO professionals treat the two as the same. The biggest differences with a noindex tag are:
- While it is also a robots directive, it is less restrictive than robots.txt. Google and other search engines can still GET the page, headers and all.
- It does exactly what it sounds like: it directs Google not to index the page – that is, not to make it eligible to appear in search results. Google will still collect all of the data on the page and follow all of the links unless you also use nofollow. Nofollow is not an official directive, but Google and other search engines respect it.
- If you use a noindex tag and later decide to serve a server side redirect or 404 instead, Google will be able to access that status change and update their data accordingly.
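For reference, the tag itself goes in the page’s HTML head. A minimal example (the exact directives you combine depend on your goals):

```html
<!-- Placed in the <head> of a page you want crawled but not indexed. -->
<meta name="robots" content="noindex">

<!-- Optionally combined with nofollow to also stop link following: -->
<meta name="robots" content="noindex, nofollow">
```

In WordPress, SEO plugins typically add this tag to category and archive pages for you via a settings toggle, so you rarely need to edit templates by hand.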
404s & Soft 404s
404 error status pages indicate that the page is not found, and are a web standard that all crawlers respect. If Google encounters a 404 error page, they will drop it out of the index but keep it in their crawl scheduler, rechecking it periodically to make sure it hasn’t changed.
A soft 404 error is an unofficial designation that Google places on pages that resolve with a 200 (OK) status but do not provide any real content. Internal search results pages that have zero results are one example of this. If Google designates a page as a soft 404 error, they treat it the same way as a 404 error. As with the 404 error, they will check it periodically to make sure it doesn’t change.
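The distinction can be sketched with Python’s standard-library HTTP server. This is a hypothetical demo, not production code: one path returns a real 404 status, while the other returns the “success” status with thin content – the pattern Google may flag as a soft 404.

```python
# Sketch of a real 404 vs. a "soft 404" using Python's standard library.
# Paths and messages are hypothetical examples.
from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/gone":
            # A real 404: the status code itself says "not found",
            # and every crawler understands it.
            self.send_response(404)
            self.end_headers()
            self.wfile.write(b"Page not found")
        else:
            # A "soft 404": the server claims success (200 OK) but the
            # page has no real content -- e.g., an empty search result.
            # Google may decide on its own to treat this as a 404.
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"No results found for your search.")

    def log_message(self, *args):
        pass  # keep the demo quiet
```

The point is that the soft-404 page lies about its status: only the body, not the status code, reveals that there is nothing there, which is why Google has to infer it.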
Should You Use Noindex on Category Pages?
Which brings us back to our question – is noindex the right strategy for category pages that add little or no value to your website?
The answer is that if you feel the pages add no value, you should probably delete them entirely and serve a 404 error status. If the pages are important for users to navigate and are a “necessary evil” of having a blog, then they should be noindexed.
If you noindex the pages, Google has stated that they will eventually treat those pages as soft 404s. This means that no links that point to these pages will count for ranking determinations.
Via @johnmu (1) A continuation of noindex, follow conversation. John checked with the team & noindexed pages will ultimately be treated like soft 404s. *All links will dropped* if they see a persistent noindex: https://t.co/XKMwfatitT
— Glenn Gabe (@glenngabe) January 17, 2018
Why does this matter? Ultimately it probably doesn’t.
If links are pointing to pages that you don’t think have any value, then the search engines and users probably don’t find any value in them either.
What Not to Do
Do not canonical all category and tag pages to the blog root page. This is an improper use of the canonical tag, and Google will ignore it.
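To illustrate (with example.com as a placeholder domain), this is the pattern to avoid, contrasted with the conventional self-referencing canonical for a category page you intend to keep:

```html
<!-- Improper: a category page pointing its canonical at the blog root.
     The two pages are not duplicates, so Google will ignore this hint. -->
<link rel="canonical" href="https://example.com/blog/">

<!-- Conventional: a category page that stays indexable canonicalizes
     to itself (hypothetical URL for illustration). -->
<link rel="canonical" href="https://example.com/blog/category/widgets/">
```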
Do not put these pages in robots.txt. If you block them, Google won’t be able to see when you update or change them, but they can remain in search results with an ugly listing whose description simply says that no information is available because of the site’s robots.txt.
Make sure you and your dev team know the difference between robots.txt and meta robots noindex commands. Use them appropriately and you will be one step ahead of the game.
If you have pages that don’t provide any value to searchers as a landing page, but they are necessary for navigation, either rethink your navigation strategy (perhaps a more informative category page with some unique content would be appropriate?) or noindex the pages.
If you only have a handful of these pages, or don’t think they’re a big deal on your site, just leave them as they are. Google is smart enough to figure it out.
Have a question about SEO for Jenny? Fill out this form or use #AskAnSEO on social media.
Featured Image: Image by Paulo Bobita
Screenshot taken by author, May 2018