Very often we have to deal with (partial) duplicate content issues created by the site's CMS, such as:
Pagination:
domain.com/category/
domain.com/category/1
Sorting:
domain.com/category/pricing-high-low/
domain.com/category/pricing-low-high/
Blocking the pages via the robots.txt Disallow directive will prevent bots from crawling them, but is it perhaps worth letting bots spider the pages without indexing them? This may help a lot with discovering more inner pages.
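To make the crawling side of this concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content and the crawlable helper are assumptions made up for illustration, not rules from any real site. It only shows that a Disallow rule stops a compliant bot from fetching the page at all, which also means the bot never sees the links on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the two sorted views with Disallow.
ROBOTS_TXT = """\
User-agent: *
Disallow: /category/pricing-high-low/
Disallow: /category/pricing-low-high/
"""


def crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the sample robots.txt lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://domain.com/category/",
        "http://domain.com/category/1",
        "http://domain.com/category/pricing-high-low/",
    ):
        print(url, "crawlable" if crawlable(url) else "blocked")
```

A blocked page cannot pass its inner links on to the crawler, which is the trade-off the question above is about.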
There are several possible ways to do that (especially now that we don’t have to worry much about PageRank leakage):
- Adding a “noindex” robots meta tag to all pages except the first (base) one;
- Adding the Noindex directive to your robots.txt (which is unofficially supported by Google):
User-agent: Googlebot
Noindex: /category/pricing-high-low/
- Using rel=canonical (which is the mildest of the three: it won’t prevent Google from crawling, but at least it will tell Google which version of the page is the preferred one).
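As a rough sketch of how a CMS template might apply two of these options, the Python function below picks the head tags for a given URL. Everything in it is an assumption for illustration: the head_tags name, the SORT_SLUGS set, the rule that a numeric last path segment means a paginated page, and the choice of noindex for pagination versus rel=canonical for sorting. A real site would need its own rules.

```python
from urllib.parse import urlsplit

# Assumed sort slugs, taken from the example URLs in this post.
SORT_SLUGS = {"pricing-high-low", "pricing-low-high"}


def head_tags(url: str) -> list[str]:
    """Return the extra <head> tags a duplicate-prone page could carry."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        # Base category page: leave it to be indexed normally.
        return []
    base = f"{parts.scheme}://{parts.netloc}/{segments[0]}/"
    last = segments[-1]
    if last.isdigit():
        # Paginated page (assumed): keep it crawlable, keep it out of the index.
        return ['<meta name="robots" content="noindex, follow">']
    if last in SORT_SLUGS:
        # Sorted view: point search engines at the base category page.
        return [f'<link rel="canonical" href="{base}">']
    return []


if __name__ == "__main__":
    for url in (
        "http://domain.com/category/",
        "http://domain.com/category/1",
        "http://domain.com/category/pricing-low-high/",
    ):
        print(url, head_tags(url))
```

Neither tag stops the page from being crawled, so links on paginated and sorted pages can still be discovered.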
So have you ever tried preventing Google from indexing pages without blocking it from spidering them? Does it solve the duplicate content issue anyway?
I think the meta noindex tag is much better, because it is supported by many search bots.
I use this code:
<meta name="robots" content="noindex, follow">
If you have external links pointing to the noindexed pages, this code doesn’t block link juice, so those pages can still pass link juice on to other pages.
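To check that a page really carries this tag, a small sketch with Python's standard html.parser could look like the following. The RobotsMetaParser class and robots_directives function are hypothetical names, and the sketch only reads the robots meta tag from an HTML string; it does not fetch anything.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set[str]:
    """Return the lowercased robots directives declared in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"></head></html>'
    print(robots_directives(page))
```

A page set up as described should report both noindex and follow.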
Ann,
You always come up with great tech insights and articles – personally I’m doing a test right now on a recently launched site to see which method works best. I just started this week. I’m curious to see if the various methods have different effects.
Then again, by using different methods on the same site I wonder if that’s going to artificially have an effect on the results.
Thanks, I read your article and it is very nice. Please write about the different ways we can increase site traffic.
thanks again
suresh
Good to know info. Thanks for the great stuff.
Thank you Ann, for the article.
I see such doubts raised constantly in the Google webmaster forums. People seldom grasp the nuances between noindex and disallow, which are really a beauty. This will help webmasters all around the world immensely.
Thank you, Ann Smarty, for sharing an article on such an important topic.
Hi Ann Smarty, I read your article, but it does not say enough about letting Google crawl pages without indexing them. Could you please write more about Google crawling and robots.txt?