
Getting Google to Crawl Without Indexing?

Very often we have to deal with (partial) duplicate content issues created by the site's CMS, such as:

Pagination:

domain.com/category/
domain.com/category/1

Sorting:

domain.com/category/pricing-high-low/
domain.com/category/pricing-low-high/

Blocking these pages via the robots.txt Disallow directive will prevent bots from crawling them, but is it perhaps worth letting bots spider the pages without indexing them? This may help a lot with discovering more inner pages.
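
For contrast, the blocking approach the paragraph above describes would look something like this (a minimal sketch using the example URLs from this post; adjust the paths to your own CMS's URL patterns):

User-agent: *
Disallow: /category/pricing-high-low/
Disallow: /category/pricing-low-high/

Note that this stops crawling entirely, so any inner pages linked only from the blocked URLs may never be discovered.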

The possible ways to do that (especially now that we don’t have to worry much about PR leakage):

  • Adding a "noindex" robots meta tag to all pages except the first / base one;
  • Adding the Noindex directive to your robots.txt (which is unofficially supported by Google)

User-agent: Googlebot
Noindex: /pricing-high-low/

  • Using rel=canonical (which is the mildest of the three: it won't prevent Google from crawling, but at least it will show Google which version of the page is the preferred one).
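
As a sketch, the meta tag and canonical options above would sit in the <head> of a sorted page like this (using the example URLs from this post; the domain and paths are illustrative):

<!-- Option 1: on domain.com/category/pricing-high-low/ — crawl, follow links, don't index -->
<meta name="robots" content="noindex, follow">

<!-- Option 2: instead, point Google at the base page as the preferred version -->
<link rel="canonical" href="http://domain.com/category/">

The meta tag removes the page from the index outright, while rel=canonical consolidates the duplicates' signals onto the base page — so they solve slightly different problems.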

So have you ever tried preventing Google from indexing pages without blocking it from spidering them? Does it solve the duplicate content issue?

Ann Smarty is the blogger and community manager at Internet Marketing Ninjas. Ann's expertise in blogging and tools serve as a base for her writing, tutorials and her guest blogging project, MyBlogGuest.com.

Comments are closed.

9 thoughts on "Getting Google to Crawl Without Indexing?"

  1. I use this code:

    <meta name="robots" content="noindex, follow">

    If you have external links pointing to the blocked pages, this code doesn't block link juice, and the blocked page can still pass link juice to other pages.

  2. Ann,

    You always come up with great tech insights and articles – personally I’m doing a test right now on a recently launched site to see which method works best. I just started this week. I’m curious to see if the various methods have different effects.

    Then again, by using different methods on the same site I wonder if that’s going to artificially have an effect on the results.

  3. I can see such doubts constantly raised in the Google webmaster forums. People seldom grasp the nuances between noindex and disallow, which is really a subtle distinction. This would help webmasters around the world immensely.

  4. Hi Ann Smarty, I read your article, but it doesn't cover enough about Google crawling pages without indexing them. So I request that you please write more about Google crawling and robots.txt.