Getting Google to Crawl Without Indexing?

Very often we have to deal with (partial) duplicate content issues created by the site CMS, for example:

Pagination:

domain.com/category/
domain.com/category/1

Sorting:

domain.com/category/pricing-high-low/
domain.com/category/pricing-low-high/

Blocking these pages via the robots.txt Disallow directive will prevent bots from crawling them altogether, but is it perhaps worth letting bots spider the pages without indexing them? This can help a lot with discovering more inner pages.
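For reference, a minimal sketch of that blocking approach in robots.txt might look like this (using the example sorting path from above; the exact path depends on your URL structure):

# block all bots from crawling the sorted version of the category
User-agent: *
Disallow: /pricing-high-low/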

The possible ways to do that (especially now that we don’t have to worry much about PR leakage):

  • Adding a “noindex” robots meta tag to all pages except the first / base one (see the sketch after this list);
  • Adding the Noindex directive to your robots.txt (which is unofficially supported by Google):

User-agent: Googlebot
Noindex: /pricing-high-low/

  • Using rel=canonical (which is the mildest of the three: it won’t prevent Google from crawling, but at least it will show Google which version of the page is the preferred one).
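For illustration, here is a minimal sketch of the first and third options as they might appear in the head of one of the sorted pages (domain.com/category/pricing-high-low/ from the example above); the exact markup your CMS outputs will differ:

<!-- Option 1: let bots crawl the page but keep it out of the index -->
<meta name="robots" content="noindex, follow">

<!-- Option 3: point search engines to the base category page as the preferred version -->
<link rel="canonical" href="http://domain.com/category/">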

So, have you ever tried preventing Google from indexing pages without blocking it from spidering them? Does it actually solve the duplicate content issue?

Ann Smarty

Brand and Community Manager at Internet Marketing Ninjas