Have You Optimized Your Crawl Equity?


Sites with hundreds of thousands, or even millions, of pages may never get all of their pages indexed. This comes down to a concept sometimes referred to as “crawl equity”: because crawling a site with millions of pages takes a significant amount of bandwidth for search engines, only a portion of those pages is likely to be indexed.

Nowadays search engine optimization is not just about slapping some keywords on a page and getting a bunch of inbound links. It has grown in complexity as the web expands, more information is stored online and algorithms get more sophisticated. With this understanding, it is critical to pay just as much (if not more) attention to the back end as to the front end – at least with large sites. Richard Baxter of SEOgadget just put out a great post on the role of structured markup in the future of SEO. Just as Baxter believes (and I agree) that standards and consistency in uniform markup will play a larger role in a search engine’s ability to rank and display relevant content, I believe that optimizing crawl equity is a critical factor in the SEO process.

So what is the goal of crawl equity optimization? To enable search engines to spend less time crawling duplicate content or empty pages and more time crawling and indexing valuable content. Google Webmaster Help has posted some tips, but I’ll break it down here as well.

Burning Crawl Equity – Common Causes

The common causes that force engines to crawl URLs unnecessarily mainly come down to URL structure and infinite spaces. These create duplicate content and URLs that were never intended to be indexed in the first place, ultimately leading engines to exhaust bandwidth trying to crawl them all.


  • Session IDs
  • Sorting parameters
  • Login pages
  • Contact forms
  • Pagination
  • Calendars with a “next month” or “previous month” link
  • Filtering search results
  • Broken relative links
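
To illustrate with hypothetical URLs, a single product page can balloon into many crawlable addresses once session IDs and sorting parameters get appended:

```
http://www.example.com/products/blue-widget
http://www.example.com/products/blue-widget?sessionid=8f3a2c
http://www.example.com/products/blue-widget?sort=price&dir=asc
http://www.example.com/products/blue-widget?sort=name&sessionid=8f3a2c
```

Each variant serves the same content, yet the engine has to crawl every one of them to discover that – bandwidth that could have gone to pages you actually want indexed.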

Suggested Solutions

By addressing the common sources where crawl equity is often wasted, you will increase the likelihood that the valuable content intended for indexing in fact gets indexed.

  • Remove user-specific details from URLs by storing that information in a cookie and 301 redirecting to the clean source URL.
  • Block categories of dynamically generated URLs through robots.txt. Use wildcard patterns to deal with complex URL strings.
  • Nofollow calendar links.
  • Reduce duplicate content; utilize the rel=canonical tag.
  • Improve latency issues by reducing page load time.
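
As a sketch of the robots.txt approach (the parameter names here are placeholders – swap in whatever your platform actually appends), wildcard patterns can keep crawlers out of session-ID and sorting variants while the clean URLs remain crawlable:

```
User-agent: *
# Block session-ID variants of otherwise canonical URLs
Disallow: /*?sessionid=
Disallow: /*&sessionid=
# Block sorting and filtering parameter combinations
Disallow: /*?sort=
Disallow: /*&sort=
```

Note that robots.txt supports only the `*` and `$` wildcards (in Google’s extension of the protocol), not full regular expressions, so test patterns in Webmaster Tools before deploying. Pairing this with rel=canonical on any parameterized pages that must stay crawlable covers both angles.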


Now this is great information for all to know and address. But how do you know there is an issue in the first place? For starters, Webmaster Tools may give you a warning report about an extremely high number of URLs found on your site.


Furthermore, an inurl: search command coupled with a site: search command will also do the trick to help assess the gravity of the situation – in other words, how many pages are contributing to index bloat as a result of filters?
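
For example (example.com and the filter parameter here are placeholders), a query like the following returns the parameterized URLs Google has already indexed for a site:

```
site:example.com inurl:sort=
```

Comparing that count against a plain `site:example.com` query gives a rough sense of what share of your indexed pages are filter variants rather than real content.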


Finally, how do you track and trend the success of your efforts? Consider tracking the percentage of indexed pages out of the total pages (intended to be indexed) on the site over time. You can also track this at a more granular level by calculating the percentage of problematic pages and trending it over time as issues are addressed.
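
The tracking idea above is simple division, but a minimal sketch makes the metric concrete. The counts below are made up for illustration; in practice the “indexed” figure would come from site: queries or Webmaster Tools and the “indexable” figure from your own sitemap inventory:

```python
def indexation_rate(indexed_pages, indexable_pages):
    """Percent of intended-to-be-indexed pages actually indexed."""
    if indexable_pages == 0:
        return 0.0
    return 100.0 * indexed_pages / indexable_pages

# Weekly snapshots: (pages indexed, total pages intended to be indexed)
snapshots = [(120000, 400000), (180000, 400000), (260000, 400000)]
trend = [round(indexation_rate(i, t), 1) for i, t in snapshots]
print(trend)  # → [30.0, 45.0, 65.0]
```

A rising trend line after a robots.txt or canonicalization change is the signal that crawl equity is being spent where you want it.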

Rachel Andersen works for the Portland-based SEM agency Anvil Media, Inc. She has expertise in all aspects of search engine marketing and specializes in SEO for large sites. Andersen has been responsible for the development and execution of dozens of search and social marketing campaigns over her time with Anvil.

Rachel Freeman

Rachel Freeman works for Jive Software, the pioneer and leading provider of social business solutions. She has expertise in all aspects of search engine marketing and specializes in SEO and paid search for the B2B sector. Freeman has been responsible for the development and execution of countless search and social marketing campaigns over her years in the search marketing industry.
  • http://www.byrnehobart.com/ Byrne

    One good model to use is that pages are either a) navigation, or b) end-content. e.g. you have product category pages, and you have product pages. For category pages, it makes sense to let search engines index categories such that every single product is in exactly one category — search and sorting parameters can then be accessed through a single nofollowed parameter, so you only bleed a tiny bit of link juice.

  • http://www.seoworkers.com John S. Britsios (Webnauts)

    Rachel you made some good points, but with one I must disagree. You said to nofollow calendar links. Google recently announced that they do not support the nofollow attribute for internal links. Or did I misunderstand something?

    • Luke

      Re. Nofollow, it is still supported, but works in a different way now, meaning it shouldn’t be used as aggressively as many were doing some months ago

      • http://www.socialsearchmarketer.com Rachel Andersen

        Agreed with Luke – still supported, just not for purposes it has been known for in the past (PR sculpting). It’s still good to use signals to tell the search engines what’s worth following and what is not, just not in excess or for other purposes.

  • http://www.seoedge.net gudipudi

    Yeah, I read this post on the Google blog some time back, but thanks for putting this across in a much simpler way.

    Note: SEJ admin, please block this jewelry guy who is spamming a lot.
