Have You Optimized Your Crawl Equity?

August 24, 2009

Rachel Freeman Anvil Media, Inc

Sites with hundreds of thousands of pages, or even millions, may never get all of their pages indexed. This is because of a concept sometimes referred to as “crawl equity”. Crawling a site with millions of pages takes a significant amount of bandwidth, so search engines are likely to crawl and index only a portion of those pages.

Nowadays search engine optimization is not just about slapping some keywords on a page and getting a bunch of inbound links. It has grown in complexity as the web expands, more information is stored online and algorithms get more sophisticated. With this understanding, it is critical to pay just as much (if not more) attention to the back end as to the front end – at least with large sites. Richard Baxter of SEOgadget just put out a great post on the role of structured markup in the future of SEO. Just as Baxter believes (and I agree) that standards and consistency in a uniform markup will play a larger role in a search engine’s ability to rank and display relevant content, I believe that optimizing crawl equity is a critical factor in the SEO process.

So what is the goal of crawl equity optimization? To enable search engines to spend less time crawling duplicate content or empty pages and more time crawling and indexing valuable content. Google Webmaster Help has posted some tips, but I’ll break them down here as well.

Burning Crawl Equity – Common Causes

The most common reasons engines end up crawling URLs unnecessarily come down to URL structure and infinite spaces. Both create duplicate content and URLs that were never intended to be indexed in the first place, ultimately leading engines to exhaust bandwidth trying to crawl them all.

Examples:

Session IDs
Sorting parameters
Login pages
Contact forms
Pagination
Calendars with a “next month” or “previous month” link
Filtering search results
Broken relative links
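
To make the duplicate-URL problem concrete, here is a minimal Python sketch that strips session and sorting parameters from a list of URLs and groups the ones that collapse to the same address. The parameter names in NOISE_PARAMS and the example.com URLs are assumptions for illustration only; substitute whatever your own site actually uses.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; replace with the ones your site really uses.
NOISE_PARAMS = {"sessionid", "sid", "sort", "order"}


def canonicalize(url):
    """Drop noise parameters, sort the rest, and discard any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NOISE_PARAMS
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each canonical URL to the crawled variants that collapse into it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonicalize(url)].append(url)
    # Only canonical URLs reached through more than one variant are wasteful.
    return {canon: variants for canon, variants in groups.items() if len(variants) > 1}
```

Feeding this a list of URLs from a crawl or server log gives a rough picture of how many distinct addresses lead to the same content. It only looks at query strings, so it will not catch infinite spaces such as calendars.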

Diagnostics

Now this is great information for all to know and address. But how do you know there is an issue in the first place? For starters, Webmaster Tools may give you a warning report as follows:

Furthermore, combining an inurl: search command with a site: search command will help you assess the gravity of the situation. In other words, how many indexed pages are contributing to engine bloat as a result of filters, session IDs and the like?
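
As a small illustration, the Python sketch below builds one such query per suspect URL fragment so you can paste them into the search box one at a time. The domain and the fragments are placeholders I made up, not values from any real site.

```python
def bloat_queries(domain, url_fragments):
    """Build one 'site: + inurl:' query string per suspect URL fragment."""
    return [f"site:{domain} inurl:{fragment}" for fragment in url_fragments]


# Hypothetical domain and fragments, for illustration only.
for query in bloat_queries("example.com", ["sessionid", "sort=", "filter="]):
    print(query)
```

The result counts the engine reports for each query are estimates, so treat them as a rough gauge of scale instead of an exact page count.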

Finally, how do you track and trend the success of your efforts? Consider tracking the percentage of indexed pages out of the total pages you intend to have indexed, measured over time. You may also track this at a more granular level by calculating the percentage of problematic pages and trending it over time as issues are addressed.
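
The arithmetic is simple, but a short Python sketch shows one way to keep the trend consistent from month to month. The function names and the snapshot figures are invented for illustration; the indexed counts would come from whatever source you trust, such as site: searches or Webmaster Tools.

```python
def indexed_percentage(indexed, intended):
    """Percentage of the pages intended for indexing that are actually indexed."""
    if intended <= 0:
        raise ValueError("intended must be a positive page count")
    return 100.0 * indexed / intended


def trend(snapshots):
    """Turn (label, indexed, intended) snapshots into (label, percentage) pairs."""
    return [
        (label, round(indexed_percentage(indexed, intended), 1))
        for label, indexed, intended in snapshots
    ]


# Made-up monthly snapshots, for illustration only.
history = [("month 1", 250000, 1000000), ("month 2", 400000, 1000000)]
for label, pct in trend(history):
    print(f"{label}: {pct}% indexed")
```

The same two functions work for the more granular view: pass the count of problematic pages as the first number and the total crawled pages as the second, and watch that percentage fall as issues are fixed.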

Rachel Andersen works for the Portland-based SEM agency Anvil Media, Inc. She has expertise in all aspects of search engine marketing and specializes in SEO for large sites. Andersen has been responsible for the development and execution of dozens of search and social marketing campaigns during her time with Anvil.

