Indexation. Ensuring that all of your site’s important pages are indexed is at the root of a sound SEO strategy. Websites with thousands of pages often run the risk of not getting all of those pages indexed. What’s more, if this describes your site, you might not even know it. Why? A huge site usually has decent overall traffic, which makes a gap in indexation hard to spot.
Simply put, if your pages aren’t indexed they will never rank in Google. But why would these pages not get indexed?
The number of pages indexed is roughly proportional to your PageRank
Wait, what? I thought PageRank was so 2003. Actually, not so. In a March 2010 interview with Eric Enge, Matt Cutts dismissed the commonly held theory that every site has a dedicated “crawl budget” and instead indicated that PageRank plays a larger role in indexation than many assume.
Pages buried deep within a site’s architecture are the most likely to suffer from indexation problems. These are often product pages and older articles that may be hard to find even on the site itself (think past articles on newspaper sites). That represents a significant lost opportunity for traffic from long tail search queries.
Google’s “Mayday” update seems to also confirm this.
This change seems to have primarily impacted very large sites with “item” pages that don’t have many individual links into them, might be several clicks from the home page, and may not have substantial unique and value-added content on them. For instance, ecommerce sites often have this structure. The individual product pages are unlikely to attract external links and the majority of the content may be imported from a manufacturer database. –Vanessa Fox on SearchEngineLand
Back in 2009 I wrote about the concept of crawl equity, and while those best practices still ring true, they didn’t factor in PageRank. Understanding that PageRank (or a lack of it) can hold back indexation really just means that the affected pages need more links, both internal and external. The key takeaway is to formulate a strategy for earning deep links into your website for better indexation. In addition, develop an architecture that makes the pages currently living deep within the site reachable within just a few clicks of the home page. Finally, cleaning up duplicate content helps ensure that links are not split among, say, three different versions of the same piece of content but are instead consolidated to strengthen the value of a single page.
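To make the "few clicks from the home page" idea concrete, here is a minimal sketch that measures click depth with a breadth-first search over an internal link map. The site structure, the function names, and the three-click threshold are all hypothetical, chosen only for illustration; a real audit would build the link map from a crawl of your own site.

```python
from collections import deque

def click_depths(links, home):
    """Return the minimum number of clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def deep_or_orphaned(links, home, max_depth=3):
    """Split problem pages into 'too deep' and 'not reachable from home at all'."""
    depths = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    deep = sorted(p for p, d in depths.items() if d > max_depth)
    orphaned = sorted(all_pages - set(depths))
    return deep, orphaned

# Hypothetical internal link map: page -> pages it links to.
site_links = {
    "/": ["/category"],
    "/category": ["/category/page-2"],
    "/category/page-2": ["/category/page-3"],
    "/category/page-3": ["/product/old-widget"],
    "/product/old-widget": [],
    "/archive/2009-article": ["/"],  # links out, but nothing links to it
}

deep, orphaned = deep_or_orphaned(site_links, "/", max_depth=3)
print("Deeper than 3 clicks:", deep)
print("Unreachable from home:", orphaned)
```

Pages that show up in either list are reasonable candidates for extra internal links from shallower pages such as category hubs or a related-articles module.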
Of course, best practices like creating XML sitemaps and using your robots.txt file to disallow problematic pages will help engines spend more of their time crawling the pages you want indexed. That said, to be most effective it might be time to revisit your PageRank.
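As a rough illustration of those two housekeeping steps, the sketch below builds a bare-bones XML sitemap and checks which URLs a given robots.txt leaves open to crawlers, using only Python’s standard library. The example.com URLs, the robots.txt rules, and the helper names are assumptions made up for this example, not recommendations for any particular site.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a minimal sitemap document listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

def crawlable(robots_txt, urls, agent="*"):
    """Return the URLs that the given robots.txt text allows `agent` to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(agent, u)]

# Hypothetical rules that keep crawlers out of internal search and cart pages.
robots_txt = """\
User-agent: *
Disallow: /search
Disallow: /cart
"""

# Hypothetical URLs: one deep product page and one internal search result.
candidate_urls = [
    "https://www.example.com/product/old-widget",
    "https://www.example.com/search?q=widget",
]

allowed_urls = crawlable(robots_txt, candidate_urls)
sitemap_xml = build_sitemap(allowed_urls)
print(sitemap_xml)
```

Filtering the sitemap through the robots.txt rules keeps the two files from contradicting each other, so the sitemap only advertises pages that crawlers are actually allowed to fetch.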