Very often, especially with huge dynamically driven or user-generated websites, you may suspect you have indexing problems, i.e. Google doesn’t seem to crawl and index your site as deeply as you want it to. The remedies vary, but here are some essential basics you might want to look into:
- Make sure you really have indexing problems. In many cases, Google’s site: operator won’t show you the real number of URLs the search engine has indexed. You should thoroughly explore how deeply your site has actually been indexed before arriving at any conclusions. Here are some tips to do that (also mentioned at SEOmoz):
- check how many pages are indexed in each directory (or subdirectory: the deeper you dig, the more accurate the results are): site:yoursite.com/subdirectory/sub-subdirectory1 + site:yoursite.com/subdirectory/sub-subdirectory2, etc.
- search for subdirectory-specific keywords: site:yoursite.com inurl:subdirectory (or site:yoursite.com intitle:subdirectory);
- check recently indexed pages (take advantage of “date range” option via advanced search).
Whichever of these methods you choose, collect the data carefully and add up the results: you will end up with a more accurate count of your site’s indexed pages.
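To make the per-directory checks above less tedious, here is a minimal sketch that only builds the query strings and adds up the counts you note down by hand. It does not query Google itself. The names `build_queries` and `total_indexed`, the sample subdirectories and the sample counts are all made up for illustration.

```python
def build_queries(domain, subdirectories):
    """Build the site:, inurl: and intitle: queries for each subdirectory."""
    queries = {}
    for sub in subdirectories:
        path = sub.strip("/")
        # Use the last path segment as the subdirectory-specific keyword.
        keyword = path.split("/")[-1]
        queries[sub] = [
            f"site:{domain}/{path}",
            f"site:{domain} inurl:{keyword}",
            f"site:{domain} intitle:{keyword}",
        ]
    return queries


def total_indexed(counts):
    """Sum result counts that were collected by hand, one per subdirectory."""
    return sum(counts.values())


if __name__ == "__main__":
    # Hypothetical subdirectories; replace with your own site structure.
    for sub, qs in build_queries("yoursite.com", ["/blog/2009/", "/forum/"]).items():
        print(sub)
        for q in qs:
            print("   ", q)
    # Hypothetical counts written down after running the queries manually.
    print(total_indexed({"/blog/2009/": 120, "/forum/": 45}))
```

Only add up counts for directories that do not overlap (for example, sibling sub-subdirectories), otherwise the same pages get counted twice.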
- Try to identify any non-indexing patterns. Which types of pages or subdirectories are being left out? Try to work out the rule or logic behind what doesn’t get indexed. By doing this you will be able to determine what is likely causing the indexing problems: probably duplicate content or a flawed internal architecture. Try to draw your site’s main interlinking structure and find the pages that get inadequate link juice.
- Work on your site’s external deep linking ratio: get deep links to your subdirectories from external resources (quality deep linking directories should still help with that).
- Mind your site’s crawl rate: as I said in my post on improving crawl rate, Googlebot “works on a budget”. If you keep it busy crawling huge files, waiting for your pages to load or following duplicate content URLs, you might be missing the chance to show it your other pages.
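If drawing the interlinking structure by hand gets unwieldy, the same check can be roughed out in code. The sketch below assumes you already have a map of internal links (page URL to the list of pages it links to), for example exported from a crawl of your own site. It counts inbound internal links per page and measures how many clicks each page is from the home page. The function names and the sample link map are hypothetical.

```python
from collections import Counter, deque


def inbound_counts(links):
    """Count how many distinct pages link to each page (self-links ignored)."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def click_depth(links, home):
    """Breadth-first search from the home page: clicks needed to reach each page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


if __name__ == "__main__":
    # Hypothetical link map: page -> pages it links to.
    links = {
        "/": ["/a", "/b"],
        "/a": ["/b"],
        "/b": [],
        "/c": ["/a"],
    }
    inbound = inbound_counts(links)
    depth = click_depth(links, "/")
    for page in links:
        # A page missing from `depth` cannot be reached from the home page.
        print(page, inbound[page], depth.get(page, "unreachable"))
```

Pages with few inbound links, a large click depth or no path from the home page at all are the first candidates to compare against the list of pages that never got indexed.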