In Exposing the Invisible Web to Search Engines, I mentioned that new blogs often have pages that are essentially invisible, and that many remain that way. One reader strongly disagreed, to the point of saying on Digg that he was burying the story for being inaccurate. The fact is, under the definition of Invisible Web, any page not indexed is invisible. While blog pages MAY get indexed, they do not have to be.
Disproof by example is simple enough: I write for a couple of medium-volume PR6 blogs where, even after nine months, several of my posts have still not been indexed. I now make a habit of deep-linking to two or three archived posts in each new article I post. I consider this a necessity, because my hypothesis is that under normal conditions (i.e., without deep-linking), most new blogs will never have all of their pages indexed.
I know this will be a touchy point. My entire intent is to suggest that deep-linking is a good idea for numerous reasons:
- Expose relevant content.
Help readers find older, relevant content when they view an article. If they click through, this increases the time they spend on the site, helping your brand.
- Build relevance.
Assign importance to archived posts, as far as search engines are concerned.
- Track scrapers.
When your content is republished elsewhere, having deep-links increases the chance that you’ll find the duplicate content.
- Build keyword rank.
Help your blog rank for additional keyphrases by using good anchor text and deep-links.
- Spider bait.
Deep-linking means spiders will crawl deeper.
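To make the "Track scrapers" point concrete: a republished copy of a post usually carries its deep-links with it, so those links point back at your site from someone else's page. The following is only a minimal sketch of checking one suspected copy, assuming you already have its HTML as a string. The function name `find_backlinks` and the domain `example.com` are placeholders for illustration, not part of any real tool.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class BacklinkFinder(HTMLParser):
    """Collect <a href> values that point at a given domain (hypothetical helper)."""

    def __init__(self, own_domain):
        super().__init__()
        self.own_domain = own_domain.lower()
        self.backlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = (urlparse(href).hostname or "").lower()
        # Match the domain itself or any subdomain of it.
        if host == self.own_domain or host.endswith("." + self.own_domain):
            self.backlinks.append(href)


def find_backlinks(html, own_domain):
    """Return every link in `html` that points back at `own_domain`."""
    finder = BacklinkFinder(own_domain)
    finder.feed(html)
    finder.close()
    return finder.backlinks


# Placeholder HTML standing in for a suspected scraped copy.
suspect_html = (
    '<p>As I wrote in <a href="https://example.com/2007/01/old-post/">an older post</a>, '
    'see also <a href="https://other.test/">this site</a>.</p>'
)
backlinks = find_backlinks(suspect_html, "example.com")
```

If `backlinks` is non-empty for a page you did not publish, that page is worth a closer look. A post with no deep-links gives you nothing to find this way.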
I have used this technique not only to get a site ranking for alternate terms but also to build the PageRank (PR) of archived pages whose content was otherwise indexed only as part of “category” pages or monthly archives. Here is a short sequence that shows the internal link structure of a hypothetical, typical blog under normal conditions (i.e., without deep-linking):
Please note the following:
- The purple node marked “hp” is the home page.
- The blue nodes represent “/page/n” types of pages that you find in blog platforms such as WordPress. Thus, the node-link structures above do not apply to all blogs.
- The green nodes represent article pages (as opposed to the non-chronological “pages” found in WordPress).
- In frames 4-9, posts A-F actually link to the home page. These links are omitted to reduce clutter.
- Posts A-I may or may not interlink with each other. If they do, it’s likely because of a “recent posts” block.
- The home page may or may not link directly to each of pages p2-p3.
- Page p1 in this example is actually a clone of the home page, and is used for convenience. Depending on how a blog is configured, p1 may (a) be the same as the home page; (b) not exist; or (c) exist but serve the function of a “page 2”.
- The home page and each of pages p1-p3 display at most three posts, purely for convenience of diagramming.
- The concepts here can be extrapolated to other blog platforms to a degree.
The nodes in the animated sequence are minimally “connected”, and thus article pages are less likely to be indexed. The content itself gets indexed, but usually as part of “/page/1”, “/page/2”, “category”, or monthly archive pages. These are all transient pages, so even though the content is indexed, it may be difficult to find the real article page in a search engine. This happens to me regularly, even on PR6 sites and even when I search for very specific phrases in double quotes. But once I deep-link, the archived page gets indexed sooner and its PageRank rises, though probably due to several factors, not deep-linking alone.
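The link structure described above can be sketched as a small directed graph. This is only an illustrative model under stated assumptions, not a measurement of any real blog: nine posts A-I, three per page, a home page `hp` showing the newest three, paginated pages `p2` and `p3` (with `p1` treated as identical to the home page), and every post linking back to the home page. The function names are made up for this example.

```python
from collections import deque


def build_blog_graph(deep_links=None):
    """Adjacency dict for the assumed nine-post blog, three posts per page."""
    graph = {
        "hp": ["G", "H", "I", "p2"],        # home page: three newest posts + next page
        "p2": ["D", "E", "F", "hp", "p3"],  # second page of the chronological archive
        "p3": ["A", "B", "C", "p2"],        # oldest posts
    }
    for post in "ABCDEFGHI":
        graph[post] = ["hp"]                # each post links back to the home page
    # Optional deep-links, e.g. {"I": ["A", "B"]}: new post I links to old posts A and B.
    for source, targets in (deep_links or {}).items():
        graph[source] = graph[source] + list(targets)
    return graph


def click_depth(graph, start="hp"):
    """Breadth-first search: fewest link hops from `start` to every reachable page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


def inbound_links(graph):
    """Count how many internal links point at each page."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


plain = build_blog_graph()
linked = build_blog_graph({"I": ["A", "B"]})

depth_plain = click_depth(plain)["A"]     # hops from home page without deep-links
depth_linked = click_depth(linked)["A"]   # hops once post I deep-links to A
inbound_plain = inbound_links(plain)["A"]
inbound_linked = inbound_links(linked)["A"]
```

In this toy model, the oldest post A sits three hops from the home page and has a single inbound link, from a paginated page that changes every time a new post is published. One deep-link from the newest post brings A within two hops and gives it a second inbound link from a permanent page.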
Suffice it to say that even if you disagree with me and believe your blog pages are not technically “invisible” and will eventually get indexed under normal circumstances, you should still build deep-links to your archived pages.