How Search Engines Find Your Content


Discovery; where SEO starts…

Long before you start to worry about actually ranking content, you need to get it into the index. And of course, to do that, your content needs to be found. In the world of information retrieval this is, appropriately enough, known as ‘discovery’. It goes by a few other names as well, including ‘data mining’, ‘knowledge discovery’ and so on. Regardless of the terminology, I thought it would be fun to get back to basics and look at how search engines might go about it…

Right away, I’d like to mention that this post is based on a wide variety of patents and papers that I’ve gone through over the years. This isn’t Google specific; it is more about common approaches. That being said, I have seen most of these approaches from all of the big 3 (?). We should also note that discovery is a long way from ranking and referrer traffic. What is important is understanding the many faces of discovery so that you can make content more accessible to search engines…

M’kay? Now let’s get into it…


Traditional Discovery

First off let’s look at some of the more common methods that most of us should be aware of. Here are the ‘old school’ methods for getting content noticed by search engines;

Page Submission – First off is the granddaddy of all methods; submitting pages to the search engines. It is sad to see that in 2k10 we still have so-called ‘SEO companies’ that try to sell this bloody service. For the record, I’ve NEVER bothered with this one (see more in the ‘A word on indexation’ section of this post).

Web page links – This is the more traditional approach that most of us know. Search engines find a link and follow that link to the content. This is why links are not only important for ranking, but for getting pages found as well. All of the major search engines use this approach to discovery, and it is easily the most common.

Site map (and submission) – One of the developments over the last few years was the addition of (XML) site maps and search engine services for submitting them. While this is a handy approach, search engines don’t always seem to revisit site maps all that often, so it is better treated as a supplement to the other methods than relied on exclusively.
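To make the link-following and site map ideas a bit more concrete, here is a minimal sketch of how a crawler might combine the two. This is my own illustration, not how any particular search engine does it. The names `links_from_html`, `urls_from_sitemap` and `discover` are made up for the example, the `fetch` callable is a hypothetical stand-in for a real HTTP client, and the pages live in an in-memory dictionary so nothing touches the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org XML format.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def links_from_html(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def urls_from_sitemap(sitemap_xml):
    """Return every <loc> URL listed in an XML site map."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def discover(seed_urls, fetch):
    """Breadth-first discovery. `fetch` is a hypothetical callable: url -> html or None."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    while frontier:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched; nothing to follow
            continue
        for link in links_from_html(url, html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return seen


# Toy "web" held in memory (assumed data, purely for illustration).
PAGES = {
    "http://example.com/": '<a href="/about">About</a> <a href="/blog#top">Blog</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/blog": "<p>No links here</p>",
}
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/orphan</loc></url>
</urlset>"""

seeds = urls_from_sitemap(SITEMAP)
found = discover(seeds, PAGES.get)
print(sorted(found))
```

Notice that `/orphan` has no inbound links at all; it only shows up because the site map lists it. Everything else is found by following links from the home page, which is the point of treating the site map as a supplement.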

A new breed of discovery

Next we’re going to look at some of the more recent additions to the world of discovery in search;

RSS and Atom – Along the way search engines realized that pages with no links weren’t being found easily and that some query spaces required fresher results. How can this be sorted? Well, they started indexing from RSS aggregators such as Google Reader (in Google’s case). Oh, and I’d also look at PubSubHubbub.

Say hello to ‘social’ – Another more recent method, while still technically about links, is the world of social. Search engines are getting more and more into ‘social/real-time search’ and this also provides a wealth of potential discovery angles. This augments the traditional model of contextual links or forum lurking (social 1.0).
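As a rough illustration of the feed angle, the sketch below pulls item URLs out of an RSS 2.0 or Atom document using only the Python standard library. The function name `links_from_feed` and the sample feeds are assumptions of mine, and a real system would need to cope with malformed feeds, which this does not attempt.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def links_from_feed(feed_xml):
    """Return the item/entry URLs found in an RSS 2.0 or Atom feed."""
    root = ET.fromstring(feed_xml)
    links = []
    # RSS 2.0: <rss><channel><item><link>URL</link></item>
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    # Atom: <feed><entry><link href="URL" rel="alternate"/></entry>
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            # A missing rel attribute means "alternate" in Atom.
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                links.append(link.get("href"))
    return links


# Sample feeds (made-up data for the example).
RSS = """<rss version="2.0"><channel>
  <title>Example blog</title>
  <item><title>Fresh post</title><link>http://example.com/fresh-post</link></item>
</channel></rss>"""

ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry>
    <title>Another post</title>
    <link href="http://example.com/another-post"/>
    <link rel="edit" href="http://example.com/admin/another-post"/>
  </entry>
</feed>"""

print(links_from_feed(RSS))
print(links_from_feed(ATOM))
```

The value for discovery is that a brand new post appears in the feed the moment it is published, before anyone has linked to it.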

Application focus; tin foil time

And last but not least, and the core reason for writing this post: application focus. What IS this odd term, you ask? It’s simple… they go BEYOND THE WEB. In more than a few patents I’ve come across, there are mentions of many different ways to find content beyond the ones mentioned so far. Some of these elements can include;

  1. Email
  2. Instant Messenger
  3. Word Processing apps
  4. Cell Phones
  5. Google Desktop
  6. Google Wave
  7. Just about ANY Microsoft app

Getting the idea here? Search engines can go beyond traditional avenues such as we’ve looked at in the first part of this article and find links to content across a wide variety of applications.
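The patents don’t hand us an implementation, so take the following as nothing more than a toy sketch of the general idea: given any blob of text from any application (an email body, a chat message, a document), pull out whatever looks like a URL. The regular expression, the `urls_in_text` name and the sample message are all my own assumptions.

```python
import re

# Deliberately simple pattern: http(s) followed by anything that is not
# whitespace, an angle bracket or a quote character.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

# Punctuation that usually belongs to the sentence, not the URL.
TRAILING = ".,;:!?)]}"


def urls_in_text(text):
    """Return the unique URLs mentioned in free text, in order of appearance."""
    found = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(TRAILING)
        if url not in found:
            found.append(url)
    return found


message = (
    "Check this out: http://example.com/new-post. "
    "Also see (https://example.org/page?id=1)!"
)
print(urls_in_text(message))
```

Stripping trailing punctuation is a blunt heuristic and will occasionally clip a URL that legitimately ends in one of those characters, but it shows how little is needed to turn everyday text into discovery candidates.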

Now, is this being done? Well, the availability of methods has never been the problem; processing power was. Notice how I say ‘was’? This is because with the spectre of new technologies (not the least of which is the Caffeine update) they can start to incorporate these methods even deeper. What was once the domain of tin-foil apparel may now be coming to fruition.

Are they using them? Well, that much I don’t know as it’s not been effectively tested. But I can say that the interest in cross application focus has been on the rise in a variety of patent filings over the last few years and is certainly worth considering.

A word on indexation

Before we go, I think it is important to note there is a HUGE difference between discovery and indexation. Remember those jackasses charging to ‘submit to the search engines’? Never pay for submission, as indexation requires (in most cases) some form of PageRank (link love) being passed to the page in question. This can come from external links or internal links, which is essentially what we see with sites with greater authority.

Search engines decide if they are going to index and rank content based on a wide variety of factors from link love to authority and temporal query importance. Simply being discovered is a long way from actually ranking for something meaningful and bringing in targeted referrer traffic.

The main thing here was to get a better understanding of one of the first steps along the journey to the ultimate SEO goal; achieving rankings and targeted traffic.

Why should you care?

Well, DUH… because if you are going to truly call yourself a ‘Search Engine’ optimizer then it’s best you have an intimate knowledge of how ‘search engines’ work. I swear, the next time I see someone trying to charge for submission, I am gonna phreak on ‘em – Be sure that it’s not YOU!

Oh and hey, if you want a business perspective, we can also tie some doh-rae-me ($$$) to this as well. All SEO programs have budgetary limitations and knowing the easiest and most effective ways to get discovered will encourage a lean mean program at the end of the day.

This post is by David Harry, aka the Gypsy; make sure to check out Dave’s SEO Training Dojo.

 

David Harry
David Harry is an SEO and IR geek that runs Reliable SEO, blogs on the Fire Horse Trail and is the head geek at the...