
How Search Engines Find Your Content

Discovery: where SEO starts…

Long before you start to worry about actually ranking content, you need to get it into the index. And of course, to do that, your content needs to be found. In the world of information retrieval this is, appropriately enough, known as ‘discovery’ (you’ll also see it filed under related terms such as ‘data mining’, ‘knowledge discovery’ and so on). Regardless of the terminology, I thought it would be fun to get back to basics and look at how search engines might go about it…

Right away, I’d like to mention that this post is based on a wide variety of patents and papers that I’ve gone through over the years. This isn’t Google specific; it is more about common approaches, though I’ve seen most of them in filings from all of the ‘big 3’. We should also note that discovery is a long way from ranking and referrer traffic. What is important is understanding the many faces of discovery so you can make your content more accessible to search engines…

M’kay? Now let’s get into it…


Traditional Discovery

First off, let’s look at some of the more common methods that most of us should be aware of. Here are the ‘old school’ methods for getting content noticed by search engines:

Page Submission – First off is the granddaddy of them all: submitting pages to the search engines. It is sad to see that in 2k10 we still have so-called ‘SEO companies’ that try to sell this bloody service. For the record, I’ve NEVER bothered with this one (see more in the ‘A word on indexation’ section of this post).

Web page links – This is the more traditional approach that most of us know. Search engines find a link and follow that link to the content. This is why links are not only important for ranking, but for getting pages found as well. All of the major search engines use this approach to discovery, and it is easily the most common (there’s a toy sketch of the crawl loop just after these methods).

Site map (and submission) – One of the developments over the last few years was the addition of (XML) site maps and search engine services for submitting them. While this is a handy approach, search engines don’t always seem to revisit sitemaps all that often, so it works better as a supplement to getting found rather than something to rely on exclusively (a quick sitemap-generation sketch follows as well).
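
To make the link-following model concrete, here’s a toy sketch of that crawl loop in Python. It’s a minimal illustration under my own assumptions, not how any engine actually does it: the seed URL is hypothetical, and everything a real crawler needs (robots.txt, politeness delays, deduplication at scale, scheduling) is left out.

    # Toy link-based discovery: fetch a page, pull out its <a href> links,
    # queue them, repeat. Standard library only; seed URL is a placeholder.
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        """Collect href values from <a> tags on a fetched page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def discover(seed_url, max_pages=10):
        """Breadth-first discovery starting from a single seed URL."""
        queue, seen = [seed_url], set()
        while queue and len(seen) < max_pages:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
            except Exception:
                continue  # pages that can't be fetched simply never get discovered
            parser = LinkExtractor()
            parser.feed(html)
            queue.extend(urljoin(url, link) for link in parser.links)
        return seen

    # discover("https://example.com/")  # hypothetical seed page

The takeaway: if nothing links to a page, it never enters that queue, which is exactly why links matter for getting found and not just for ranking.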
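
As for sitemaps, here’s a rough sketch of generating one with the Python standard library. The URLs and dates are placeholders I’ve made up for the example; the protocol caps a sitemap file at 50,000 URLs, and the file is typically referenced from robots.txt or submitted through the engines’ webmaster tools.

    # Minimal XML sitemap generation using only the standard library.
    import xml.etree.ElementTree as ET

    def build_sitemap(urls, path="sitemap.xml"):
        """Write a sitemap file from (loc, lastmod) pairs."""
        ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
        urlset = ET.Element("urlset", xmlns=ns)
        for loc, lastmod in urls:
            url_el = ET.SubElement(urlset, "url")
            ET.SubElement(url_el, "loc").text = loc
            ET.SubElement(url_el, "lastmod").text = lastmod
        ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

    # Hypothetical pages; the value is in listing deep content explicitly.
    build_sitemap([
        ("https://example.com/", "2010-05-01"),
        ("https://example.com/deep/article", "2010-05-03"),
    ])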

A new breed of discovery

Next we’re going to look at some of the more recent additions to the world of discovery in search:

RSS and Atom – along the way, search engines realized that pages with no links pointing at them weren’t being found easily and that some query spaces required fresher results. How can this be sorted? Well, they started pulling URLs from feeds and aggregators such as Google Reader (in Google’s case). Oh, and I’d also look at PubSubHubbub (there’s a feed-parsing sketch at the end of this section).

Say hello to ‘social’ – another more recent method, while still technically about links, is the world of social. Search engines are getting more and more into ‘social/real-time search’ and this also provides a wealth of potential discovery angles. This augments the traditional model of contextual links or forum lurking (social 1.0).
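
For the curious, here’s a rough sketch of feed-based discovery: pull the entry URLs out of an RSS or Atom feed so fresh posts can be queued for fetching. The feed address is a placeholder, and a real pipeline would more likely subscribe to pings (the PubSubHubbub model) than poll like this.

    # Extract entry URLs from an RSS 2.0 or Atom feed (standard library only).
    import xml.etree.ElementTree as ET
    from urllib.request import urlopen

    ATOM = "{http://www.w3.org/2005/Atom}"

    def feed_urls(feed_url):
        """Return the URLs mentioned in a feed, whether RSS or Atom."""
        tree = ET.parse(urlopen(feed_url, timeout=10))
        urls = []
        # RSS 2.0 style: <item><link>http://...</link></item>
        urls += [el.text for el in tree.iter("link") if el.text]
        # Atom style: <entry><link href="http://..."/></entry>
        urls += [el.get("href") for el in tree.iter(ATOM + "link") if el.get("href")]
        return urls

    # feed_urls("https://example.com/feed.xml")  # hypothetical feed address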

Application focus: tin foil time

And last but not least, the core reason for writing this post: application focus. What IS this odd term, you ask? It’s simple… they go BEYOND THE WEB. In more than a few patents I’ve come across, there are mentions of many different ways to find content beyond the ones covered so far. Some of these elements can include;

  1. Email
  2. Instant Messenger
  3. Word Processing apps
  4. Cell Phones
  5. Google Desktop
  6. Google Wave
  7. Just about ANY Microsoft app

Getting the idea here? Search engines can go beyond the traditional avenues we looked at in the first part of this article and find links to content across a wide variety of applications.
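
To give a flavour of what that could look like, here’s a hedged little illustration: the kind of URL extraction an engine could run over any text it can see, be it an email body, an IM log or a shared document. The regex is deliberately crude and the sample text is invented; nothing here claims this is how any engine actually does it.

    # Pull http(s) URLs out of an arbitrary blob of text, not just HTML anchors.
    import re

    URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")

    def urls_in_text(text):
        """Return every http(s) URL mentioned in a plain-text blob."""
        return URL_PATTERN.findall(text)

    sample = "Hey, check out https://example.com/new-post when you get a sec."
    print(urls_in_text(sample))  # ['https://example.com/new-post']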

Now, is this being done? Well, the availability of methods has never been the problem; processing power was. Notice how I say ‘was’? This is because, with the spectre of new technologies (not the least of which being the Caffeine update), they can start to incorporate these methods even deeper. What was once the domain of tin-foil apparel may now be coming to fruition.

Are they using them? Well, that much I don’t know, as it hasn’t been effectively tested. But I can say that interest in cross-application discovery has been on the rise in a variety of patent filings over the last few years, and it is certainly worth considering.

A word on indexation

Before we go, I think it is important to note there is a HUGE difference between discovery and indexation. Remember those jackasses charging to ‘submit to the search engines’? Never pay for submission, as indexation requires (in most cases) some form of PageRank (link love) being passed to the page in question. That can come from external links or internal links, which is essentially what we see on sites with greater authority.

Search engines decide if they are going to index and rank content based on a wide variety of factors from link love to authority and temporal query importance. Simply being discovered is a long way from actually ranking for something meaningful and bringing in targeted referrer traffic.

The main thing here was to get a better understanding of one of the first steps on the journey to the ultimate SEO goal: achieving rankings and targeted traffic.

Why should you care?

Well, DUH… because if you are going to truly call yourself a ‘Search Engine’ optimizer then it’s best you have an intimate knowledge of how ‘search engines’ work. I swear, the next time I see someone trying to charge for submission, I am gonna phreak on ‘em – Be sure that it’s not YOU!

Oh and hey, if you want a business perspective, we can also tie some doh-rae-me ($$$) to this as well. All SEO programs have budgetary limitations, and knowing the easiest and most effective ways to get discovered makes for a lean, mean program at the end of the day.

The post is by David Harry aka the Gypsy; make sure to check Dave’s SEO Training Dojo.

 

David Harry is an SEO and IR geek that runs Reliable SEO, blogs on the Fire Horse Trail and is the head geek at the SEO Training Dojo.


14 thoughts on “How Search Engines Find Your Content”

  1. Good stuff as always, Harry, although I would go as far as to say that XML sitemaps are fool’s gold in terms of getting content indexed by Google.

    URLs that are orphaned (i.e. have no inbound links from other URLs that are themselves indexed by Google) will not get indexed even if they are added to an XML sitemap. I know this from personal experience, and Google more or less says as much in their guidelines on XML sitemap submission (i.e. it will not override their standard indexing methodology).

    Keep the content comin’!

  2. “Well, DUH”? That’s actually what I say about search engine submission. I try to explain it like this.

    If you submit to the search engine, sure you get indexed, but you don’t have enough link juice to rank for anything. By the time you have enough link juice, the search engines have long discovered you anyway. Submitting to search engines just wastes the time you could spend submitting to a directory or pitching a blogger or writing an article, so it actually delays your ranking.

  3. Hey Hugo, good to cya and thanks as always.

    “fool’s gold in terms of getting content indexed by Google.”

    There’s the thing: discovery and indexation are actually two different things. Indexation decisions are generally based on a variety of elements such as links, site authority and so forth. Just because they discovered it doesn’t mean they’ll index it… As noted with:

    “I think it is important to note there is a HUGE difference between discovery and indexation.”

    So yes, I agree with the assertion and the statement from Ol’ Google. The two main things I wanted to get at were to think outside the box (application focus methods) and to establish that a more refined discovery program can help save some time…

  4. @David… lol… I hear ya on that one. Can’t count the number of times I’ve told people that submission is useless. As noted in the above comment to Hugo, each search engine has its own criteria as far as what gets indexed. And really, submitting does squat as far as rankings go… the act of doing SEO to RANK will obviously accomplish the task of discovery/indexation as well… so submission is pointless.

    We have a poll on the Dojo right now on ‘SEO Myths’ and last I checked ‘submissions’ was number 2 ….sooooo…. def a dumb ass concept. :0)

  5. What’s really scary (as I’m sure both of you know) is that there are still a lot of “experienced” marketers out there – even at Fortune 500 companies – who think submissions and XML sitemaps are the answer to SEO needs.

  6. I stopped submitting sites the old-school way ages ago, though I do start things off by setting up GWT and adding the sitemap.xml file, simply because it takes all of 30 seconds to add the link to it, and I’ve seen direct quick indexing as a result when it comes to deeper content. At the same time though, when other methods aren’t implemented, I’ve also seen many of those initially indexed pages drop right out of the results.

  7. I agree. Search engines like Google, Yahoo and Bing change their algorithms every year because everyone needs change and it is the easiest way to grow, and I think that is what they are doing. It’s not only the engines that are growing; the people who follow the latest trends in SEO are growing along with them. Social media brings new change to search engine optimization.

  8. As we are aware, inbound links are one of the hottest aspects of optimisation. Not providing your pages with quality links is going to result in pages not getting indexed anyway. Using an XML sitemap helps pages get indexed and also gives good information on a website’s indexing status.

    As for website submissions, I think it’s agreed that they don’t have a massive effect either way; it takes a very short amount of time to submit this information and therefore may be worth doing.

  10. Interesting – there are other Matt Cutts quotes indicating they do put store in social signals. It’s very difficult to know what to believe on this front, but I certainly agree with you that +1s aren’t a significant factor in the algorithm at the present time. Your point about using social signals or likes as a means of finding fresh content is well made.