
How Inaccurate Google’s SITE: Operator Is and How to Fix It

The SITE: operator is one of my favorite advanced Google search commands. It has always been a good way to tell approximately how many URLs are in Google’s index, to diagnose on-site SEO issues, and to identify penalties.

More and more people have recently been complaining about how inaccurate the SITE: command actually is. They report either a sudden drop in the number of results returned for the site:domain.com command (with no corresponding change in Google Webmaster Tools) or inconsistent data (again, compared with verified Webmaster Tools account data).

Here’s what smart people say about this:

There are a growing number of people noticing the strange results from the site: operator. Bottom line for me, this report cannot be trusted any longer – most especially not the number. Just because a url is not in the site: operator result doesn’t mean it isn’t indexed and getting search traffic.

However, while agreeing that the command is largely inaccurate, people still see a correlation with actual site rankings:

Anyhow, while I found “site:” to be maybe 65% reliable (I’ll expand on this) I still find the correlation (between the ‘site’ number and SERP traffic) to be fairly significant.

This forum discussion suggests checking across a number of Google datacenters before drawing any conclusions:

I think the datacenters hold all the data and that data gets filtered down to regular google. Google knows all the pages are there, it just chooses which pages are deemed more important. This has always bugged me in the past until I noticed that a lot of unique phrase searches will show a page that is not part of the 1950 that the site operator returns.

So, let’s share our experiences.

There are a few tools that let you check SITE: results for any of your sites across a range of datacenters and compare them to regular Google.
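
If you would rather script the check yourself, here is a rough Python sketch of the idea: ask Google for site:yourdomain.com and scrape the estimated count, once per host. The “About N results” markup and the datacenter host list are assumptions that change over time (and automated querying is against Google’s terms of service), so treat this as an illustration only, not a finished tool:

# Rough sketch (not a robust tool): scrape Google's estimated result
# count for a site: query and compare it across hosts. Assumes the
# results page still carries an "About N results" line; fill in
# DATACENTER_HOSTS from a current datacenter list (raw IPs may need
# plain http instead of https).
import re
import urllib.parse
import urllib.request

DATACENTER_HOSTS = ["www.google.com"]  # add datacenter hosts here

def estimated_count(domain, host="www.google.com"):
    query = urllib.parse.quote(f"site:{domain}")
    url = f"https://{host}/search?q={query}"
    # Google blocks the default urllib user agent, so send a browser-like one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    html = urllib.request.urlopen(req).read().decode("utf-8", "ignore")
    match = re.search(r"About ([\d,]+) results", html)
    return int(match.group(1).replace(",", "")) if match else None

for host in DATACENTER_HOSTS:
    print(host, estimated_count("searchenginejournal.com", host))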

Please check your site and share your numbers in the comments (if you have Google Webmaster Tools data, please share it as well).

Numbers for [site:SearchEngineJournal.com]:

  • Regular Google: 22,600 (as of November 3)
  • Datacenters: range from 9,980 (!) to 23,100

What about you?

Ann Smarty is a blogger and community manager at Internet Marketing Ninjas. Ann’s expertise in blogging and tools serves as a base for her writing, tutorials and her guest blogging project, MyBlogGuest.com.

12 thoughts on “How Inaccurate Google’s SITE: Operator Is and How to Fix It”

  1. What people are seeing right now appears to be Google’s normal Fall Flurries, where many pages seem to be dumped (temporarily) from the index. This has happened every fall since at least 2003, if not earlier.

    Whether it happens for the same reason each year as in previous years is a matter of speculation. I suspect the pattern is due more to a collision of random factors and timing of Google production schedules.

    The situation usually begins in October and rights itself by mid-December or early January. I have never seen Google try to explain the phenomenon. I’m not even sure if they have acknowledged it.

  2. I guess the data can be more precise for smaller sites. I checked two sites and got either exactly the same or very close results:

    site:irishwonder.com
    Regular Google: 107
    Datacenters: all 106

    site:dirguide.info
    Regular Google: 9
    Datacenters: all 9

  3. “I think the datacenters hold all the data and that data gets filtered down to regular google.” – definitely.

    Google samples data when serving results, especially when your query requires additional segmentation (i.e., one slice for “keyword”, a second slice for “site:site.com”). I’ve noticed that including different portions of my URL in competing “site” queries is sometimes necessary to shake out a URL I’m looking for, even though the other variations I tried should have returned it, too.

    With clients I always use “estimates” when using Google stats, and “reports” when using Webmaster Tools.

    I think it’s also important for Google to protect their data by sprinkling in slight variance in results. Static data sets could be reverse engineered, in theory.

  4. I always make sure to check both site:thesite.com and site:www.thesite.com. I also agree with Mike re: the fall flurries; it’s just that time of year. When you think about what the operator is supposed to reflect, according to Google, it makes sense. Google’s reality is always in flux.

  5. The number shown in Google is just an estimate. Based on the number of indexed pages, Google estimates the total number of pages on your site. If you want to see the real number of indexed pages, browse to the last page of search results (and click ‘repeat the search with the omitted results included’ at the bottom of the page if you see that line). You will then see the real number of pages in the index.

    This only works for small sites, because for large sites it is not possible to browse all the result pages: Google only lets you browse the first 1,000 results (see the sketch after the comments).

  6. When I do a Site: search I always navigate to the end to see what the “REAL” number is.

    It’s so annoying that the first page of Google might say 163 pages are indexed, but when you click on page 17 you get redirected to page 14 and REALLY only 143 pages are indexed.

    WHY CAN’T GOOGLE JUST FIX THIS?!?

  7. site:buyandwalk.com

    Regular Google: 4,080
    All Datacenters: 135,000
    Webmaster Tools: 3,945

    Huge difference!!!
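
Regarding the counting trick in comments 5 and 6, here is a rough Python sketch of the same idea: page through the site: results and count the URLs actually returned instead of trusting the page-one estimate. The start= offset, the filter=0 parameter (the URL form of ‘repeat the search with the omitted results included’) and the link-matching regex are assumptions about Google’s interface that can change, so again this is an illustration rather than a dependable tool:

# Rough sketch: count the URLs Google actually returns for a site:
# query rather than trusting the estimate on page one. Assumes 10
# results per page via start=, filter=0 to include omitted results,
# and that result links contain the queried domain.
import re
import urllib.parse
import urllib.request

def real_indexed_count(domain, cap=1000):
    seen = set()
    for start in range(0, cap, 10):
        query = urllib.parse.quote(f"site:{domain}")
        url = f"https://www.google.com/search?q={query}&start={start}&filter=0"
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        html = urllib.request.urlopen(req).read().decode("utf-8", "ignore")
        links = re.findall(r'https?://[^"&\s]*' + re.escape(domain) + r'[^"&\s]*', html)
        before = len(seen)
        seen.update(links)
        if len(seen) == before:  # no new URLs: this was the last page
            return len(seen)
    return len(seen)  # stopped at Google's ~1,000-result browsing cap

print(real_indexed_count("irishwonder.com"))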