How Inaccurate Google’s SITE: Operator Is and How to Fix It


The SITE: operator is one of my favorite advanced Google search commands. It has always been a good way to tell approximately how many URLs are in Google’s index, to diagnose on-site SEO issues, and to identify penalties.

Recently, more and more people have been complaining about how inaccurate the SITE: command actually is. People report either a sudden drop in the number of results returned for the command (with no change reported in Google Webmaster Tools) or inconsistent data (again, compared to verified Webmaster Tools account data).

Here’s what smart people say about this:

There are a growing number of people noticing the strange results from the site: operator. Bottom line for me, this report cannot be trusted any longer – most especially not the number. Just because a url is not in the site: operator result doesn’t mean it isn’t indexed and getting search traffic.

However, while agreeing that the command is largely inaccurate, people still see a correlation with actual site rankings:

Anyhow, while I found “site:” to be maybe 65% reliable (I’ll expand on this) I still find the correlation (between the ‘site’ number and SERP traffic) to be fairly significant.

This forum discussion suggests checking across a number of Google datacenters before drawing any conclusions:

I think the datacenters hold all the data and that data gets filtered down to regular google. Google knows all the pages are there, it just chooses which pages are deemed more important. This has always bugged me in the past until I noticed that a lot of unique phrase searches will show a page that is not part of the 1950 that the site operator returns.

So, let’s share our experiences.

Here are a few tools that will help you to check SITE: results for any of your sites to compare to regular Google:

Please check your site and share your numbers in the comments (if you have Google Webmaster Tools data, please share it as well).
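
If you’d rather script the check yourself, here’s a minimal Python sketch that builds the site: query URLs you would compare across several Google hosts. The host list below is purely illustrative (Google doesn’t publish an official datacenter list), so substitute addresses from one of the datacenter-checking tools:

```python
from urllib.parse import urlencode

def site_query_urls(domain, hosts):
    """Build site: search URLs for one domain against several Google hosts.

    `hosts` is whatever list of hostnames or datacenter IPs you want to
    compare; the values used below are illustrative only.
    """
    query = urlencode({"q": f"site:{domain}"})
    return [f"http://{host}/search?{query}" for host in hosts]

# Hypothetical hosts -- replace with real datacenter addresses.
urls = site_query_urls("example.com", ["www.google.com", "64.233.161.104"])
for u in urls:
    print(u)
```

Opening each URL and comparing the “about N results” figures is exactly the manual datacenter check the tools above automate.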

Numbers for []:

  • Regular Google: 22,600 (for November 3)
  • Datacenters: range from 9,980 (!) to 23,100
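
To put a spread like that in perspective, a few lines of Python summarize the variance (a sketch; the counts fed in are the ones reported above):

```python
def index_count_spread(counts):
    """Return (min, max, relative spread) for a list of site: counts."""
    lo, hi = min(counts), max(counts)
    return lo, hi, (hi - lo) / hi

# The datacenter counts reported above: 9,980 up to 23,100.
lo, hi, spread = index_count_spread([9980, 23100])
print(f"{lo}-{hi}, spread {spread:.0%}")
```

A spread well above 50% between datacenters for the same site is hard to reconcile with the idea that the number is anything more than an estimate.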

What about you?

Ann Smarty

Brand and Community Manager at Internet Marketing Ninjas
Ann Smarty is the blogger and community manager at Internet Marketing Ninjas. Ann's expertise in blogging and tools serves as a base for her writing,...
  • Mike Wilton
    Regular Google: 94
    Datacenters: Range from 100 to 233

  • JuiceeLinks

A customer’s website had the following:

    Regular Google: 169
    Webmasters Tools Indexed Pages: 106
    Datacenters: 169 to 194

  • Michael Martinez

    What people are seeing right now appears to be Google’s normal Fall Flurries, where many pages seem to be dumped (temporarily) from the index. This has happened every fall since at least 2003, if not earlier.

    Whether it happens for the same reason each year as in previous years is a matter of speculation. I suspect the pattern is due more to a collision of random factors and timing of Google production schedules.

    The situation usually begins in October and rights itself by mid-December or early January. I have never seen Google try to explain the phenomenon. I’m not even sure if they have acknowledged it.

  • IrishWonder

I guess the data can be more precise for smaller sites. I checked two and got either the exact same or very close results:
    Regular Google: 107
    Datacenters: all 106
    Regular Google: 9
    Datacenters: all 9

  • Derek Clark

    “I think the datacenters hold all the data and that data gets filtered down to regular google. ” – definitely.

Google samples data when serving results, especially when your query requires additional segmentation (i.e., one slice for “keyword”, a second slice for “”). I’ve noticed that including different portions of my URL in competing “site:” queries is sometimes necessary to shake out a URL I am looking for, even though the other variations I tried should’ve returned it, too.

    With clients I always use “estimates” when using Google stats, and “reports” when using Webmaster Tools.

    I think it’s also important for Google to protect their data by sprinkling in slight variance in results. Static data sets could be reverse engineered, in theory.

  • Tinu

I always make sure to use both site: and site:www. I also agree with Mike re: fall flurries. It’s just that time of year. When you think about what the operator is supposed to reflect according to Google, it makes sense. Google’s reality is always in flux.

  • Roland

The number shown in Google is just an estimate: based on the number of indexed pages it has sampled, Google estimates the total number of pages on your site. If you want to see the real number of indexed pages, browse to the last page of search results (and click ‘repeat the search with the omitted results included’ at the bottom of the page if you see that line). You will then see the real number of pages in the index.

This only works for small sites, because for large sites it is not possible to browse all the result pages: Google only lets you browse the first 1,000 results.

  • Kevin Pike

    When I do a Site: search I always navigate to the end to see what the “REAL” number is.

It’s so annoying that the first page of Google might say 163 pages are indexed, but when you click on page 17 you’re redirected to page 14 and really only 143 pages are indexed.


  • Fred

    Regular Google: 4,080
    All Datacenters: 135,000
    Webmaster Tools: 3,945

    Huge difference!

  • Daniel

    Regular Google: 1,300
    Datacenters: 30,100

    Wow! What a difference.

  • Chris

    It’s also amazing how different the results are at Yahoo and Bing for the site: operator command.

  • aldemon

    Is this still valid? All I see is Connection timeout.