
New Data Shows Googlebot’s 2 MB Crawl Limit Is Enough

New data shows that most web pages fall well below Googlebot’s 2 MB crawl limit, which suggests the limit is not something most sites need to worry about.


New data based on real-world web pages shows that Googlebot’s crawl limit of two megabytes is more than adequate. New SEO tools provide an easy way to check how much the HTML of a web page weighs.

Data Shows 2 Megabytes Is Plenty

Raw HTML is essentially a text file. For a text file to reach two megabytes, it would need roughly two million characters, assuming one byte per character.
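As a rough illustration of that arithmetic, the sketch below measures how many bytes a string occupies once it is encoded. It assumes UTF-8 encoding and reads two megabytes as 2,000,000 bytes; both are assumptions made for the example (the limit could equally be read as 2,097,152 bytes), and the helper name is hypothetical.

```python
# Assumption: "2 MB" is read here as 2,000,000 bytes (decimal megabytes).
LIMIT_BYTES = 2_000_000


def html_weight_bytes(html: str) -> int:
    """Return the size of the markup in bytes when encoded as UTF-8."""
    return len(html.encode("utf-8"))


# Plain ASCII markup costs one byte per character.
ascii_page = "a" * 2_000_000
print(html_weight_bytes(ascii_page))  # 2000000

# Non-ASCII characters take more than one byte each in UTF-8,
# so such a page reaches the limit with fewer characters.
accented_page = "é" * 1_000_000
print(html_weight_bytes(accented_page))  # 2000000
```

In other words, the two-million-character figure holds for plain ASCII markup; pages heavy in non-ASCII text weigh more per character.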

The HTTPArchive explains what’s in the HTML weight measurement:

“HTML bytes refers to the pure textual weight of all the markup on the page. Typically it will include the document definition and commonly used on page tags such as <div> or <span>. However it also contains inline elements such as the contents of script tags or styling added to other tags. This can rapidly lead to bloating of the HTML doc.”

That is what Googlebot downloads as HTML: the on-page markup, including anything inlined, but not the external JavaScript or CSS files that the page links to.

According to the HTTPArchive’s latest report, the real-world median size of raw HTML is 33 kilobytes. At the 90th percentile the weight is 155 kilobytes, meaning that the HTML for 90% of sites is at or below roughly 155 kilobytes. Only at the 100th percentile does the size of HTML explode far beyond two megabytes, which means that pages weighing two megabytes or more are extreme outliers.
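To put those figures in proportion, here is a quick back-of-the-envelope calculation. It uses the two HTTPArchive numbers quoted above and assumes binary units (1 KB = 1,024 bytes, 2 MB = 2,097,152 bytes); the function and variable names are illustrative only.

```python
KB = 1024
LIMIT_BYTES = 2 * 1024 * KB  # assumption: 2 MB read as 2,097,152 bytes


def share_of_limit(size_kb: float) -> float:
    """Return an HTML size, given in kilobytes, as a percentage of the limit."""
    return size_kb * KB / LIMIT_BYTES * 100


median_kb = 33  # median raw HTML size reported by HTTPArchive
p90_kb = 155    # 90th percentile reported by HTTPArchive

print(f"median: {share_of_limit(median_kb):.1f}% of the limit")
print(f"90th percentile: {share_of_limit(p90_kb):.1f}% of the limit")
```

Under these assumptions the median page uses about 1.6% of the limit and the 90th-percentile page about 7.6%.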

The HTTPArchive report explains:

“HTML size remained uniform between device types for the 10th and 25th percentiles. Starting at the 50th percentile, desktop HTML was slightly larger.

Not until the 100th percentile is a meaningful difference when desktop reached 401.6 MB and mobile came in at 389.2 MB.”

The data separates the home page measurements from the inner page measurements and, surprisingly, shows little difference in weight between the two. The report explains:

“There is little disparity between inner pages and the home page for HTML size, only really becoming apparent at the 75th and above percentile.

At the 100th percentile, the disparity is significant. Inner page HTML reached an astounding 624.4 MB—375% larger than home page HTML at 166.5 MB.”

Mobile And Desktop HTML Sizes Are Similar

Interestingly, the page sizes of the mobile and desktop versions were remarkably similar, regardless of whether HTTPArchive was measuring the home page or one of the inner pages.

HTTPArchive explains:

“The size difference between mobile and desktop is extremely minor, this implies that most websites are serving the same page to both mobile and desktop users.

This approach dramatically reduces the amount of maintenance for developers but does mean that overall page weight is likely to be higher as effectively two versions of the site are deployed into one page.”

Though the overall page weight might be higher because the mobile and desktop HTML exist side by side in the code, the actual weight, as noted earlier, is still far below the two-megabyte threshold all the way up to the 100th percentile.

Given that it takes about two million characters to push a page’s HTML to two megabytes, and that the HTTPArchive data from actual websites shows the vast majority are well under Googlebot’s 2 MB limit, it’s safe to scratch HTML size off the list of SEO things to worry about.

Tame The Bots

Dave Smart of Tame The Bots recently posted that the tool has been updated so that it now stops fetching at the two-megabyte limit, which shows owners of extreme-outlier sites the point at which Googlebot would stop crawling a page.

Smart posted:

“At the risk of overselling how much of a real world issue this is (it really isn’t for 99.99% of sites I’d imagine), I added functionality to tamethebots.com/tools/fetch-… to cap text based files to 2 MB to simulate this.”

Screenshot Of Tame The Bots Interface

The tool will show what the page will look like to Google if the crawl is limited to two megabytes of HTML. But it doesn’t show whether the tested page exceeds two megabytes, nor does it show how much the web page weighs. For that, there are other tools.
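The idea of capping a fetch can be sketched in a few lines. This is not the Tame The Bots implementation, nor a description of how Googlebot handles oversized files; it simply cuts a byte string at an assumed limit of 2,097,152 bytes and reports whether anything was dropped. All names are hypothetical.

```python
LIMIT_BYTES = 2 * 1024 * 1024  # assumption: 2 MB read as 2,097,152 bytes


def cap_html(raw_html: bytes, limit: int = LIMIT_BYTES) -> tuple[bytes, bool]:
    """Return the first `limit` bytes and a flag saying whether the input was cut."""
    truncated = len(raw_html) > limit
    return raw_html[:limit], truncated


small_page = b"<html><body>Hello</body></html>"
kept, was_cut = cap_html(small_page)
print(len(kept), was_cut)  # 31 False

# A synthetic oversized document, 10 bytes past the assumed limit.
oversized_page = b"x" * (LIMIT_BYTES + 10)
kept, was_cut = cap_html(oversized_page)
print(len(kept), was_cut)  # 2097152 True
```

Unlike the tool described above, this sketch also returns a flag indicating whether the input exceeded the cap.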

Tools That Check Web Page Size

There are a few tool sites that show the HTML size, but here are two that show only the web page size. I tested the same page on both tools and they showed roughly the same page weight, give or take a few kilobytes.

Toolsaday Web Page Size Checker

The interestingly named Toolsaday web page size checker enables users to test one URL at a time. This specific tool does just the one thing, making it easy to get a quick reading of how much a web page weighs in kilobytes (or higher if the page is in the 100th percentile).

Screenshot Of Toolsaday Test Results

Small SEO Tools Website Page Size Checker

The Small SEO Tools Website Page Size Checker differs from the Toolsaday tool in that Small SEO Tools enables users to test ten URLs at a time.
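For readers who would rather script the check than use a web tool, a minimal sketch using only Python’s standard library might look like the following. It is unrelated to how either tool above works. It assumes the limit is 2,097,152 bytes, the User-Agent string is an arbitrary placeholder rather than Googlebot’s, and the function names are hypothetical. It measures the response body as received, which is typically the uncompressed HTML.

```python
from urllib.request import Request, urlopen

LIMIT_BYTES = 2 * 1024 * 1024  # assumption: 2 MB read as 2,097,152 bytes


def describe_size(num_bytes: int, limit: int = LIMIT_BYTES) -> str:
    """Format a byte count in kilobytes and say whether it is over the limit."""
    status = "over" if num_bytes > limit else "within"
    return f"{num_bytes / 1024:.1f} KB ({status} the assumed limit)"


def fetch_html_size(url: str, timeout: float = 10.0) -> int:
    """Download a URL and return the size of the response body in bytes."""
    # The User-Agent value is an arbitrary placeholder, not Googlebot's.
    request = Request(url, headers={"User-Agent": "html-size-check/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return len(response.read())
```

Calling `describe_size(fetch_html_size("https://example.com/"))` would return a one-line summary for that URL; looping over a list of URLs gives a rough batch check.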

Not Something To Worry About

The bottom line about the two-megabyte Googlebot crawl limit is that it’s not something the average SEO needs to worry about. It affects only a very small percentage of sites, the extreme outliers. But if it makes you feel better, give one of the above SEO tools a try to reassure yourself or your clients.

Featured Image by Shutterstock/Fathur Kiwon

SEJ STAFF Roger Montti Owner - Martinibuster.com at Martinibuster.com
