Google Explains Why It Doesn’t Matter That Websites Are Getting Larger

Google explains why it doesn't matter that websites are getting heavier and the reason has everything to do with SEO.

5 seconds ago
⋅
10 min read

SEJ STAFF Roger Montti Owner - Martinibuster.com at Martinibuster.com

Google Explains Why It Doesn’t Matter That Websites Are Getting Larger

A recent podcast by Google called attention to the fact that websites are getting larger than ever before. Google’s Gary Illyes and Martin Splitt explained that the idea that websites are getting “larger” is a bad thing is not necessarily true. The takeaway for publishers and SEOs is that Page Weight is not a trustworthy metric because the cause of the “excess” weight might very well be something useful.

Page Size Depends On What ‘s Being Measured

Google’s Martin Splitt explained that what many people think of as page size depends on what is being measured.

Is it measured by just the HTML?
Or are you talking about total page size, including images, CSS, and JavaScript?

It’s an important distinction. For example, many SEOs were freaked out when they heard that Googlebot was limiting their page crawl to just 2 megabytes of HTML per page. To put that into perspective, two megabytes of HTML equals about two million characters (letters, numbers, and symbols). That’s the equivalent of one HTML page with the same number of letters as two Harry Potter books.

But when you include CSS, images, and JavaScript along with the HTML, now we’re having a different conversation that’s related to page speed for users, not for the Googlebot crawler.

Martin discussed an article on HTTPArchive’s Web Almanac, which is a roundup of website trends. The article appeared to be mixing up different kinds of page weight, and that makes it confusing because there are at least two versions of page weight.

He noted:

“See that’s where I’m not so clear about their definition of page weight.

…they have a paragraph where they are trying to like explain what they mean by page weight. …I don’t understand the differences in what these things are. So they say page weight (also called page size) is the total volume of data measured in kilobytes or megabytes that a user must download to view a specific page. In my book that includes images and whatnot because I have to download that to see.

And that’s why I was surprised to hear that in 2015 that was 845 kilobytes. That to me was surprising. …Because I would have assumed that with images it would be more than 800 kilobytes.

… In July 2025, the same median page is now 2.3 megabytes.”

Data Gets Compressed

But that is only one way to understand page size. Another way to consider page size is by focusing on what is transferred over the network, which can be smaller due to compression. Compression is an algorithm on the server side that minimizes the size of the file that is sent from the server and downloaded by the browser. Most servers use a compression algorithm called Brotli.

Martin Splitt explains:

“I ask this question publicly that different people had very different notions of how they understood page size. Depending on the layer you are looking at, it gets confusing as well
because there’s also compression.

…So some people are like, ah, but this website downloads 10 megabytes onto my disk.

And I’m like, yes. …but maybe if you look at what actually goes over the wire, you might find that this is five or six megabytes, not the whole 10 megabytes. Because you can compress things on the network level and then you decompress them on the client side level…”

Technically, the page size in Martin’s example is actually five or six megabytes because of compression, and it’s able to download faster. But on the user’s side, that five or six megabytes gets decompressed, and it turns back into ten megabytes, which occupies that much space on a user’s phone, desktop, or wherever.

And that introduces an ambiguity. Is your web page ten megabytes or five megabytes?

That illustrates a wider problem: different people are talking about different things when they talk about page size.

Even widely used definitions don’t fully resolve the ambiguity. Page weight is described as “the total volume of data measured in kilobytes or megabytes that a user must download,” but as the discussion makes clear, there is no one clear definition.

Martin asserts:

“When you ask people what they think, if this is big or not, you start getting very different answers depending on how they think about page size. And there is no one true definition of it.”

What About Ratio Of Markup To Content?

One of the most interesting distinctions made in the podcast is that a large page is not necessarily inefficient. For example, a 15 MB HTML document is considered acceptable because “pretty much most of these 15 megabytes are actually useful content.” The size reflects the value being delivered.

By contrast, what if the ratio of content to markup were the other way around, where there was a little bit of content but the overwhelming amount of the page weight was markup.

Martin discussed the ratio example:

“…what if the markup is the only overhead? And I mean like what do you mean? It’s like, well, you know, if it’s like five megabytes but it’s only very little content, is that bad? Is that worse as in this case, the 15 megabytes.

And I’m like, that’s tricky because then we come into this weird territory of the ratio between content and markup. Yeah.

And I said, well, but what if a lot of it is markup that is metadata for some third party tool or for some service or for regulatory reasons or licensing reasons or whatever. Then that’s useful content, but not necessarily for the end user, but you still kind of have to have it.

It would be weird to say that that is worse than the page where the weight is mostly content.”

What Martin is doing here is shifting the idea of page weight away from raw size toward what the data actually represents.

Why Pages Include Data Users Never See

A major contributor to page weight is content that users never see.

Gary Illyes points to structured data as an example of content that is specifically meant for machines and not for users. While it can be useful for search engines, it also adds to the overall size of the page. If a publisher adds a lot of structured data to their page in order to take advantage of all the different options that are available, that’s going to add to the page size even though the user will never see it.

This calls attention to a structural reality of the web: pages are not just built for human readers. They are also built for search engines, tools, AI agents, and other systems, all of which add their own requirements to the weight of a web page.

When Overhead Is Justified

Not all non-user-facing content is unnecessary.

Martin talked about how markup may include “metadata” or a tool, regulatory, or licensing purpose, creating a kind of gray area. Even if the additional data does not improve the user experience directly, it does serve a purpose, including helping the user find the page through a search engine.

The point that Martin was getting at is that these considerations of page weight complicate attempts to label page weight as good if it is under this threshold or bad if the page weight exceeds it.

Why Separating Content and Metadata Doesn’t Work

One possible solution that Gary Illyes discussed is separating human-facing content from machine-facing data. While Gary didn’t specifically mention the LLMs.txt proposal, what he discussed kind of resembles it in that it serves content to a machine minus all the other overhead that goes with the user-facing content.

What he actually discussed was a way to separate all of the machine-facing data from what the user will download, thus, in theory, making the user’s version of a web page smaller.

Gary quickly dismisses that idea as “utopic” because there will always be hordes of spammers who will find a way to take advantage of that.

He explained:

“But then unfortunately this is an utopic thing. Because not everyone on the internet is playing nice.

We know how much spam we have to deal with. On our blog we say somewhere that we catch like 40 billion URLs per day that’s spam or some insane number, I don’t remember exactly, but it’s some insane number and definitely billions. That will just exacerbate the amount of spam that search engines receive and other machines receive maybe like I would bet $1 and 5 cents that will actually increase the amount of spam that search engines and LLMs and others ingest.”

Gary also said that Google’s experience is that, historically, when you have separate kinds of content, there will always be differences between the two kinds. He used the example of when websites had mobile and desktop pages, where the two versions of content were generally different, which in turn caused issues for search and also for usability when a site ranks a web page for content on one version of a page, then sends the user to a different version of the page where that content does not exist.

Although he didn’t explicitly mention it, that explanation of Google’s experience may shed more light on why Google will not adopt LLMS.txt.

As a result, search engines have largely settled on a single-document model, even if it is inefficient.

Website Size vs Page Size Is the Real World

The discussion ultimately challenges the original concept of the problem, that heavy web pages are bad.

Gary observes:

“The first question is, are websites getting fat? I think this question is not even meaningful.

Because it does not matter in the context of a website if it’s fat. In the context of a single page, yes.

But in the context of a website, it really doesn’t matter.”

So now Gary and Martin change the focus to web pages that are getting heavier, a more meaningful way to look at the issue of how web pages and websites are evolving.

This moves the discussion from an abstract idea to something more measurable and actionable.

Heavier Pages Still Carry Real Costs

Even with faster connections and better infrastructure, larger pages still have consequences, and smaller weighted pages have positive benefits.

Martin explains:

“I think we are wasting a lot of resources. And I mean we, we had that in another episode where we said that we know that there are studies that show that websites that are faster have better retention and better conversion rates. Yeah. And speed is in part also based on size. Because the more data I ship, the longer it takes for the network to actually transfer that data and the longer it takes for the processor of whatever device you’re on to actually process it and display it to you.”

From a broader perspective, the issue is not just performance but efficiency. As Illyes puts it, “we are wasting a lot of resources.”

The web may be getting heavier, but the more important takeaway is why. Pages are carrying more than just user-facing content, and that design choice shapes both their size and their impact.

Featured Image by Shutterstock/May_Chanikran

Category News SEO

ChatGPT vs. Perplexity vs. Gemini: Which LLMs Are Driving Real Conversions?

Vibe Code Tools That Solve Your SEO Problems

ChatGPT vs. Perplexity vs. Gemini: Which LLMs Are Driving Real Conversions?

The New Publishing Standard in the AI Era

ChatGPT vs. Perplexity vs. Gemini: Which LLMs Are Driving Real Conversions?

The New Publishing Standard in the AI Era

Google Explains Why It Doesn’t Matter That Websites Are Getting Larger

Page Size Depends On What ‘s Being Measured

Data Gets Compressed

What About Ratio Of Markup To Content?

Why Pages Include Data Users Never See

When Overhead Is Justified

Why Separating Content and Metadata Doesn’t Work

Website Size vs Page Size Is the Real World

Heavier Pages Still Carry Real Costs