Google’s Gary Illyes and Martin Splitt used an episode of the Search Off the Record podcast to walk through how Google’s crawler handles HTML. The conversation revealed differences between how browsers and Googlebot process the same page.

The discussion covered resource hints, metadata placement, and HTML validation. Several of Illyes’ explanations challenge assumptions about which technical changes help with search.

Why Resource Hints Don’t Help Googlebot

Browser performance features like dns-prefetch, preload, prefetch, and preconnect solve latency problems that Google’s infrastructure doesn’t have.

Illyes said Google’s DNS resolution doesn’t need the help most sites are trying to provide.

He stated:

“It’s very helpful if you have like a crappy internet to do DNS Prefetching for example. In our case, we don’t need to because we can talk very fast to all the cascading DNS servers.”

He added that Google caches page resources separately and doesn’t fetch them in real time the way a browser does. Illyes said Google does this to reduce bandwidth and server load on the sites it crawls.

Illyes said:

“Same with preload. If we are not synchronous then we don’t particularly need to listen and look at preload.”

Google uses the Speculation Rules API to speed up search result clicks for Chrome users. That system works because it operates at the browser level, where latency between a user and a server matters. Googlebot operates from inside Google’s own infrastructure, where those bottlenecks don’t exist.

Both Illyes and Splitt were clear that these hints still help users. Faster page loads improve retention and conversion. The difference is that these changes affect the browser experience, not crawling or indexing.
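For an audit that needs to separate these browser-facing hints from crawl-relevant markup, here is a minimal Python sketch using the standard-library html.parser module. The function name find_resource_hints and the sample page are illustrative assumptions, not part of any Google tool. It only lists link elements whose rel attribute contains one of the four hints named above.

```python
from html.parser import HTMLParser

# The four browser resource hints discussed above.
RESOURCE_HINTS = {"dns-prefetch", "preload", "prefetch", "preconnect"}


class ResourceHintFinder(HTMLParser):
    """Collect (hint, href) pairs from <link rel=...> elements."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated token list, compared case-insensitively.
        for token in (attributes.get("rel") or "").lower().split():
            if token in RESOURCE_HINTS:
                self.hints.append((token, attributes.get("href")))


def find_resource_hints(html):
    finder = ResourceHintFinder()
    finder.feed(html)
    finder.close()
    return finder.hints


page = """<head>
<link rel="dns-prefetch" href="//cdn.example.com">
<link rel="preload" href="/hero.webp" as="image">
<link rel="canonical" href="https://example.com/">
</head>"""

for hint, href in find_resource_hints(page):
    # Per Illyes, these help browsers, not Googlebot's crawling or indexing.
    print(f"{hint}: {href}")
```

The canonical link is deliberately skipped: it is metadata for crawlers, not a performance hint.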

Metadata Belongs In The Head

Splitt shared a case where a spec-compliant script tag in the head injected an iframe, which triggered the browser’s head-closing behavior. That pushed hreflang link tags into the body, where Splitt said Google’s systems correctly ignored them.
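The parsing rule behind Splitt’s case can be sketched in Python. This is a deliberately simplified model of the "in head" insertion mode from the HTML Living Standard, not a real parser. HeadCloseTracer and trace are hypothetical names, and the iframe is written straight into the sample markup to stand in for the one the script injected (roughly what document.write would produce). Any start tag outside the small set of head elements, or any non-whitespace text, ends the head, and later link and meta elements are recorded as landing in the body.

```python
from html.parser import HTMLParser

# Start tags the HTML parser handles inside <head> (simplified from the
# "in head" insertion mode of the HTML Living Standard).
HEAD_TAGS = {"base", "basefont", "bgsound", "link", "meta", "noframes",
             "noscript", "script", "style", "template", "title"}
# Head elements whose text content does not end the head.
TEXT_CONTAINERS = {"noframes", "noscript", "script", "style", "template", "title"}


class HeadCloseTracer(HTMLParser):
    """Toy model of where <link> and <meta> elements end up after parsing."""

    def __init__(self):
        super().__init__()
        self.mode = "in_head"    # in_head -> after_head -> in_body
        self.closed_by = None    # event that ended the head
        self.text_depth = 0      # nesting inside script/style/title/...
        self.placements = []     # (tag, rel or name, "head" or "body")

    def _enter_body(self, reason):
        if self.mode == "in_head":
            self.closed_by = reason
        self.mode = "in_body"

    def handle_starttag(self, tag, attrs):
        if tag in ("html", "head"):
            return
        if tag in HEAD_TAGS:
            where = "body" if self.mode == "in_body" else "head"
            if tag in ("link", "meta"):
                attributes = dict(attrs)
                kind = attributes.get("rel") or attributes.get("name")
                self.placements.append((tag, kind, where))
            if tag in TEXT_CONTAINERS:
                self.text_depth += 1
        else:
            # <body>, <iframe>, <div>, <img>, ... implicitly close the head.
            self._enter_body(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag == "head" and self.mode == "in_head":
            self.mode = "after_head"
            self.closed_by = "</head>"
        elif tag in TEXT_CONTAINERS and self.text_depth > 0:
            self.text_depth -= 1

    def handle_data(self, data):
        # Non-whitespace text outside script/style/title also ends the head.
        if self.text_depth == 0 and data.strip():
            self._enter_body("text")


def trace(html):
    tracer = HeadCloseTracer()
    tracer.feed(html)
    tracer.close()
    return tracer.closed_by, tracer.placements


page = """<!doctype html>
<html><head>
<meta name="robots" content="noindex">
<script src="/widget.js"></script>
<iframe src="/tracker.html"></iframe>
<link rel="alternate" hreflang="de" href="https://example.com/de/">
<link rel="canonical" href="https://example.com/">
</head><body><p>Hello</p></body></html>"""

closed_by, placements = trace(page)
print(closed_by)  # <iframe>
for item in placements:
    print(item)
```

This only approximates browser behavior; for a definitive answer, inspect the parsed DOM in the browser’s developer tools.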

Illyes explained why Google is strict about this. According to the HTML Living Standard, a meta name="robots" tag can only appear in the head. The same applies to rel=canonical link elements.

He said:

“I would argue that it’s really quite dangerous to have link elements that carry metadata in the body.”

His reasoning is that if Google accepted canonical tags in the body, anyone able to inject markup into a page’s body could hijack its canonical and remove the page from search results.
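Illyes’ hijacking argument can be made concrete with a small comparison. The sketch below, with hypothetical function names, contrasts a naive extractor that trusts the last canonical found anywhere on the page with a head-only rule. Its head detection is simplified (text content and tags between the closing head tag and the body are not modeled), so treat it as an illustration rather than a description of how Google’s systems work.

```python
from html.parser import HTMLParser

# Simplified: start tags the parser keeps in <head>; any other start tag
# (including <body>) ends the head. Text-triggered closing is not modeled.
HEAD_TAGS = {"base", "link", "meta", "noscript", "script", "style",
             "template", "title"}


class CanonicalCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_head = True
        self.found = []  # (href, "head" or "body")

    def handle_starttag(self, tag, attrs):
        if tag in ("html", "head"):
            return
        if tag not in HEAD_TAGS:
            self.in_head = False
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens:
            where = "head" if self.in_head else "body"
            self.found.append((attributes.get("href"), where))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def collect(html):
    collector = CanonicalCollector()
    collector.feed(html)
    collector.close()
    return collector.found


def canonical_lenient(html):
    """Hypothetical naive rule: trust the last canonical found anywhere."""
    found = collect(html)
    return found[-1][0] if found else None


def canonical_strict(html):
    """Head-only rule: canonicals in the body are ignored."""
    in_head = [href for href, where in collect(html) if where == "head"]
    return in_head[0] if in_head else None


page = """<html><head>
<link rel="canonical" href="https://example.com/post">
</head><body>
<div class="comment">
<link rel="canonical" href="https://attacker.example/">
</div>
</body></html>"""

print(canonical_lenient(page))  # https://attacker.example/
print(canonical_strict(page))   # https://example.com/post
```

Under the lenient rule, a single injected tag in user-generated content decides the canonical; under the head-only rule it has no effect.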

Illyes previously offered guidance on HTML parsing and rel-canonical implementation, advising site owners to spell out the full URL path in canonical tags to avoid parser ambiguity. The same idea applies here: placing metadata clearly in the head removes the guesswork.
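One way to see why a full URL is safer is to resolve relative hrefs the way a URL parser does. The helper check_canonical below is hypothetical and uses Python’s urllib.parse; it reports whether an href is absolute and what it resolves to against the page URL.

```python
from urllib.parse import urljoin, urlsplit


def check_canonical(href, page_url):
    """Hypothetical helper: return (is_absolute, resolved_url) for an href."""
    parts = urlsplit(href)
    # "Absolute" here means scheme and host are both spelled out, so
    # protocol-relative hrefs like //example.com/x count as relative.
    is_absolute = bool(parts.scheme and parts.netloc)
    # A relative href only gains meaning once resolved against the page URL.
    return is_absolute, urljoin(page_url, href)


page_url = "https://example.com/shoes/"
for href in ("https://example.com/shoes/red", "/shoes/red", "shoes/red"):
    print(href, "->", check_canonical(href, page_url))
```

The last case shows how dropping a single leading slash silently changes which URL the canonical points to (https://example.com/shoes/shoes/red).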

HTML Validity Doesn’t Equal Ranking Advantage

Illyes was direct about why valid HTML can’t be a ranking signal. Validity is binary: HTML is either valid or it isn’t, with no room in between. Illyes said it’s hard to do anything meaningful with a pass/fail metric.

“It’s very hard to say that something is close to valid. And then like what do you do there when something is just close to valid.”

As an example, he noted that a missing closing span tag makes a page’s HTML technically invalid, but, as Illyes put it, “It’ll not change anything for the user.”
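To illustrate why a pass/fail signal is coarse, here is a toy checker for a single rule: every span element must be closed. It is an assumption-laden stand-in, not a real validator such as the W3C Nu Html Checker, and it ignores stray closing tags and every other rule in the spec. A page missing one closing span and a page missing three both come back simply as invalid.

```python
from html.parser import HTMLParser


class SpanBalance(HTMLParser):
    """Toy check for one validity rule: every <span> is closed."""

    def __init__(self):
        super().__init__()
        self.open_spans = 0

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.open_spans += 1

    def handle_endtag(self, tag):
        # Stray </span> tags are ignored rather than reported.
        if tag == "span" and self.open_spans > 0:
            self.open_spans -= 1


def spans_valid(html):
    checker = SpanBalance()
    checker.feed(html)
    checker.close()
    return checker.open_spans == 0


valid = "<p><span>Price:</span> $10</p>"
nearly_fine = "<p><span>Price:</span> <span>$10</p>"   # one span left open
badly_broken = "<p><span><span><span>Price: $10</p>"   # three left open

for name, html in (("valid", valid), ("nearly_fine", nearly_fine),
                   ("badly_broken", badly_broken)):
    print(name, spans_valid(html))
```

The boolean result carries no notion of "close to valid," which is the limitation Illyes described.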

Splitt agreed, noting that semantic markup like proper heading hierarchy and HTML5 structural elements doesn’t carry meaningful weight for search engines either, though it’s useful for accessibility and user experience.

Why This Matters

Technical audits may flag resource hint opportunities and HTML validation errors. Knowing which of those affect Google’s crawler and which affect browsers can help you prioritize what to fix.

When hreflang tags, canonical links, or meta robots directives aren’t working as expected, the first place to check is whether they’re ending up in the body after the browser parses the page. A tag that looks correct in your source HTML can end up in the wrong location if a script or iframe triggers early head closure.

Roger Montti covered Google’s updated crawler caching guidance, which recommends ETag headers to reduce unnecessary crawling. That guidance is consistent with what Illyes described in this episode.
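ETags pay off through conditional requests: the server sends an ETag, the client echoes it in an If-None-Match header on the next fetch, and an unchanged resource gets a bodiless 304 Not Modified response. The sketch below assumes a hypothetical handler named respond and derives the ETag from a hash of the body. It is not Google’s guidance verbatim or any particular framework’s API.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the response bytes (one common approach)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None):
    """Hypothetical handler: return (status, headers, payload)."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # If-None-Match may list several tags (naive comma split, fine for
        # ETags without commas). Comparison is weak, so W/"x" matches "x".
        candidates = []
        for tag in if_none_match.split(","):
            tag = tag.strip()
            candidates.append(tag[2:] if tag.startswith("W/") else tag)
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # Not Modified: no body re-sent
    return 200, headers, body


v1 = b"<html>v1</html>"
status, headers, _ = respond(v1)                  # first fetch: 200 + ETag
status2, _, payload = respond(v1, headers["ETag"])  # unchanged: 304
status3, _, _ = respond(b"<html>v2</html>", headers["ETag"])  # changed: 200
print(status, status2, status3)
```

Hashing the body keeps the ETag stable for identical content; a real server might instead use a version number or modification timestamp.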

Looking Ahead

Splitt mentioned that client hints were the original topic he wanted to cover, and that the HTML parsing discussion was groundwork for a future episode. If that episode happens, it could cover how Googlebot handles the newer Accept-CH and Sec-CH-UA headers that are replacing traditional user agent strings.
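The episode did not describe how Googlebot treats client hints, so the following only sketches the general mechanism. A server lists the hints it wants in an Accept-CH response header, and a supporting browser then sends them as Sec-CH-UA request headers. The header values, the RESPONSE_HEADERS name, and the simplified parser parse_sec_ch_ua are illustrative assumptions.

```python
import re

# Response header a server might send to opt in to higher-entropy hints
# (the Accept-CH mechanism is defined in RFC 8942).
RESPONSE_HEADERS = {
    "Accept-CH": "Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List",
}

# One brand entry, e.g. "Chromium";v="124". Simplified: escaped quotes are
# kept as-is rather than unescaped.
_BRAND = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_sec_ch_ua(value):
    """Map brand -> version from a Sec-CH-UA style header value."""
    return dict(_BRAND.findall(value))


header = '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"'
print(parse_sec_ch_ua(header))
# {'Chromium': '124', 'Google Chrome': '124', 'Not-A.Brand': '99'}
```

Unlike a single user agent string, each brand arrives as a separate structured entry, which is why a parser like this splits on entries rather than pattern-matching one long string.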

The full conversation is available on YouTube and Apple Podcasts.