Google’s Mueller Says llms.txt Can’t Help LLMs Differentiate Sites

  • Mueller addressed the discovery case for llms.txt.
  • He argues that llms.txt can't help LLM systems decide which website to surface for a given query.
  • Self-reported files don't give LLMs a way to rank one site over another.

Mueller argues LLM systems can't use llms.txt to differentiate between websites for discovery. He sees a narrow role once an agent is already on a site.

Google’s John Mueller argued that LLM systems can’t use files like llms.txt to decide which websites to surface for a given query.

He made the comments on a recent episode of Search Off the Record, the podcast from Google’s Search Relations team.

His comment points to a broader signal problem, not just intentional gaming. Even a well-written llms.txt file is still self-reported information from the site that wants to be chosen.

For discovery, Mueller pointed back to normal HTML pages and internal links.

What Mueller Said

The conversation started with a question about whether publishers should convert websites to Markdown for LLMs. Mueller and co-host Martin Splitt agreed that HTML is still the foundation for crawling and discovery.

The discussion got specific when Mueller turned to llms.txt. He described the discovery use case as a dead end:

“It’s basically you’re telling these systems, like, I have the best website ever. And here are all of the pages that everyone must go to. And you must buy all of my products or whatever you put in there. So in LLM system, it basically, by design, can’t trust what is here as a way of differentiating between different websites.”

His argument comes down to differentiation. If sites use llms.txt to promote themselves, every file can make the same kind of claim. An LLM deciding which site best answers a query still needs some other way to tell them apart.

What ‘By Design’ Might Mean

“By design” could mean two different things, and Mueller didn’t clarify which.

One reading is architectural. LLM systems evaluate web content and can’t use self-reported files when picking sources.

The other reading treats it as a signal problem. Self-reported signals lose value when everyone provides them. Meta keywords stopped working for the same reason. Every site stuffed them, and search engines couldn’t extract a useful ranking signal.
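The signal-problem reading can be illustrated with a small sketch. This is a toy model, not a description of how any LLM system actually selects sources: the site names, the scores, and the "independent" signal are all hypothetical, and the only point is that a value every candidate reports identically gives a ranker nothing to sort on.

```python
# Toy illustration (all names and numbers are hypothetical assumptions).
# Every site claims in its own file that it is the best possible source.
candidates = {
    "site-a.example": {"self_reported": 1.0, "independent": 0.42},
    "site-b.example": {"self_reported": 1.0, "independent": 0.87},
    "site-c.example": {"self_reported": 1.0, "independent": 0.15},
}


def can_differentiate(scores):
    """A signal can only order candidates if its values differ across them."""
    return len(set(scores)) > 1


def rank(signal):
    """Return site names sorted by the given signal, highest first."""
    return sorted(candidates, key=lambda site: candidates[site][signal], reverse=True)


self_scores = [c["self_reported"] for c in candidates.values()]
independent_scores = [c["independent"] for c in candidates.values()]

print(can_differentiate(self_scores))         # every site makes the same claim
print(can_differentiate(independent_scores))  # values vary, so ordering is possible
print(rank("independent"))
```

In this sketch the self-reported column is identical for every site, so any ordering it produces is arbitrary, while the independently measured column at least separates the candidates.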

Both readings reach the same conclusion on discovery. But they imply different things about whether the limitation could change over time.

Where Mueller Sees A Role

Mueller didn’t reject all uses of llms.txt. He carved out one case where it could help:

“If someone is already on your website, maybe some kind of automated system is helpful.”

He used the example of an agent trying to buy a photograph from a specific site. The LLM would visit the site and look for instructions on how to complete the purchase.

The argument splits discovery from navigation. llms.txt can’t help an LLM choose which site to visit. But it could help once the agent is already there, like a store directory for someone who already walked in.

Beyond The Gaming Argument

Mueller has previously called building Markdown pages for bots “a stupid idea.” He has also compared llms.txt to the keywords meta tag.

SEJ’s Roger Montti wrote that llms.txt is “inherently untrustworthy” because nothing stops site owners from adding self-serving content. SE Ranking’s analysis of 300,000 domains found no link between llms.txt adoption and citation frequency in LLM answers.

Those arguments focused on what happens when people game the files. Mueller’s podcast comment adds the nuance that there’s no mechanism within the files to help an LLM pick one site over another.

Why This Matters

The gaming argument against llms.txt has always had a counterargument available. Platforms could learn to penalize manipulation, the way search engines handled spammy structured data.

The differentiation argument leaves a harder problem. Penalizing manipulation may address abuse, but it doesn’t explain how self-reported files help an LLM choose one site over another. Even the most accurate llms.txt file still can’t tell an LLM to pick your site over a competitor’s.

Looking Ahead

Standards for how agents navigate sites haven’t settled yet, Mueller acknowledged. He mentioned WebMCP alongside other file types under discussion.

None have become a standard. By his estimate, it could take six months to a year, or longer, for agentic systems to settle on a format. The discovery layer, where HTML and internal linking already work, isn’t part of that discussion.

SEJ STAFF Matt G. Southern Senior News Writer at Search Engine Journal