
llms.txt: The Web’s Next Great Idea, Or Its Next Spam Magnet

A look at the promise, the risk, and the platform hesitation.


At a recent conference, I was asked if llms.txt mattered. I’m personally not a fan, and we’ll get into why below. I listened to a friend who told me I needed to learn more about it as she believed I didn’t fully understand the proposal, and I have to admit that she was right. After doing a deep dive on it, I now understand it much better. Unfortunately, that only served to crystallize my initial misgivings. And while this may sound like a single person disliking an idea, I’m actually trying to view this from the perspective of the search engine or the AI platform. Why would they, or why wouldn’t they, adopt this protocol? And that POV led me to some, I think, interesting insights.

We all know that search is not the only discovery layer anymore. Large-language-model (LLM)-driven tools are rewriting how web content is found, consumed, and represented. The proposed protocol, called llms.txt, attempts to help websites guide those tools. But the idea carries the same trust challenges that killed earlier “help the machine understand me” signals. This article explores what llms.txt is meant to do (as I understand it), why platforms would be reluctant, how it can be abused, and what must change before it becomes meaningful.

Image Credit: Duane Forrester

What llms.txt Hoped To Fix

Modern websites are built for human browsers: heavy JavaScript, complex navigation, interstitials, ads, dynamic templates. But most LLMs, especially at inference time, operate in constrained environments: limited context windows, single-pass document reads, and simpler retrieval than traditional search indexers. The original proposal from Answer.AI suggests adding an llms.txt markdown file at the root of a site, which lists the most important pages, optionally with flattened content so AI systems don’t have to scramble through noise.

Supporters describe the file as “a hand-crafted sitemap for AI tools” rather than a crawl-block file. In short, the theory: Give your site’s most valuable content in a cleaner, more accessible format so tools don’t skip it or misinterpret it.
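
To make that concrete, here is a rough sketch of what such a manifest can look like under the Answer.AI format as I read it: an H1 with the site name, a short blockquote summary, and H2 sections of annotated links. The site, section names, and URLs below are invented for illustration.

```markdown
# Example Widgets Co.

> Example Widgets Co. makes industrial widgets. This file lists the pages we
> consider most useful for AI tools, in plain Markdown.

## Docs

- [Product overview](https://example.com/products/overview.md): What we sell and who it's for
- [API reference](https://example.com/docs/api.md): Endpoints, authentication, and rate limits

## Optional

- [Company history](https://example.com/about/history.md): Background material that can be skipped
```

The "flattened content" mentioned above comes from the same proposal's companion idea of offering clean markdown versions of key pages (and, in some implementations, a single llms-full.txt that inlines everything), so an agent can read them without fighting the live templates.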

The Trust Problem That Never Dies

If you step back, you discover this is a familiar pattern. Early in the web’s history, something like the meta keywords tag let a site declare what it was about; it was widely abused and ultimately ignored. Similarly, authorship markup (rel=author, etc.) tried to help machines understand authority, and again, manipulation followed. Structured data (schema.org) succeeded only after years of governance and shared adoption across search engines. llms.txt sits squarely inside this lineage: a self-declared signal that promises clarity but trusts the publisher to tell the truth. Without verification, every little root-file standard becomes a vector for manipulation.

The Abuse Playbook (What Spam Teams See Immediately)

What concerns platform policy teams is plain: If a website publishes a file called llms.txt and claims whatever it likes, how does the platform know that what’s listed matches the live content users see, or can be trusted in any way? Several exploit paths open up:

  1. Cloaking through the manifest. A site lists pages in the file that are hidden from regular visitors or behind paywalls, then the AI tool ingests content nobody else sees.
  2. Keyword stuffing or link dumping. The file becomes a directory stuffed with affiliate links, low-value pages, or keyword-heavy anchors aimed at gaming retrieval.
  3. Poisoning or biasing content. If agents trust manifest entries more than the crawl of messy HTML, a malicious actor can place manipulative instructions or biased lists that affect downstream results.
  4. Third-party link chains. The file could point to off-domain URLs, redirect farms, or content islands, making your site a conduit or amplifier for low-quality content.
  5. Trust laundering. The presence of a manifest might lead an LLM to assign higher weight to listed URLs, so a thin or spammy page gets a boost purely by appearance of structure.

The broader commentary flags this risk. For instance, some industry observers argue that llms.txt “creates opportunities for abuse, such as cloaking.” And community feedback points to minimal actual uptake: “No LLM reads them.” That absence of usage ironically means fewer real-world case studies of abuse, but it also means fewer safety mechanisms have been tested.
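
As one example of the kind of safety mechanism that hasn’t been tested, here is a naive cloaking check: compare the flattened markdown a manifest points to against the text a regular visitor gets from the live HTML page. This is a sketch under my own assumptions (including the .md-to-HTML URL pairing), not anything platforms are known to run.

```python
# Naive cloaking check: does the Markdown version a manifest points to
# resemble what a regular visitor sees on the live HTML page? A low score is
# a flag for review, not proof of abuse. The .md/.html URL pairing is an
# assumption made for this sketch.
import re
from difflib import SequenceMatcher

import requests

TAG_RE = re.compile(r"<[^>]+>")


def visible_text(html: str) -> str:
    """Crude tag stripper; a real pipeline would render the page properly."""
    return " ".join(TAG_RE.sub(" ", html).split())


def similarity(markdown_url: str, html_url: str) -> float:
    md = requests.get(markdown_url, timeout=10).text
    html = requests.get(html_url, timeout=10).text
    return SequenceMatcher(None, md.lower(), visible_text(html).lower()).ratio()


if __name__ == "__main__":
    score = similarity(
        "https://example.com/docs/api.md",  # what the manifest offers AI tools
        "https://example.com/docs/api",     # what a human visitor sees
    )
    print(f"similarity: {score:.2f}")  # near zero suggests the two audiences see different content
```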

Why Platforms Hesitate

From a platform’s viewpoint, the calculus is pragmatic: New signals add cost, risk, and enforcement burden. Here’s how the logic works.

First, signal quality. If llms.txt entries are noisy, spammy, or inconsistent with the live site, then trusting them can reduce rather than raise content quality. Platforms must ask: Will this file improve our model’s answer accuracy or create risk of misinformation or manipulation?

Second, verification cost. To trust a manifest, you need to cross-check it against the live HTML, canonical tags, structured data, site logs, etc. That takes resources. Without verification, a manifest is just another list that might lie.
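
A minimal version of that cross-check might look like the sketch below: pull the manifest, then confirm every listed URL resolves to a live, public page on the same domain. Everything here (the parsing, the specific checks, the function names) is simplified for illustration and is not any platform’s actual pipeline.

```python
# Simplified cross-check: does every URL listed in llms.txt resolve to a
# live, public page on the same domain? Real verification would also compare
# the manifest's text against the rendered HTML to catch cloaking.
import re
from urllib.parse import urlparse

import requests

LINK_PATTERN = re.compile(r"\[[^\]]+\]\((https?://[^)\s]+)\)")


def check_manifest(site: str) -> list[tuple[str, str]]:
    """Return (url, problem) pairs for suspicious entries in a site's llms.txt."""
    manifest = requests.get(f"{site.rstrip('/')}/llms.txt", timeout=10)
    manifest.raise_for_status()

    site_host = urlparse(site).netloc
    problems = []

    for url in LINK_PATTERN.findall(manifest.text):
        if urlparse(url).netloc != site_host:
            problems.append((url, "points off-domain"))
            continue
        # Fetch as an anonymous visitor would, following redirects.
        resp = requests.get(url, timeout=10, allow_redirects=True)
        if resp.status_code != 200:
            problems.append((url, f"not publicly reachable (HTTP {resp.status_code})"))
        elif urlparse(resp.url).netloc != site_host:
            problems.append((url, "redirects off-domain"))

    return problems


if __name__ == "__main__":
    for url, problem in check_manifest("https://example.com"):
        print(f"{url}: {problem}")
```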

Third, abuse handling. If a bad actor publishes an llms.txt manifest that lists misleading URLs which an LLM ingests, who handles the fallout? The site owner? The AI platform? The model provider? That liability issue is real.

Fourth, user-harm risk. An LLM citing content from a manifest might produce inaccurate or biased answers. That compounds a problem we already face: people acting on inaccurate, misleading, or outright dangerous answers.

Google has already stated that it will not rely on llms.txt for its “AI Overviews” feature and continues to follow “normal SEO.” And John Mueller wrote: “FWIW no AI system currently uses llms.txt.” So the tools that could use the manifest are largely staying on the sidelines. This reflects the idea that a root-file standard without established trust is a liability.

Why Adoption Without Governance Fails

Every successful web standard has shared DNA: a governing body, a clear vocabulary, and an enforcement pathway. The standards that survive all answer one question early … “Who owns the rules?”

Schema.org worked because that answer was clear. It began as a coalition between Bing, Google, Yahoo, and Yandex. The collaboration defined a bounded vocabulary, agreed syntax, and a feedback loop with publishers. When abuse emerged (fake reviews, fake product data), those engines coordinated enforcement and refined documentation. The signal endured because it wasn’t owned by a single company or left to self-police.

Robots.txt, in contrast, survived by being minimal. It didn’t try to describe content quality or semantics. It only told crawlers what not to touch. That simplicity reduced its surface area for abuse. It required almost no trust between webmasters and platforms. The worst that could happen was over-blocking your own content; there was no incentive to lie inside the file.

llms.txt lives in the opposite world. It invites publishers to self-declare what matters most and, in its full-text variant, what the “truth” of that content is. There’s no consortium overseeing the format, no standardized schema to validate against, and no enforcement group to vet misuse. Anyone can publish one. Nobody has to respect it. And no major LLM provider today is known to consume it in production. Maybe they are, privately, but publicly, no announcements about adoption.

What Would Need To Change For Trust To Build

To shift from optional neat-idea to actual trusted signal, several conditions must be met, and each of these entails a cost in either dollars or human time, so again, dollars.

  • First, manifest verification. A signature or DNS-based verification could tie an llms.txt file to site ownership, reducing spoof risk; a rough sketch of one such approach follows this list. (cost to website)
  • Second, cross-checking. Platforms should validate that URLs listed correspond to live, public pages, and identify mismatch or cloaking via automated checks. (cost to engine/platform)
  • Third, transparency and logging. Public registries of manifests and logs of updates would make dramatic changes visible and allow community auditing. (cost to someone)
  • Fourth, measurement of benefit. Platforms need empirical evidence that ingesting llms.txt leads to meaningful improvements in answer correctness, citation accuracy, or brand representation. Until then, this is speculative. (cost to engine/platform)
  • Finally, abuse deterrence. Mechanisms must be built to detect and penalize spammy or manipulative manifest usage. Without that, spam teams simply assume negative benefit. (cost to engine/platform)
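
For that first item, one way it could work (my assumption, not anything in the proposal): the site owner publishes a hash of the current llms.txt in a DNS TXT record, and the platform compares that hash against the file it actually fetched. The _llms-txt record name and SHA-256 scheme below are invented for this sketch.

```python
# Illustrative only: verify that the llms.txt a platform fetched matches a
# hash the site owner published in DNS. The "_llms-txt" record name and the
# SHA-256 scheme are assumptions for this sketch, not part of any standard.
import hashlib

import dns.resolver  # pip install dnspython
import requests


def manifest_matches_dns(domain: str) -> bool:
    body = requests.get(f"https://{domain}/llms.txt", timeout=10).content
    digest = hashlib.sha256(body).hexdigest()

    answers = dns.resolver.resolve(f"_llms-txt.{domain}", "TXT")
    published = {
        b"".join(rdata.strings).decode("utf-8").strip() for rdata in answers
    }
    return digest in published


if __name__ == "__main__":
    print(manifest_matches_dns("example.com"))
```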

Until those elements are in place, platforms will treat llms.txt as optional at best or irrelevant at worst. So maybe you get a small benefit? Or maybe not…

The Real Value Today

For site owners, llms.txt still may have some value, but not as a guaranteed path to traffic or “AI ranking.” It can function as a content alignment tool, guiding internal teams to identify priority URLs you want AI systems to see. For documentation-heavy sites, internal agent systems, or partner tools that you control, it may make sense to publish a manifest and experiment.

However, if your goal is to influence large public LLM-powered results (such as those by Google, OpenAI, or Perplexity), you should tread cautiously. There is no public evidence those systems honor llms.txt yet. In other words: Treat llms.txt as a “mirror” of your content strategy, not a “magnet” pulling traffic. Of course, this means building the file(s) and maintaining them, so factor in the added work vs. whatever return you believe you will receive.

Closing Thoughts

The web keeps trying to teach machines about itself. Each generation invents a new format, a new way to declare “here’s what matters.” And each time the same question decides its fate: “Can this signal be trusted?” With llms.txt, the idea is sound, but the trust mechanisms aren’t yet baked in. Until verification, governance, and empirical proof arrive, llms.txt will reside in the grey zone between promise and problem.

This post was originally published on Duane Forrester Decodes.


Featured Image: Roman Samborskyi/Shutterstock
