Anthropic’s Claude Bots Make Robots.txt Decisions More Granular

  • Anthropic now lists separate bots for training and search, while also running user-requested fetchers for browsing.
  • Blocking search bots has direct discoverability consequences.
  • Anthropic warns blocking Claude-SearchBot may reduce visibility in search results.

Anthropic updated its crawler documentation to list separate Claude bots for training, search indexing, and user requests, with visibility tradeoffs when blocked.

Anthropic updated its crawler documentation this week with a formal breakdown of its three web crawlers and their individual purposes.

The page now lists ClaudeBot (training data collection), Claude-User (fetching pages when Claude users ask questions), and Claude-SearchBot (indexing content for search results) as separate bots, each with its own robots.txt user-agent string.

Each bot gets a “What happens when you disable it” explanation. For Claude-SearchBot, Anthropic wrote that blocking it “prevents our system from indexing your content for search optimization, which may reduce your site’s visibility and accuracy in user search results.”

For Claude-User, the language is similar. Blocking it “prevents our system from retrieving your content in response to a user query, which may reduce your site’s visibility for user-directed web search.”

The update formalizes a pattern that’s becoming more common among AI search products. OpenAI runs the same three-tier structure with GPTBot, OAI-SearchBot, and ChatGPT-User. Perplexity operates a two-tier version with PerplexityBot for indexing and Perplexity-User for retrieval.

Anthropic says all three of its bots honor robots.txt, including Claude-User. OpenAI and Perplexity draw a sharper line for user-initiated fetchers, warning that robots.txt rules may not apply to ChatGPT-User and generally don’t apply to Perplexity-User. For Anthropic and OpenAI, blocking the training bot does not block the search bot or the user-requested fetcher.

What Changed From The Old Page

The previous version of Anthropic’s crawler page referenced only ClaudeBot and used broader language about data collection for model development. Before ClaudeBot, Anthropic operated under the Claude-Web and Anthropic-AI user agents, both now deprecated.

The move from one listed crawler to three mirrors what OpenAI did in late 2024 when it separated GPTBot from OAI-SearchBot and ChatGPT-User. OpenAI updated that documentation again in December, adding a note that GPTBot and OAI-SearchBot share information to avoid duplicate crawling when both are allowed.

OpenAI also noted in that December update that ChatGPT-User, which handles user-initiated browsing, may not be governed by robots.txt in the same way as its automated crawlers. Anthropic’s documentation does not make a similar distinction for Claude-User.

Why This Matters

The blanket “block AI crawlers” strategy that many sites adopted in 2024 no longer works the way it did. Blocking ClaudeBot stops training data collection but does nothing about Claude-SearchBot or Claude-User. The same is true on OpenAI’s side.
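To see why a training-only block leaves the other two bots untouched, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content and the `example.com` URL are hypothetical, and Python's parser only approximates how any given crawler reads the file, so treat the result as an illustration, not a statement of Anthropic's behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the training crawler.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /
"""

def allowed_agents(robots_txt, agents, url="https://example.com/article"):
    """Return {agent: True/False} for whether each agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

result = allowed_agents(
    ROBOTS_TXT, ["ClaudeBot", "Claude-SearchBot", "Claude-User"]
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' in str(allowed) or allowed}")
```

Under this parser, only `ClaudeBot` is disallowed. The search bot and the user-requested fetcher have no matching group and no wildcard rule, so they fall through to the default of being allowed.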

A BuzzStream study we covered in January found that 79% of top news sites block at least one AI training bot. But 71% also block at least one retrieval or search bot, potentially removing themselves from AI-powered search citations in the process.

That matters more now than it did a year ago. Hostinger’s analysis of 66.7 billion bot requests showed OpenAI’s search crawler coverage growing from 4.7% to over 55% of sites in the sample, even as OpenAI’s training crawler coverage dropped from 84% to 12%. Websites are allowing search bots while blocking training bots, and the gap is widening.

The visibility warnings differ by company. Anthropic says blocking Claude-SearchBot “may reduce” visibility. OpenAI is more direct, telling publishers that sites opted out of OAI-SearchBot won’t appear in ChatGPT search answers, though navigational links may still show up. Both are positioning their search crawlers alongside Googlebot and Bingbot, not alongside their own training crawlers.

What This Means

If you manage robots.txt files, the old copy-paste block list needs an audit. SEJ’s complete AI crawler list includes verified user-agent strings across every company.
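One way to start that audit is to scan an existing file for user-agent names that are deprecated or missing. The sketch below is an assumption-laden illustration: the `audit_user_agents` helper and the sample block list are hypothetical, and the name sets cover only the Anthropic agents mentioned in this article, not a complete crawler list.

```python
# Anthropic agent names as described in this article (not a complete list).
DEPRECATED = {"claude-web", "anthropic-ai"}
CURRENT = {"claudebot", "claude-searchbot", "claude-user"}

def audit_user_agents(robots_txt):
    """Report deprecated agent names and current ones with no explicit group."""
    named = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            named.add(value.strip().lower())
    return {
        "deprecated": sorted(named & DEPRECATED),
        # Agents listed here fall back to any "*" group, or are allowed by default.
        "unaddressed": sorted(CURRENT - named),
    }

# Hypothetical copy-paste block list from an older setup.
OLD_BLOCK_LIST = """\
User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

report = audit_user_agents(OLD_BLOCK_LIST)
print(report)
```

For this sample, the report flags the two deprecated names and shows that the file makes no explicit decision about `Claude-SearchBot` or `Claude-User`.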

A strategic robots.txt now requires separate entries for training and search bots at minimum, with the understanding that user-initiated fetchers may not follow the same rules.
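As a rough sketch of what separate entries can look like, the snippet below generates a robots.txt from a per-bot policy and checks it with Python's `urllib.robotparser`. The policy itself (block training, allow search and user fetchers) is a hypothetical example, not a recommendation, and the `build_robots_txt` and `check` helpers are illustrative names. As noted above, a rule for a user-initiated fetcher may not be honored by every company.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: True = allow, False = block. Agent names are from this article.
POLICY = {
    "ClaudeBot": False,        # training
    "GPTBot": False,           # training
    "Claude-SearchBot": True,  # search indexing
    "OAI-SearchBot": True,     # search indexing
    "Claude-User": True,       # user-requested fetches
    "ChatGPT-User": True,      # user-requested fetches
}

def build_robots_txt(policy):
    """Emit one User-agent group per bot, separated by blank lines."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}\n")
    return "\n".join(groups)

def check(robots_txt, agent, url="https://example.com/article"):
    """Return whether Python's parser would let this agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

robots_txt = build_robots_txt(POLICY)
print(robots_txt)
```

Writing one group per bot keeps each decision visible and easy to flip independently, which is the point of the three-way split.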

Looking Ahead

The three-tier split creates a new category of publisher decision that parallels what Google did in 2023 with Google-Extended. That user-agent lets sites opt out of Gemini training while staying in Google Search results. Now Anthropic and OpenAI offer the same separation for their platforms.

As AI-powered search grows its share of referral traffic, the cost of blocking search crawlers increases. The Cloudflare Year in Review data we reported in December showed AI crawlers already account for a measurable share of web traffic, and the gap between crawling volume and referral traffic remains wide. How publishers navigate these three-way decisions will shape how much of the web AI search tools can actually surface.

By Matt G. Southern, Senior News Writer at Search Engine Journal

Matt G. Southern, Senior News Writer, has been with Search Engine Journal since 2013. With a bachelor’s degree in communications, ...