
The Technical SEO Audit Needs A New Layer

Websites need a new audit framework that accounts for AI crawlers, rendering limitations, structured data, and accessibility tree parsing.


The standard technical SEO audit checks crawlability, indexability, website speed, mobile-friendliness, and structured data. That checklist was designed for one consumer: Googlebot.

This is how it’s always been.

In 2026, your website has at least a dozen additional non-human consumers. AI crawlers like GPTBot, ClaudeBot, and PerplexityBot train models and power AI search results. User-triggered agents like the newly announced Google-Agent, or its “siblings” Claude-User and ChatGPT-User, browse websites on behalf of specific humans in real time. A Q1 2026 analysis across Cloudflare’s network found that 30.6% of all web traffic now comes from bots, with AI crawlers and agents making up a growing share. Your technical audit needs to account for all of them.

Here are the five layers to add to your existing technical SEO audit.

Layer 1: AI Crawler Access

Your robots.txt was probably written for Googlebot, Bingbot, and maybe a few scrapers. AI crawlers need their own robots.txt rules, and they need to be separate from Googlebot and Bingbot.

What To Check

Review your robots.txt for rules targeting AI-specific user agents: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, AppleBot-Extended, CCBot, and ChatGPT-User. If none of these appear, you’re running on defaults, and those defaults might not reflect what you actually want. Never accept the defaults unless you know they are exactly what you need.

The key is making a conscious decision per crawler rather than blanket allowing or blocking everything. Not all AI crawlers serve the same purpose. AI crawler traffic can be split into three categories: training crawlers that collect data for model training (89.4% of AI crawler traffic according to Cloudflare data), search crawlers that power AI search results (8%), and user-triggered agents like Google-Agent and ChatGPT-User that browse on behalf of a specific human in real time (2.2%). Each category warrants a different robots.txt decision.
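To make those per-crawler decisions explicit, a robots.txt might look like the sketch below. The user agent tokens come from each vendor’s published documentation; the allow/block choices are illustrative examples of the three-category split, not recommendations for your site.

```
# Training crawlers that send little or no traffic back: block (example choice).
User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Search crawlers that power cited AI answers: allow (example choice).
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Opt out of Google's AI training uses without affecting Googlebot:
User-agent: Google-Extended
Disallow: /
```

The point is that every crawler listed here gets a deliberate rule, so a future audit can distinguish “we decided this” from “we never thought about it.”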

Cloudflare Radar data showing traffic volume by crawl purpose (Q1 2026); Screenshot by author, April 2026

The crawl-to-referral ratios from Cloudflare’s Radar report turn this into an informed decision. Anthropic’s ClaudeBot crawls 20,600 pages for every single referral it returns. OpenAI’s ratio is 1,300:1. Meta sends no referrals. Blocking OpenAI’s OAI-SearchBot or PerplexityBot reduces your visibility in ChatGPT Search and Perplexity’s AI answers. Blocking training-focused crawlers like CCBot or Meta’s crawler stops data extraction by a provider that sends zero traffic back. The crawl-to-referral ratios tell you who is taking without giving.

There is one crawler that requires special attention. Google added Google-Agent to its official list of user-triggered fetchers on March 20, 2026. Google-Agent identifies requests from AI systems running on Google infrastructure that browse websites on behalf of users. Unlike traditional crawlers, Google-Agent ignores robots.txt. Google’s position is that since a human initiated the request, the agent acts as a user proxy rather than an autonomous crawler. Blocking Google-Agent requires server-side authentication, not robots.txt rules. This is both interesting and important for the future, even if it’s beyond the scope of this article.

Official documentation for each crawler:

Layer 2: JavaScript Rendering

Googlebot renders JavaScript using headless Chromium. There is nothing new about that. What is new is that virtually every major AI crawler does not render JavaScript.

Crawler                 Renders JavaScript
GPTBot (OpenAI)         No
ClaudeBot (Anthropic)   No
PerplexityBot           No
CCBot (Common Crawl)    No
AppleBot                Yes
Googlebot               Yes

AppleBot (which uses a WebKit-based renderer) and Googlebot are the only major crawlers that render JavaScript. Four of the six major web crawlers (GPTBot, ClaudeBot, PerplexityBot, and CCBot) fetch static HTML only, making server-side rendering a requirement for AI search visibility, not an optimization. If your content lives in client-side JavaScript, it is invisible to the crawlers training OpenAI, Anthropic, and Perplexity’s models and powering their AI search products.

What To Check

Run curl -s [URL] on your critical pages and search the output for key content like product names, prices, or service descriptions. If that content isn’t in the curl response, GPTBot, ClaudeBot, and PerplexityBot can’t see it either. Alternatively, use View Source in your browser (not Inspect Element, which shows the rendered DOM after JavaScript execution) and check whether the important information is present in the raw HTML.
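The curl check can be wrapped in a small script for repeat use across pages. This is a sketch, not a standard tool: the `check_static` helper and the “Acme Widget” phrase are made up for illustration. In practice you would pipe `curl -s` for your own URL into it and grep for your own product names or key claims.

```shell
#!/bin/sh
# check_static: does a key phrase appear in raw, unrendered HTML?
# This approximates what non-rendering crawlers (GPTBot, ClaudeBot,
# PerplexityBot, CCBot) receive. Reads HTML on stdin.
check_static() {
  if grep -qi "$1"; then
    echo "present in static HTML"
  else
    echo "MISSING from static HTML"
  fi
}

# Demo: an SPA shell whose content only appears after JavaScript runs.
# Real usage: curl -s "https://example.com/product" | check_static "Acme Widget"
printf '<div id="root"></div><script src="app.js"></script>' \
  | check_static "Acme Widget"
```

The demo prints “MISSING from static HTML” because the sample markup is an empty application shell: exactly the failure mode this layer of the audit is looking for.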

Curl fetch of No Hacks homepage (Image from author, April 2026)

Single-page applications (SPAs) built with React, Vue, or Angular are particularly at risk unless they use server-side rendering (SSR) or static site generation (SSG). A React SPA that renders product descriptions, pricing, or key claims entirely on the client side is sending AI crawlers a blank page with a link to the JavaScript bundle.

The fix isn’t complicated. Server-side rendering (SSR), static site generation (SSG), or pre-rendering solves this for every major framework. Next.js supports SSR and SSG natively for React, Nuxt provides the same for Vue, and Angular Universal handles server rendering for Angular applications. The audit just needs to flag which pages depend on client-side JavaScript for critical content.

Layer 3: Structured Data For AI

Structured data has been part of technical SEO audits for years, but the evaluation criteria need updating. The question is no longer just “does this page have schema markup?” It’s “does this markup help AI systems understand and cite this content?”

What To Check

  • JSON-LD implementation (preferred over Microdata and RDFa for AI parsing).
  • Schema types that go beyond the basics: Organization, Article, Product, FAQ, HowTo, Person.
  • Entity relationships: sameAs, author, publisher connections that link your content to known entities.
  • Completeness: are all relevant properties populated, or are you just checking a box using skeleton schemas with name and URL?
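As an illustration of “complete and connected” versus a skeleton, here is what a filled-out Organization block might look like. Every name and URL below is a placeholder; the properties used (name, url, logo, foundingDate, sameAs, founder) are standard schema.org Organization fields.

```
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/logo.png",
  "foundingDate": "2015-03-01",
  "sameAs": [
    "https://www.linkedin.com/company/example-co",
    "https://www.crunchbase.com/organization/example-co"
  ],
  "founder": {
    "@type": "Person",
    "name": "Jane Doe",
    "sameAs": "https://www.linkedin.com/in/janedoe"
  }
}
```

A skeleton version of the same block would stop after name and url, which validates fine but gives AI systems nothing to connect your entity to.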

Why This Matters Now

Microsoft’s Bing principal product manager Fabrice Canel confirmed in March 2025 that schema markup helps LLMs understand content for Copilot. The Google Search team stated in April 2025 that structured data gives an advantage in search results.

No, you can’t win with schema alone. Yes, it can help.

The data density angle matters too. The GEO research paper by Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi (presented at ACM KDD 2024, first to publicly use the term “GEO”) found that adding statistics to content improved AI visibility by 41%. Yext’s analysis found that data-rich websites earn 4.3x more AI citations than directory-style listings. Structured data contributes to data density by giving AI systems machine-readable facts rather than requiring them to extract meaning from prose.

An important caveat: No peer-reviewed academic studies exist yet on schema’s impact on AI citation rates specifically. The industry data is promising and consistent, but treat these numbers as indicators rather than guarantees.

W3Techs reports that approximately 53% of the top 10 million websites use JSON-LD as of early 2026. If your website isn’t among them, you’re missing signals that both traditional and AI search systems use to understand your content.

Duane Forrester, who helped build Bing Webmaster Tools and co-launched Schema.org, argues that schema markup is only step one. As AI agents continue moving from simply interpreting pages to making decisions, brands will also need to publish operational truth (pricing, policies, constraints) in machine-verifiable formats with versioning and cryptographic signatures. Publishing machine-verifiable source packs is beyond the scope of a standard audit today, but auditing structured data completeness and accuracy is the foundation verified source packs build on.

Layer 4: Semantic HTML And The Accessibility Tree

The first three layers of the AI-readiness audit cover crawler access (robots.txt), JavaScript rendering, and structured data. The final two address how AI agents actually read your pages and what signals help them discover and evaluate your content.

Most SEOs evaluate HTML for search engine consumption. Agentic browsers like ChatGPT Atlas, Chrome with auto browse, and Perplexity Comet don’t parse pages the way Googlebot does. They read the accessibility tree instead.

The accessibility tree is a parallel representation of your page that browsers generate from your HTML. It strips away visual styling, layout, and decoration, keeping only the semantic structure: headings, links, buttons, form fields, labels, and the relationships between them. Screen readers like VoiceOver and NVDA have used the accessibility tree for decades to make websites usable for people with visual impairments. AI agents now use the same tree to understand and interact with web pages.

And the reason is simple: efficiency. Processing screenshots is both more expensive and slower than working with the accessibility tree.

This is what an accessibility tree looks like in Google Chrome (Image from author, April 2026)

This matters because the accessibility tree exposes what your HTML actually communicates, not what your CSS (or JS) makes it look like. A <div> styled to look like a button doesn’t appear as a button in the accessibility tree. An image without alt text means nothing. A heading hierarchy that skips from H1 to H4 creates a broken structure that both screen readers and AI agents will struggle to navigate.

Microsoft’s Playwright MCP, the standard tool for connecting AI models to browser automation, uses accessibility snapshots rather than raw HTML or screenshots. Playwright MCP’s browser_snapshot function returns an accessibility tree representation because it’s more compact and semantically meaningful for LLMs. OpenAI’s documentation states that ChatGPT Atlas uses ARIA tags to interpret page structure when browsing websites.

Web accessibility and AI agent compatibility are now the same discipline. Proper heading hierarchy (H1-H6) creates meaningful sections that AI systems use for content extraction. Semantic elements like <nav>, <main>, <article>, and <section> tell machines what role each content block plays. Form labels and descriptive button text make interactive elements understandable to agents that parse the accessibility tree instead of rendering visual design.

What To Check

  • Heading hierarchy: logical H1-H6 structure that machines can use to understand content relationships.
  • Semantic elements: nav, main, article, section, aside, header, footer, used appropriately.
  • Form inputs: every input has a label, every button has descriptive text.
  • Interactive elements: clickable things use <button> or <a>, not <div onclick>.
  • Accessibility tree: run a Playwright MCP snapshot or test with VoiceOver/NVDA to see what agents actually see.
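To make the interactive-element and label checks concrete, here is a small before/after sketch (the addToCart handler and field names are made-up examples):

```
<!-- Invisible to the accessibility tree as an interactive control: -->
<div class="btn" onclick="addToCart()">Add to cart</div>

<!-- Exposed as a button with an accessible name, no ARIA needed: -->
<button type="button" onclick="addToCart()">Add to cart</button>

<!-- A labelled input that agents can identify and fill: -->
<label for="email">Email address</label>
<input type="email" id="email" name="email">
```

The two “Add to cart” elements look identical on screen once styled, but only the second one exists as a button in the tree that screen readers and AI agents parse.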

Somehow, things are getting worse on this front. The WebAIM Million 2026 report found that the average web page now has 56.1 accessibility errors, up 10.1% from 2025.

ARIA (Accessible Rich Internet Applications) usage increased 27% in a single year. ARIA is a set of HTML attributes that add extra semantic information to elements, telling screen readers and AI agents things like “this div is actually a dialog” or “this list functions as a menu.” But what’s critical is this: pages with ARIA present had significantly more errors (59.1 on average) than pages without ARIA (42 on average). Adding ARIA without understanding it makes things worse, not better, because incorrect ARIA overrides the browser’s default accessibility tree interpretation with wrong information. Start with proper semantic HTML. Add ARIA only when native elements aren’t sufficient.

Technical SEOs do not need to become accessibility experts. But treating accessibility as someone else’s problem is no longer viable when the same tree that screen readers parse is now the primary interface between AI agents and your website.

Sidenote: The Markdown Shortcut Doesn’t Work

Serving raw markdown files to AI crawlers instead of HTML can result in a 95% reduction in token usage per page. However, Google Search Advocate John Mueller called this “a stupid idea” in February 2026 on Bluesky. Mueller’s argument was this: “Meaning lives in structure, hierarchy and context. Flatten it and you don’t make it machine-friendly, you make it meaningless.” LLMs were trained on normal HTML pages from the beginning and have no problems processing them. The answer isn’t to create a flat, simplified version for machines. It’s to make the HTML itself properly structured. Well-written semantic HTML already is the machine-readable format. Besides, that simplified version already exists in the accessibility tree, and it is what AI agents already use.

Layer 5: AI Discoverability Signals

The final layer covers signals that don’t fit neatly into traditional audit categories but directly affect how AI systems discover and evaluate your website.

llms.txt (dishonourable mention). It’s listed first for one reason only: ask any LLM what you should do to make your website more visible to AI systems, and llms.txt will be at or near the top of the list. It’s their world, I guess. The llms.txt specification provides a simple markdown file that helps AI agents understand your website’s purpose, structure, and key content. No large-scale adoption data has been published yet, and its actual impact on AI citations is unproven. But LLMs consistently recommend it, which means AI-powered audit tools and consultants will flag its absence. It takes minutes to create and costs nothing to maintain.

OK, now that we’ve got that out of the way, let’s look at what might really matter.

AI crawler analytics. Are you monitoring AI bot traffic? Cloudflare’s AI Audit dashboard shows which AI crawlers visit, how often, and which pages they hit. If you’re not on Cloudflare, check server logs for Google-Agent, ChatGPT-User, and ClaudeBot user agent strings. Google publishes a user-triggered-agents.json file containing IP ranges that Google-Agent uses, so you can verify whether incoming requests are genuinely from Google rather than spoofed user agent strings.
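If you are working from raw server logs, a quick tally per AI user agent is a reasonable starting point. A sketch, with assumptions: the `count_ai_bots` helper is made up for this article, the log path is an example, and real access-log lines are longer than the three-field demo entries below.

```shell
#!/bin/sh
# count_ai_bots: tally requests per AI user agent string in an access log.
# Usage: count_ai_bots /var/log/nginx/access.log   (path is an example)
count_ai_bots() {
  for bot in GPTBot ClaudeBot PerplexityBot ChatGPT-User Google-Agent; do
    count=$(grep -c "$bot" "$1" 2>/dev/null || true)
    echo "$bot: ${count:-0} requests"
  done
}

# Demo with a three-line sample log (real logs hold full request lines):
printf 'GET /a GPTBot\nGET /b GPTBot\nGET /c ClaudeBot\n' > /tmp/sample.log
count_ai_bots /tmp/sample.log
```

Note that user agent strings can be spoofed; for Google-Agent specifically, cross-check hits against the IP ranges Google publishes in its user-triggered-agents.json file before treating the traffic as genuine.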

Entity definition. Does your website clearly define what the business is, who runs it, and what it does? Not in marketing copy, but in structured, machine-parseable markup. Organization schema should include name, URL, logo, founding date, and sameAs links to verified profiles on LinkedIn, Crunchbase, and Wikipedia. Person schema for key people should connect them to the organization via author and employee properties. AI systems need to resolve your identity as a distinct entity before they can confidently recommend you over competitors with similar names or offerings. Don’t slap this on top of your website when your designer is done with their work. Start here; it will make your life easier.

Content position. Where you place information on the page directly affects whether AI systems cite it. Kevin Indig’s analysis of 98,000 ChatGPT citation rows across 1.2 million responses found that 44.2% of all AI citations come from the top 30% of a page. The bottom 10% earns only 2.4-4.4% of citations regardless of industry. Duane Forrester calls this “dog-bone thinking”: strong at the beginning and end, weak in the middle, a pattern Stanford researchers have confirmed as the “lost in the middle” phenomenon. Audit your key pages: are the most important claims and data points in the first 30%, or buried in the middle?

Content extractability. Pull any key claim from your page and read it in isolation. Does it still make sense without the surrounding paragraphs? AI retrieval systems like ChatGPT, Perplexity, and Google AI Overviews extract and cite individual passages, and sentences that rely on “this,” “it,” or “the above” for meaning become unusable when pulled from their original context. Ramon Eijkemans’ excellent utility-writing framework maps these principles to documented retrieval mechanisms: self-contained sentences, explicit entity relationships, and quotable anchor statements that AI systems can confidently cite without additional inference.

The Audit Checklist

Check                   Tool/Method                              What You’re Looking For
AI crawler robots.txt   Manual review                            Conscious per-crawler decisions
JavaScript rendering    curl, View Source, Lynx browser          Critical content in static HTML
Structured data         Schema validator, Rich Results Test      Complete, connected JSON-LD
Semantic HTML           axe DevTools, Lighthouse                 Proper elements, heading hierarchy
Accessibility tree      Playwright MCP snapshot, screen reader   What agents actually see
AI bot traffic          Cloudflare, server logs                  Volume, pages hit, patterns

From Audit To Action

This audit identifies gaps. Fixing them requires a sequence, because some fixes depend on others. Optimizing content structure before establishing a machine-readable identity means agents can extract your information, but can’t confidently attribute it to your brand. I wrote Machine-First Architecture to provide that sequence: identity, structure, content, interaction, each pillar building on the previous one.

Why The Technical SEO Audit Is Where This Belongs

None of this is technically SEO. Robots.txt rules for AI crawlers don’t affect Google rankings. Accessibility tree optimization doesn’t move keyword positions. Content position scoring has nothing to do with search indexing.

But most of it did grow out of technical SEO. Crawl management, structured data, semantic HTML, JavaScript rendering, server log analysis: these are skills technical SEOs already have. The audit methodology transfers directly. The consumer it serves is what changed.

The websites that get cited in AI responses, that work when Chrome auto browse visits them, that show up when someone asks ChatGPT for a recommendation, they won’t be the ones with the best content alone. They’ll be the ones whose technical foundation made that content accessible to machines. Technical SEOs are the people best equipped to build that foundation. The old audit template just needs a new section to reflect it.



Featured Image: Anton Vierietin/Shutterstock

Slobodan Manic Host of the No Hacks Podcast and machine-first web optimization consultant at No Hacks

Slobodan “Sani” Manić is a website optimisation consultant with over 15 years of experience helping businesses make their sites faster, ...