AI Search in 2026: The 5 Article GEO & SEO Playbook For Modern Visibility

3 Essential GEO Strategy Playbooks To Help Marketers Adapt to Generative Search

By Contentful
Top-of-funnel discovery is undergoing its fastest transformation in decades. Generative AI tools like ChatGPT, Gemini, and Perplexity, along with AI-powered search, are reshaping how people find products and brands. Instead of competing for the top organic link, brands now need to appear within generative answers, where LLMs synthesize information from across the web.

This shift, known as generative engine optimization (GEO), rewards originality, clarity, and verifiable expertise over content volume. Brands that break through invest in primary insights, present a clear point of view, and maintain message consistency across product documentation, help content, and thought leadership.

But to reap those rewards, marketers must craft content that is machine-readable, machine-verifiable, and machine-citable while still resonating with their target audience.

Below are three playbooks to help navigate this new reality.

1. Content Playbook: Create Information AI Can Parse, Contextualize, and Trust

Generative models depend on knowledge graphs, not keyword matching. Knowledge graphs identify entities — people, products, organizations, concepts — and map relationships among them to provide contextually accurate responses. The clearer your content makes these entities and relationships, the more likely your brand will surface in AI-generated responses.

Make content easy for AI to interpret

This means leaning into clarity and structure:

  • Use natural language.
  • Structure headings and use a logical information hierarchy.
  • Reinforce key concepts with lists and Q&A formats.
  • Write short, well-organized paragraphs.

These tactics support both SEO and GEO, helping AI systems extract meaning and understand how concepts are related with less ambiguity.

Demonstrate real expertise

AI engines are more likely to cite sources with firsthand knowledge, original data, or expert commentary. Content grounded in research or practitioner insight signals credibility to both machine and human readers.

Build interconnected topical depth

Instead of isolated articles, create clusters of interlinked content around the themes your brand wants to own. Providing more complete coverage and connecting related content reinforces your topical authority and improves how generative engines map your entity across related topics.

Use AI responses as research inputs

Track and review AI responses to queries that matter to your brand. Look for gaps and queries where your brand is missing from the conversation. This is your roadmap for new content. It’s like a modern, more revealing version of “People also ask.”
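
If you want to automate that review, a short script can run your priority queries through an LLM API and flag whether your brand appears in the answers. The sketch below is a minimal illustration using the OpenAI Python SDK; the model name, queries, and brand are placeholders, and any other provider's API would work the same way.

# Minimal sketch: check whether a brand is mentioned in AI answers.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY environment variable;
# the model name, queries, and brand are placeholders.
from openai import OpenAI

client = OpenAI()
queries = [
    "What are the best headless CMS platforms?",
    "How do I structure content for AI search?",
]
brand = "YourBrand"  # replace with your brand name

for query in queries:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": query}],
    )
    answer = response.choices[0].message.content or ""
    mentioned = brand.lower() in answer.lower()
    print(f"{query!r}: brand mentioned = {mentioned}")

Run this on a regular cadence and the queries where your brand is missing become your content roadmap.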

2. Technical Playbook: Ensure AI Can Access and Interpret Your Content Correctly

In an AI-driven world, you need to put in the extra effort to ensure your content is visible to AI bots. If AI bots can’t crawl or render your digital experiences, they can’t surface them in answers. GEO requires renewed focus on accessibility, performance, and structure.

Give AI crawlers clear access

Many organizations accidentally block AI bots like GPTBot or PerplexityBot in their robots.txt file. Validating access is essential for generative visibility.
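
A quick way to validate access is to test your robots.txt rules against the published AI user-agent tokens. Below is a minimal sketch using Python's standard urllib.robotparser; the domain is a placeholder, and the token list can be extended with any crawler you care about.

# Minimal sketch: check which AI crawlers your robots.txt allows.
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "PerplexityBot", "ClaudeBot"]

parser = RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")  # replace with your domain
parser.read()

for bot in AI_BOTS:
    allowed = parser.can_fetch(bot, "https://www.example.com/")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")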

Account for limited JavaScript rendering

Most AI bots cannot render JavaScript, which means they can't see content that only appears after client-side rendering. Server-side rendering or prerendering ensures your content is visible to both people and AI bots.
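
A simple sanity check is to fetch a page without executing JavaScript and confirm that your key copy appears in the raw HTML, which approximates what most AI bots see. A minimal sketch, with a placeholder URL and phrase:

# Minimal sketch: does the key phrase exist in the server-rendered HTML?
import urllib.request

url = "https://www.example.com/product-page"  # placeholder URL
key_phrase = "Acme Widget 3000"  # a phrase that should appear in the body copy

request = urllib.request.Request(url, headers={"User-Agent": "render-check"})
with urllib.request.urlopen(request) as response:
    raw_html = response.read().decode("utf-8", errors="ignore")

if key_phrase.lower() in raw_html.lower():
    print("Phrase found in the raw HTML; non-rendering crawlers can see it.")
else:
    print("Phrase missing from the raw HTML; it is likely injected client-side.")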

Use structured data generously

Structured data, or schema markup, helps AI bots understand your content and improves your chances of appearing in knowledge graphs and entity associations.
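
Your CMS usually emits this markup for you, but the sketch below shows the general shape of a JSON-LD block built with Python's json module; the Article fields are placeholder values to adapt to your own pages.

# Minimal sketch: build an Article JSON-LD block (placeholder values).
import json

article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How Generative Engine Optimization Works",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "datePublished": "2026-01-15",
    "publisher": {"@type": "Organization", "name": "Example Co."},
}

snippet = f'<script type="application/ld+json">{json.dumps(article_schema, indent=2)}</script>'
print(snippet)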

Optimize multimedia with metadata

Because AI still struggles with rich media comprehension, video transcripts, alt text for images, and descriptive metadata remain essential. Teams that find these tasks tedious can use AI tools to generate alt tags and metadata.

Maintain site speed and performance

People and bots want fast, responsive pages. Use Google’s Core Web Vitals as your benchmark for high-speed digital experiences.
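
You can monitor those benchmarks programmatically. The sketch below queries Google's PageSpeed Insights v5 API for field data; the response keys used here (loadingExperience.metrics with percentile and category values) are assumptions based on the public API format, so verify them against a live response.

# Minimal sketch: pull Core Web Vitals field data from PageSpeed Insights.
import json
import urllib.parse
import urllib.request

page = "https://www.example.com/"  # placeholder URL
api = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?" + urllib.parse.urlencode(
    {"url": page, "strategy": "mobile"}
)

with urllib.request.urlopen(api) as response:
    data = json.loads(response.read())

metrics = data.get("loadingExperience", {}).get("metrics", {})
for name, detail in metrics.items():
    print(name, detail.get("percentile"), detail.get("category"))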

3. Authority Playbook: Strengthen the Signals That Models Use to Decide Who to Trust

Generative engines synthesize information from the entire web, not just your site. That means you need to build credibility and authority beyond your own digital channels. Earning brand mentions across multiple trusted sources strengthens your authority and visibility.

Earn citations with high-quality content

Content that highlights your expertise and contributes something new is more likely to earn citations and the attention of AI bots.

Understand which sources influence the models

One of the advantages of generative engines is that you can ask them where they get their information. This transparency helps you identify which publications, forums, and communities you should prioritize in your placement efforts.

Maintain consistent messaging across all channels

Inconsistent messaging can create confusion for AI. Use a shared messaging and positioning framework (MPF) across your website, documentation, social media, and third-party profiles.

Maintain authoritative owned media profiles

Properties such as Wikipedia, Wikidata, and industry directories reinforce your brand’s legitimacy and strengthen your presence in AI’s knowledge graph. Review and update your media profiles regularly to ensure consistent messaging.

Build for Humans and Machines

Marketers today must protect top-of-funnel discovery by creating content that works for SEO, GEO, and humans.

This requires creativity and a new level of technical acumen. Successful marketers use structured content, consistent messaging, and an understanding of how generative AI works to craft content that connects with users wherever they are.

See how structured content and AI tools are helping teams implement and automate SEO/GEO for greater visibility on contentful.com.


AI-Powered Search: Adapting Your SEO Strategy

Ranking today takes more than keywords. Here's how to create AI-friendly, user-focused content that keeps your brand visible in today’s evolving search ecosystem.

By Winston Burton

Traditional SEO tactics centered on keywords and backlinks are no longer moving the needle on their own as Google shifts toward AI-driven answers.

Organic clicks are going down as a result of Google AI Overviews and the rise of zero-click searches, where users get answers directly on Google’s search results page without clicking through to any websites.

SEO is not dead; it has just evolved. To succeed in this new era, brands and marketers need to embrace structured, intent-driven content, enhance trustworthiness, and align more closely with brand and user experience strategies.

But, how do you do this? Let’s explore.

Omnichannel Strategy

In today's search landscape, you need to be everywhere your audience is, including platforms like Reddit, Quora, TikTok, YouTube, and anywhere else that's relevant.

If your brand is not present on those discovery channels, you need to get on them as soon as possible.

User behavior has changed, and people are using different search engines and channels to find information, products, reviews, and more.

For example:

  • Reddit for the opinions of other users who have used a similar product or service.
  • TikTok and YouTube for tutorials and product reviews.
  • Instagram for discovery.
  • Amazon and Pinterest for product reviews and inspiration.

If your marketing strategy focuses only on Google, you're invisible to a large portion of your target audience who are looking for content like yours on those other channels.

Optimize For AI Overviews

To appear in AI Overviews (AIOs), brands and marketers must focus on creating high-quality, authoritative content that directly answers user questions, is well-structured, and is easy for AI to understand.

Key tactics and best practices:

  • Create high-quality conversational content: Research which queries trigger AIOs. Create original, unique content that meets user intent and answers users' questions, and update existing content by answering questions, making it conversational, and adding quotes, testimonials, and updated headings.
  • Use plain headings and short paragraphs: This improves readability for both users and AI. Use clear headings, concise paragraphs, and natural language.
  • Mark up content with structured data: This helps AI and traditional search engines understand your content better. Use schema markup (e.g., FAQPage, HowTo, Product).
  • Let AI bots in: This helps your content get crawled and cited by AI systems. Use llms.txt and check your robots.txt file to make sure bots like OpenAI's GPTBot and Google-Extended are not blocked.
  • Earn mentions on trustworthy sites: This improves your authority and increases your brand visibility. Create high-quality content with unique information, contribute guest posts, stay active on social media, appear on podcasts, use internal linking, and implement PR strategies.
  • Keep content fresh: AI chatbots love up-to-date information. Regularly update content with new data, statistics, and unique, valuable information for end users.
  • Track brand mentions: Brands mentioned frequently across platforms (PR, blogs, social media, news coverage, YouTube, forums like Reddit and Quora, and authoritative sites) tend to be cited by AI. Use tools like Google Search Console, Brand24, and Mention.com to monitor online conversations.

Focus On Branded Searches

Branded searches play a vital role in shaping brand perception, driving engagement, and ensuring your brand’s visibility and authority for large language models (LLMs).

LLMs do not work like traditional search engines. They look at user intent, context, and conversational relevance.

To elevate your brand presence:

  • Your information must be accurate and consistent across all platforms.
  • Your content should be useful and helpful to your target audience.
  • It should showcase expertise through thought leadership, offering original, unique, and data-backed insights quoted across authoritative sites and forums.
  • Your brand needs a strong reputation.

Adapt your approach by tracking brand mentions using tools like Brand24 and Semrush, and analyzing LLM-driven traffic via Google Analytics 4, while also testing brand visibility across different platforms and devices.
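
If you want to quantify LLM-driven traffic yourself, one lightweight approach is to classify referral sources against a list of known assistant domains. The domains below are assumptions about how these tools commonly show up in analytics and server logs, so verify them against your own referral data.

# Minimal sketch: flag LLM-driven referrals (domains are assumptions).
LLM_REFERRERS = (
    "chatgpt.com",
    "chat.openai.com",
    "perplexity.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
)

def is_llm_referral(referrer: str) -> bool:
    return any(domain in referrer for domain in LLM_REFERRERS)

sample_referrers = [
    "https://chatgpt.com/",
    "https://www.google.com/",
    "https://www.perplexity.ai/search?q=best+crm",
]
for ref in sample_referrers:
    print(ref, "->", "LLM-driven" if is_llm_referral(ref) else "other")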

Focus On User Intent And Topic Clusters

We have moved from keywords to relevance.

Optimize for entire topics and users’ needs by creating in-depth content that covers all aspects of a subject and anticipates users’ questions.

Use long-tail keywords and natural language to cover users’ intent. Tools like MarketMuse by Siteimprove do a wonderful job at this.

AI-Powered Content Creation And Optimization

Google is not against AI-generated content; it is against low-quality AI content.

Before LLMs came on the scene, content creation often involved manual writing and optimization, and it took a long time to draft a high-quality article that demonstrated E-E-A-T (experience, expertise, authoritativeness, and trustworthiness).

Now, you can automate content briefs, generate outlines, create content drafts, edit them, and make them your own. A lot of AI tools and platforms incorporate ChatGPT into their services and add in so-called proprietary algorithms on top of it.

With AI, you can produce content at a much faster pace and optimize existing content more easily than before, but I don't recommend handing content creation over to it entirely.

Here’s why.

There is a lot of content out there that simply rinses and repeats what everyone else does. Your content must be different: It should educate your audience, convince them that you're the subject matter expert on a topic, earn their trust, and solve their problems.

This kind of content is best written by humans with AI assistance to enhance quality, make it more engaging, and encourage people to share it.

Wrapping Up

SEO is changing fast into a conversational experience powered by AI.

The tactics and strategies that once worked won’t cut it anymore in today’s AI-powered results.

But, this is not the end of SEO. Rather, it’s an SEO evolution as Google continues its mission to organize the world’s information and make it universally accessible and useful.

To win in this new era, brands and marketers must shift from chasing rankings to building visibility, trust, and relevance across multiple platforms – whether that’s on Google, TikTok, Reddit, or Quora.

It’s also important to deliver content that’s useful and up-to-date, solving users’ problems and helping them during their journey.

AI is here to assist, not replace. AI can handle the heavy lifting, but don’t hand over the steering wheel.

The brands that thrive will be the ones that keep a human touch: providing value, showing expertise, and genuinely connecting with their audience.

Don't forget who you're really creating content for: people.


How To Get Your Content (& Brand) Recommended By AI & LLMs

Want your content and brand cited in AI results? Focus on substance, not shortcuts. Here's the strategy that works.

By Andreas Voniatis

The game has changed, and quite recently, too.

Generative engine optimization (GEO), AI Overviews (AIOs), or just an extension of SEO (now being dubbed on LinkedIn as Search Everywhere Optimization) – which acronym is correct?

I’d argue it’s GEO, as you’ll see why. And if you’ve ever built your own large language model from scratch like I did in 2020, you’ll know why.

We've all seen the various frightening (for some) data showing how click-through rates have dropped off a cliff with Google AIOs and how LLMs like ChatGPT are eroding Google's share of search – basically, "SEO is dead" – so I won't repeat it here.

What I will cover are first principles to get your content (along with your company) recommended by AI and LLMs alike.

Everything I disclose here is based on real-world experiences of AI search successes achieved with clients.

For an example I can talk about publicly, I'll go with Boundless.

Tell The World Something New

Imagine the dread a PR agency might feel if it signed up a new business client only to find they haven’t got anything newsworthy to promote to the media – a tough sell. Traditional SEO content is a bit like that.

We’ve all seen and done the rather tired ultimate content guide to [insert your target topic] playbooks, which attempt to turn your website into the Wikipedia (a key data source for ChatGPT, it seems) of whatever industry you happen to be in.

And let’s face it, it worked so well, it ruined the internet, according to The Verge.

The fundamental problem with that type of SEO content is that it has no information gain. When trillions of webpages all follow the same “best practice” playbook, they’re not telling the world anything genuinely new.

You only have to look at the Information Gain patent by Google to underscore the importance of content possessing value, i.e., your content must tell the world (via the internet) something new.

BoundlessHQ commissioned a survey on remote work, asking ‘Ideally, where would you like to work from if it were your choice?’

The results provided a set of data and this kind of content is high effort, unique, and value-adding enough to get cited in AI search results.

Of course, it shouldn't have taken AI to push us into producing this kind of content in the first place; it would have been good SEO content marketing in any case. AI has simply forced our hand (more on that later).

After all, if your content isn’t unique, why would journalists mention you? Bloggers link back to you? People share or bookmark your page? AI retrain its models using your content or cite your brand?

You get the idea.

For improved AI visibility, include your data sources and research methods with their limitations, as this level of transparency makes your content more verifiable to AI.

Also, updating your data more regularly than annually will indicate reliability to AI as a trusted information source for citation. What LLM doesn’t want more recent data?

SEO May Not Be Dead, But Keywords Definitely Are

Keywords don’t tell you who’s actually searching. They just tell you what terms trigger ads in Google.

Your content could be appealing to students, retirees, or anyone. That’s not targeting; that’s one size fits all. And in the AI age, one size definitely doesn’t fit all.

So, kiss goodbye to content guides written in one form of English that win traffic across all English-speaking regions.

AI has created more jobs for marketers: To win the same traffic as before, you'll need to create region-specific versions of that content for each English-speaking market.

Keyword tools also allegedly tell you the search volumes your keywords are getting (if you still want them, we don’t).

So, if you’re planning your content strategy on keyword research, stop. You’re optimizing for the wrong search engine.

What you can do instead is robust market research based on the raw data sources used by LLMs (not the LLM outputs themselves). For example, Grok uses X (Twitter), ChatGPT has publishing partnerships, and so on.

The discussions are the real topics to place your content strategy around, and their volume is the real content demand.

AI Inputs, Not AI Outputs

I’m seeing some discussions (recommendations even) that creating data-driven or research-based content works for getting AI recommendations.

Given the dearth of true data-driven content that AI craves, enjoy it while it lasts, as that will only work in the short term.

AI has raised the content bar: People are now more specific in their search patterns, such is their confidence in the technology.

Therefore, content marketers will rise to the challenge to produce more targeted, substantial content.

But, even if you are using LLMs in “deep” mode on a premium subscription to inject more substance and value into your content, that simply won’t make the AI’s quality cut.

Expecting such fanciful results is like asking AI to rehydrate itself using its sweat.

The results of AI are derivative, diluted, and hallucinatory by nature. The hallucinatory nature is one of the reasons why I don’t fear LLMs leading to artificial general intelligence (AGI), but that’s another conversation.

Because of the value degradation of the results, AI will not want to risk degrading its models on content founded on AI outputs for fear of becoming dumber.

To create content that AI prefers, you need to be using the same data sources that feed AI engines. It’s long been known that Google started its LLM project over a decade ago when it started training its models on Google Books and other literature.

While most of us won’t have the budget for an X.com data firehose, you can still find creative ways (like we have), such as taking out surveys with robust sample sizes.

Some meaningful press coverage, media mentions, and good backlinks will be a strong enough signal to shift AI into seeing the value of your content and judging it good enough to retrain its models and update its worldview.

And by data-mining the same data sources, you can start structuring content as direct answers to questions.

You’ll also find your content is written to be more conversational to match the search patterns used by your target buyers when they prompt for solutions.

SEO Basics Still Matter

GEO and SEO are not the same. The reverse engineering of search engine results pages to direct content strategy and formulation was effective because rank position is a regression problem.

In AI, there is no rank; there are only winners and losers.

However, there are some heavy overlaps that won’t go away and are even more critical than ever.

Unlike SEO, where more word count was generally better, AI faces the additional constraints of rising energy costs and shortages of computer chips.

That means content needs to be even more efficient than it is for search engines, so AI can break it down and parse its meaning before determining its value.

So, by all means:

  • Code pages for faster loading and quicker processing.
  • Deploy schema to add context to the content.
  • Build a conversational, answer-first content architecture.
  • Use HTML anchor jump links to different sections of your content.
  • Open your content to LLM crawling and publish an llms.txt file (see the sketch after this list).
  • Provide programmatic content access, such as RSS feeds or other structured feeds.
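
For the llms.txt item above, here is a minimal sketch that writes a file following the community-proposed llms.txt format (an H1 title, a short summary, and sections of markdown links); the site name, pages, and descriptions are placeholders.

# Minimal sketch: generate an llms.txt file (placeholder site and pages).
llms_txt = """# Example Co.

> Example Co. publishes documentation and guides on widget manufacturing.

## Docs

- [Getting started](https://www.example.com/docs/start): Setup guide for new users
- [API reference](https://www.example.com/docs/api): Endpoints and parameters

## Guides

- [Widget buying guide](https://www.example.com/guides/buying): How to choose a widget
"""

with open("llms.txt", "w", encoding="utf-8") as f:
    f.write(llms_txt)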

These practices are more points of hygiene to help make your content more discoverable. They may not be a game changer for getting your organization cited by AI, but if you can crush GEO, you’ll crush SEO.

Human, Not AI-Written

AI engines don't cite boring rehashes. They're too busy doing that job for us, and they cite sources for their rehash instead.

Now, I have heard arguments say that if the quality of the content (let’s assume it even includes information gain) is on point, then AI shouldn’t care whether it was written by AI or a human.

I’d argue otherwise. Because the last thing any LLM creator wants is their LLM to be retrained on content generated by AI.

While it's unlikely that generative outputs are tagged in any way, it's pretty obvious to humans when content is AI-written, and it's also statistically obvious to AI engines.

LLMs will have certain tropes that are common to AI-generated writing, like “The future of … “.

LLMs won't default to writing about lived personal experiences or spontaneously producing subtle humor without heavy creative prompting.

So, don’t do it. Keep your content written by humans.

The Future Is New, Targeted, Substantial Value

Getting your content and your company recommended by AI means it needs to tell the world something new.

Make sure it offers information gain based on substantive, non-LLM-derived research (enough to make it worthy of inclusion in LLM models), nail the SEO basics, and keep it human-written.

The question now becomes, “What can you do to produce high-effort content good enough for AI without costing the earth?”



How LLMs Interpret Content: How To Structure Information For AI Search

LLMs don’t need schema; they need structure. Learn how to format your content for visibility in AI Overviews, ChatGPT, and Perplexity.

By Carolyn Shelby

In the SEO world, when we talk about how to structure content for AI search, we often default to structured data – Schema.org, JSON-LD, rich results, knowledge graph eligibility – the whole shooting match.

While that layer of markup is still useful in many scenarios, this isn’t another article about how to wrap your content in tags.

Structuring content isn’t the same as structured data

Instead, we’re going deeper into something more fundamental and arguably more important in the age of generative AI: How your content is actually structured on the page and how that influences what large language models (LLMs) extract, understand, and surface in AI-powered search results.

Structured data is optional. Structured writing and formatting are not.

If you want your content to show up in AI Overviews, Perplexity summaries, ChatGPT citations, or any of the increasingly common “direct answer” features driven by LLMs, the architecture of your content matters: Headings. Paragraphs. Lists. Order. Clarity. Consistency.

In this article, I’m unpacking how LLMs interpret content — and what you can do to make sure your message is not just crawled, but understood.

How LLMs Actually Interpret Web Content

Let’s start with the basics.

Unlike traditional search engine crawlers that rely heavily on markup, metadata, and link structures, LLMs interpret content differently.

They don’t scan a page the way a bot does. They ingest it, break it into tokens, and analyze the relationships between words, sentences, and concepts using attention mechanisms.

They’re not looking for a <meta> tag or a JSON-LD snippet to tell them what a page is about. They’re looking for semantic clarity: Does this content express a clear idea? Is it coherent? Does it answer a question directly?

LLMs like GPT-4 or Gemini analyze:

  • The order in which information is presented.
  • The hierarchy of concepts (which is why headings still matter).
  • Formatting cues like bullet points, tables, bolded summaries.
  • Redundancy and reinforcement, which help models determine what’s most important.

This is why poorly structured content – even if it’s keyword-rich and marked up with schema – can fail to show up in AI summaries, while a clear, well-formatted blog post without a single line of JSON-LD might get cited or paraphrased directly.

Why Structure Matters More Than Ever In AI Search

Traditional search was about ranking; AI search is about representation.

When a language model generates a response to a query, it’s pulling from many sources – often sentence by sentence, paragraph by paragraph.

It’s not retrieving a whole page and showing it. It’s building a new answer based on what it can understand.

What gets understood most reliably?

Content that is:

  • Segmented logically, so each part expresses one idea.
  • Consistent in tone and terminology.
  • Presented in a format that lends itself to quick parsing (think FAQs, how-to steps, definition-style intros).
  • Written with clarity, not cleverness.

AI search engines don’t need schema to pull a step-by-step answer from a blog post.

But, they do need you to label your steps clearly, keep them together, and not bury them in long-winded prose or interrupt them with calls to action, pop-ups, or unrelated tangents.

Clean structure is now a ranking factor – not in the traditional SEO sense, but in the AI citation economy we’re entering.

What LLMs Look For When Parsing Content

Here’s what I’ve observed (both anecdotally and through testing across tools like Perplexity, ChatGPT Browse, Bing Copilot, and Google’s AI Overviews):

  • Clear Headings And Subheadings: LLMs use heading structure to understand hierarchy. Pages with proper H1–H2–H3 nesting are easier to parse than walls of text or div-heavy templates.
  • Short, Focused Paragraphs: Long paragraphs bury the lede. LLMs favor self-contained thoughts. Think one idea per paragraph.
  • Structured Formats (Lists, Tables, FAQs): If you want to get quoted, make it easy to lift your content. Bullets, tables, and Q&A formats are goldmines for answer engines.
  • Defined Topic Scope At The Top: Put your TL;DR early. Don’t make the model (or the user) scroll through 600 words of brand story before getting to the meat.
  • Semantic Cues In The Body: Words like “in summary,” “the most important,” “step 1,” and “common mistake” help LLMs identify relevance and structure. There’s a reason so much AI-generated content uses those “giveaway” phrases. It’s not because the model is lazy or formulaic. It’s because it actually knows how to structure information in a way that’s clear, digestible, and effective, which, frankly, is more than can be said for a lot of human writers.

A Real-World Example: Why My Own Article Didn’t Show Up

In December 2024, I wrote a piece about the relevance of schema in AI-first search.

It was structured for clarity and timeliness and was highly relevant to this conversation, but it didn't show up in my research queries for this article (the one you are presently reading). The reason? I didn't use the term "LLM" in the title or slug.

All of the articles returned in my search had “LLM” in the title. Mine said “AI Search” but didn’t mention LLMs explicitly.

You might assume that a large language model would understand “AI search” and “LLMs” are conceptually related – and it probably does – but understanding that two things are related and choosing what to return based on the prompt are two different things.

Where does the model get its retrieval logic? From the prompt. It interprets your question literally.

If you say, “Show me articles about LLMs using schema,” it will surface content that directly includes “LLMs” and “schema” – not necessarily content that’s adjacent, related, or semantically similar, especially when it has plenty to choose from that contains the words in the query (a.k.a. the prompt).

So, even though LLMs are smarter than traditional crawlers, retrieval is still rooted in surface-level cues.

This might sound suspiciously like keyword research still matters – and yes, it absolutely does. Not because LLMs are dumb, but because search behavior (even AI search) still depends on how humans phrase things.

The retrieval layer – the layer that decides what’s eligible to be summarized or cited – is still driven by surface-level language cues.

What Research Tells Us About Retrieval

Even recent academic work supports this layered view of retrieval.

A 2023 research paper by Doostmohammadi et al. found that simpler, keyword-matching techniques, like a method called BM25, often led to better results than approaches focused solely on semantic understanding.

The improvement was measured through a drop in perplexity, which tells us how confident or uncertain a language model is when predicting the next word.

In plain terms: Even in systems designed to be smart, clear and literal phrasing still made the answers better.
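
To see why literal phrasing matters at the retrieval layer, you can score a toy corpus with BM25 yourself. This sketch uses the third-party rank_bm25 package (pip install rank-bm25); the documents and query are invented for illustration.

# Minimal sketch: BM25 rewards literal term overlap.
from rank_bm25 import BM25Okapi

corpus = [
    "How LLMs use schema markup in AI search",
    "Structuring content for AI Overviews and answer engines",
    "A guide to keyword research for traditional SEO",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

query = "articles about LLMs using schema".lower().split()
scores = bm25.get_scores(query)

for doc, score in sorted(zip(corpus, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{score:.2f}  {doc}")

The document that literally contains "LLMs" and "schema" scores highest, even though the others are topically adjacent – exactly the retrieval behavior described above.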

So, the lesson isn’t just to use the language they’ve been trained to recognize. The real lesson is: If you want your content to be found, understand how AI search works as a system – a chain of prompts, retrieval, and synthesis. Plus, make sure you’re aligned at the retrieval layer.

This isn’t about the limits of AI comprehension. It’s about the precision of retrieval.

Language models are incredibly capable of interpreting nuanced content, but when they’re acting as search agents, they still rely on the specificity of the queries they’re given.

That makes terminology, not just structure, a key part of being found.

How To Structure Content For AI Search

If you want to increase your odds of being cited, summarized, or quoted by AI-driven search engines, it’s time to think less like a writer and more like an information architect – and structure content for AI search accordingly.

That doesn’t mean sacrificing voice or insight, but it does mean presenting ideas in a format that makes them easy to extract, interpret, and reassemble.

Core Techniques For Structuring AI-Friendly Content

Here are some of the most effective structural tactics I recommend:

Use A Logical Heading Hierarchy

Structure your pages with a single clear H1 that sets the context, followed by H2s and H3s that nest logically beneath it.

LLMs, like human readers, rely on this hierarchy to understand the flow and relationship between concepts.

If every heading on your page is an H1, you’re signaling that everything is equally important, which means nothing stands out.

Good heading structure is not just semantic hygiene; it’s a blueprint for comprehension.
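
If you want to audit that hierarchy at scale, a short script can pull the headings out of a page and flag skipped levels. A minimal sketch using Python's standard html.parser, with a placeholder URL:

# Minimal sketch: extract a page's heading outline and flag skipped levels.
import urllib.request
from html.parser import HTMLParser

class HeadingAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.headings = []        # (level, text) pairs in document order
        self._level = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buffer).strip()))
            self._level = None

request = urllib.request.Request("https://www.example.com/", headers={"User-Agent": "heading-audit"})
with urllib.request.urlopen(request) as response:
    audit = HeadingAudit()
    audit.feed(response.read().decode("utf-8", errors="ignore"))

previous = 0
for level, text in audit.headings:
    if previous and level > previous + 1:
        print(f"Skipped level before H{level}: {text!r}")
    print("  " * (level - 1) + f"H{level}: {text}")
    previous = level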

Keep Paragraphs Short And Self-Contained

Every paragraph should communicate one idea clearly.

Walls of text don’t just intimidate human readers; they also increase the likelihood that an AI model will extract the wrong part of the answer or skip your content altogether.

This is closely tied to readability metrics like the Flesch Reading Ease score, which rewards shorter sentences and simpler phrasing.

While it may pain those of us who enjoy a good, long, meandering sentence (myself included), clarity and segmentation help both humans and LLMs follow your train of thought without derailing.
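
For reference, the Flesch Reading Ease score mentioned above is a simple formula over word, sentence, and syllable counts; higher scores mean easier reading, with roughly 60-70 corresponding to plain English.

# Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Invented example counts: 180 words, 12 sentences, 240 syllables -> about 79 (fairly easy).
print(round(flesch_reading_ease(180, 12, 240), 1))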

Use Lists, Tables, And Predictable Formats

If your content can be turned into a step-by-step guide, numbered list, comparison table, or bulleted breakdown, do it. AI summarizers love structure, and so do users.

Frontload Key Insights

Don’t save your best advice or most important definitions for the end.

LLMs tend to prioritize what appears early in the content. Give your thesis, definition, or takeaway up top, then expand on it.

Use Semantic Cues

Signal structure with phrasing like “Step 1,” “In summary,” “Key takeaway,” “Most common mistake,” and “To compare.”

These phrases help LLMs (and readers) identify the role each passage plays.

Avoid Noise

Interruptive pop-ups, modal windows, endless calls-to-action (CTAs), and disjointed carousels can pollute your content.

Even if the user closes them, they’re often still present in the Document Object Model (DOM), and they dilute what the LLM sees.

Think of your content like a transcript: What would it sound like if read aloud? If it’s hard to follow in that format, it might be hard for an LLM to follow, too.

The Role Of Schema: Still Useful, But Not A Magic Bullet

Let’s be clear: Structured data still has value. It helps search engines understand content, populate rich results, and disambiguate similar topics.

However, LLMs don’t require it to understand your content.

If your site is a semantic dumpster fire, schema might save you, but wouldn’t it be better to avoid building a dumpster fire in the first place?

Schema is a helpful boost, not a magic bullet. Prioritize clear structure and communication first, and use markup to reinforce – not rescue – your content.

How Schema Still Supports AI Understanding

That said, Google has recently confirmed at Search Central Live in Madrid that its LLM (Gemini), which powers AI Overviews, does leverage structured data to help understand content more effectively.

In fact, at the event, John Mueller recommended using structured data because it gives models clearer signals about intent and structure.

That doesn’t contradict the point; it reinforces it. If your content isn’t already structured and understandable, schema can help fill the gaps. It’s a crutch, not a cure.

Schema is a helpful boost but not a substitute for structure and clarity.

In AI-driven search environments, we’re seeing content without any structured data show up in citations and summaries because the core content was well-organized, well-written, and easily parsed.

In short:

  • Use schema when it helps clarify the intent or context.
  • Don’t rely on it to fix bad content or a disorganized layout.
  • Prioritize content quality and layout before markup.

The future of content visibility is built on how well you communicate, not just how well you tag.

Conclusion: Structure For Meaning, Not Just For Machines

Optimizing for LLMs doesn’t mean chasing new tools or hacks. It means doubling down on what good communication has always required: clarity, coherence, and structure.

If you want to stay competitive, you’ll need to structure content for AI search just as carefully as you structure it for human readers.

The best-performing content in AI search isn’t necessarily the most optimized. It’s the most understandable. That means:

  • Anticipating how content will be interpreted, not just indexed.
  • Giving AI the framework it needs to extract your ideas.
  • Structuring pages for comprehension, not just compliance.
  • Anticipating and using the language your audience uses, because LLMs respond literally to prompts and retrieval depends on those exact terms being present.

As search shifts from links to language, we’re entering a new era of content design. One where meaning rises to the top, and the brands that structure for comprehension will rise right along with it.


Complete Crawler List For AI User-Agents [Dec 2025]

Control AI visibility and server strain with a log-validated index of bots, complete with user-agent strings, official IP ranges, crawl rates, and allowlist/block best practices.

By Vahan Petrosyan

AI visibility plays a crucial role for SEOs, and this starts with controlling AI crawlers. If AI crawlers can’t access your pages, you’re invisible to AI discovery engines.

On the flip side, unmonitored AI crawlers can overwhelm servers with excessive requests, causing crashes and unexpected hosting bills.

User-agent strings are essential for controlling which AI crawlers can access your website, but official documentation is often outdated, incomplete, or missing entirely. So, we curated a verified list of AI crawlers from our actual server logs as a useful reference.

Every user-agent is validated against official IP lists when available, ensuring accuracy. We will maintain and update this list to catch new crawlers and changes to existing ones.

The Complete Verified AI Crawler List (December 2025)

The list below preserves the table's columns for each crawler: its purpose, the crawl rate observed on SEJ (pages per hour), whether an official verified IP list is published, the robots.txt user-agent token to target, and the complete user-agent string. To control any of these crawlers, target its token in robots.txt, for example:

User-agent: GPTBot
Allow: /
Disallow: /private-folder

GPTBot
Purpose: AI training data collection for GPT models (ChatGPT, GPT-4o). Crawl rate: 100 pages/hour. Verified IP list: official list published. Robots.txt token: GPTBot
User-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)

ChatGPT-User
Purpose: AI agent for real-time web browsing when users interact with ChatGPT. Crawl rate: 2400 pages/hour. Verified IP list: official list published. Robots.txt token: ChatGPT-User
User-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

OAI-SearchBot
Purpose: AI search indexing for ChatGPT search features (not for training). Crawl rate: 150 pages/hour. Verified IP list: official list published. Robots.txt token: OAI-SearchBot
User-agent string: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot

ClaudeBot
Purpose: AI training data collection for Claude models. Crawl rate: 500 pages/hour. Verified IP list: official list published. Robots.txt token: ClaudeBot
User-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)

Claude-User
Purpose: AI agent for real-time web access when Claude users browse. Crawl rate: <10 pages/hour. Verified IP list: not available. Robots.txt token: Claude-User
User-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-User/1.0; +Claude-User@anthropic.com)

Claude-SearchBot
Purpose: AI search indexing for Claude search capabilities. Crawl rate: <10 pages/hour. Verified IP list: not available. Robots.txt token: Claude-SearchBot
User-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-SearchBot/1.0; +https://www.anthropic.com)

Google-CloudVertexBot
Purpose: AI agent for Vertex AI Agent Builder (crawls only at site owners' request). Crawl rate: <10 pages/hour. Verified IP list: official list published. Robots.txt token: Google-CloudVertexBot
User-agent string: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.7390.122 Mobile Safari/537.36 (compatible; Google-CloudVertexBot; +https://cloud.google.com/enterprise-search)

Google-Extended
Purpose: A robots.txt token that controls AI training usage of Googlebot-crawled content. It does not crawl on its own, so there is no crawl rate or user-agent string. Robots.txt token: Google-Extended

Gemini-Deep-Research
Purpose: AI research agent for Google Gemini's Deep Research feature. Crawl rate: <10 pages/hour. Verified IP list: official list published. Robots.txt token: Gemini-Deep-Research
User-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Gemini-Deep-Research; +https://gemini.google/overview/deep-research/) Chrome/135.0.0.0 Safari/537.36

Google (Gemini chat fetcher)
Purpose: Fetches a webpage when a user asks Gemini's chat to open it. Crawl rate: <10 pages/hour. Verified IP list: Google's published list. No separate robots.txt token or user-agent string documented.

Bingbot
Purpose: Powers Bing Search and Bing Chat (Copilot) AI answers. Crawl rate: 1300 pages/hour. Verified IP list: official list published. Robots.txt token: BingBot
User-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36

Applebot-Extended
Purpose: Doesn't crawl but controls how Apple uses Applebot data. Crawl rate: <10 pages/hour. Verified IP list: official list published. Robots.txt token: Applebot-Extended
User-agent string: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)

PerplexityBot
Purpose: AI search indexing for Perplexity's answer engine. Crawl rate: 150 pages/hour. Verified IP list: official list published. Robots.txt token: PerplexityBot
User-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)

Perplexity-User
Purpose: AI agent for real-time browsing when Perplexity users request information. Crawl rate: <10 pages/hour. Verified IP list: official list published. Robots.txt token: Perplexity-User
User-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)

Meta-ExternalAgent
Purpose: AI training data collection for Meta's LLMs (Llama, etc.). Crawl rate: 1100 pages/hour. Verified IP list: not available. Robots.txt token: meta-externalagent
User-agent string: meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)

Meta-WebIndexer
Purpose: Used to improve Meta AI search. Crawl rate: <10 pages/hour. Verified IP list: not available. Robots.txt token: Meta-WebIndexer
User-agent string: meta-webindexer/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)

Bytespider
Purpose: AI training data for ByteDance's LLMs for products like TikTok. Crawl rate: <10 pages/hour. Verified IP list: not available. Robots.txt token: Bytespider
User-agent string: Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; https://zhanzhang.toutiao.com/)

Amazonbot
Purpose: AI training for Alexa and other Amazon AI services. Crawl rate: 1050 pages/hour. Verified IP list: not available. Robots.txt token: Amazonbot
User-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36

DuckAssistBot
Purpose: AI search indexing for the DuckDuckGo search engine. Crawl rate: 20 pages/hour. Verified IP list: official list published. Robots.txt token: DuckAssistBot
User-agent string: DuckAssistBot/1.2; (+http://duckduckgo.com/duckassistbot.html)

MistralAI-User
Purpose: Mistral's real-time citation fetcher for the "Le Chat" assistant. Crawl rate: <10 pages/hour. Verified IP list: not available. Robots.txt token: MistralAI-User
User-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; MistralAI-User/1.0; +https://docs.mistral.ai/robots)

Webz.io
Purpose: Data extraction and web scraping used by other AI training companies; formerly known as Omgili. Crawl rate: <10 pages/hour. Verified IP list: not available. Robots.txt token: webzio
User-agent string: webzio (+https://webz.io/bot.html)

Diffbot
Purpose: Data extraction and web scraping used by companies all over the world. Crawl rate: <10 pages/hour. Verified IP list: not available. Robots.txt token: Diffbot
User-agent string: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)

ICC-Crawler
Purpose: AI and machine learning data collection. Crawl rate: <10 pages/hour. Verified IP list: not available. Robots.txt token: ICC-Crawler
User-agent string: ICC-Crawler/3.0 (Mozilla-compatible; ; https://ucri.nict.go.jp/en/icccrawler.html)

CCBot
Purpose: Open-source web archive used as training data by multiple AI companies. Crawl rate: <10 pages/hour. Verified IP list: official list published. Robots.txt token: CCBot
User-agent string: CCBot/2.0 (https://commoncrawl.org/faq/)
The user-agent strings above have all been verified against Search Engine Journal server logs.

Popular AI Agent Crawlers With Unidentifiable User Agent

We’ve found that the following didn’t identify themselves:

  • you.com
  • ChatGPT's agent Operator
  • Bing's Copilot chat
  • Grok

There is no way to track these crawlers or block them from accessing webpages other than by identifying their explicit IP addresses.

We set up a trap page (e.g., /specific-page-for-you-com/) and used the on-page chat to prompt you.com to visit it, allowing us to locate the corresponding visit record and IP address in our server logs (screenshot by author, December 2025).

What About Agentic AI Browsers?

Unfortunately, AI browsers such as Comet or ChatGPT's Atlas don't differentiate themselves in the user-agent string, so you can't identify them in server logs; their visits blend in with normal users' visits.

ChatGPT's Atlas browser user-agent string from server log records (Screenshot by author, December 2025)

This is disappointing for SEOs because tracking agentic browser visits to a website is important from a reporting point of view.

How To Check What’s Crawling Your Server

Depending on your hosting service, your provider may offer a user interface (UI) that makes it easy to access and review server logs.

If your host doesn't offer this, you can download the server log files (usually located at /var/log/apache2/access.log on Linux-based servers) via FTP or ask your server support team to send them to you.

Once you have the log file, you can view and analyze it in Google Sheets (if the file is in CSV format) or Screaming Frog's log analyzer, or, if the file is less than 100 MB, you can try analyzing it with Gemini.
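
If you'd rather script it, a few lines of Python can tally requests per AI crawler straight from the access log. The log path and token list below are placeholders to adapt to your setup.

# Minimal sketch: count requests per AI crawler in an access log.
from collections import Counter

AI_BOT_TOKENS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
                 "PerplexityBot", "Amazonbot", "Bytespider", "CCBot"]

hits = Counter()
with open("/var/log/apache2/access.log", encoding="utf-8", errors="ignore") as log:
    for line in log:
        for token in AI_BOT_TOKENS:
            if token.lower() in line.lower():
                hits[token] += 1
                break

for token, count in hits.most_common():
    print(f"{token}: {count} requests")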

How To Verify Legitimate Vs. Fake Bots

Fake crawlers can spoof legitimate user agents to bypass restrictions and scrape content aggressively. For example, anyone can impersonate ClaudeBot from their laptop and initiate a crawl request from the terminal. In your server log, it will look as if ClaudeBot is crawling your site:

curl -A 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)' https://example.com

Verification helps save server bandwidth and prevents your content from being harvested illegally. The most reliable verification method you can apply is checking the request IP.

Check each request IP against the officially declared IP ranges referenced above. If it matches, allow the request; otherwise, block it.
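
That check is easy to script with Python's standard ipaddress module. The CIDR ranges below are documentation placeholders; in practice, load the vendor's official IP list.

# Minimal sketch: verify a claimed crawler IP against published CIDR ranges.
import ipaddress

official_ranges = ["192.0.2.0/24", "198.51.100.0/24"]  # placeholder CIDRs
networks = [ipaddress.ip_network(cidr) for cidr in official_ranges]

def is_official_ip(request_ip: str) -> bool:
    ip = ipaddress.ip_address(request_ip)
    return any(ip in network for network in networks)

print(is_official_ip("192.0.2.15"))   # True: inside a listed range
print(is_official_ip("203.0.113.9"))  # False: likely an impersonator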

Various types of firewalls can help with this by allowlisting verified IPs, which lets legitimate bot requests pass through while blocking all other requests that impersonate AI crawlers in their user-agent strings.

For example, in WordPress, you can use the free Wordfence plugin to allowlist legitimate IPs from the official lists above and add custom blocking rules for everything else.

The allowlist approach is the stronger option: It lets legitimate crawlers pass through and blocks any impersonation request that comes from a different IP.

However, note that IP addresses can be spoofed as well; when both the bot user agent and the IP are spoofed, you won't be able to block the request.

Conclusion: Stay In Control Of AI Crawlers For Reliable AI Visibility

AI crawlers are now part of our web ecosystem, and the bots listed here represent the major AI platforms currently indexing the web, although this list is likely to grow.

Check your server logs regularly to see what's actually hitting your site, and make sure you don't inadvertently block AI crawlers if visibility in AI search engines is important for your business. If you don't want AI crawlers to access your content, block them via robots.txt using the user-agent name.

We'll keep this list updated as new crawlers emerge and existing ones change, so bookmark this URL or revisit this article regularly to keep your AI crawler list up to date.


AI Search in 2026: The 5 Article GEO & SEO Playbook For Modern Visibility
In partnership with Rundown

You need to know how AI search systems determine organic visibility in 2026. To succeed, your site must be crawlable, and your content structure and entities must be clear to AI systems. This article stack breaks down how SEO and GEO work together to influence AI-powered search results.

You’ll Learn:

  • 3 GEO playbooks on Content, Technical SEO & Authority
  • Guidance on how to adapt current SEO strategies
  • Which AI user-agents crawl your site, and what signals to look for
  • How LLMs interpret your site structure and entities
