AI tools produce different brand recommendation lists nearly every time they answer the same question, according to a new report from SparkToro.
The data showed less than a 1-in-100 chance that ChatGPT or Google’s AI in Search (AI Overviews/AI Mode) would return the same list of brands across repeated runs of the same prompt.
Rand Fishkin, SparkToro co-founder, conducted the research with Patrick O’Donnell from Gumshoe.ai, an AI tracking startup. With the help of hundreds of volunteers, the team ran 2,961 prompts across ChatGPT, Claude, and Google Search AI Overviews (using AI Mode when Overviews didn’t appear) during November and December.
What The Data Found
The authors tested 12 prompts requesting brand recommendations across categories, including chef’s knives, headphones, cancer care hospitals, digital marketing consultants, and science fiction novels.
Each prompt was run 60-100 times per platform. Nearly every response was unique, differing in three ways: the list of brands presented, the order of recommendations, and the number of items returned.
Fishkin summarized the core finding:
“If you ask an AI tool for brand/product recommendations a hundred times nearly every response will be unique.”
Claude showed slightly higher consistency in producing the same list twice, but was less likely to produce the same ordering. None of the platforms came close to the authors’ definition of reliable repeatability.
The Prompt Variability Problem
The authors also examined how real users write prompts. When 142 participants were asked to write their own prompts about headphones for a traveling family member, almost no two prompts looked similar.
The semantic similarity score across those human-written prompts was 0.081. Fishkin compared the relationship to:
“Kung Pao Chicken and Peanut Butter.”
The prompts shared a core intent but little else.
Despite the prompt diversity, the AI tools returned brands from a relatively consistent consideration set. Bose, Sony, Sennheiser, and Apple appeared in 55-77% of the 994 responses to those varied headphone prompts.
What This Means For AI Visibility Tracking
The findings call into question the value of “AI ranking position” as a metric. Fishkin wrote: “any tool that gives a ‘ranking position in AI’ is full of baloney.”
However, the data suggests that a different measure, how often a brand appears across many runs of similar prompts, is more stable. In tight categories like cloud computing providers, top brands appeared in most responses. In broader categories like science fiction novels, the results were more scattered.
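To illustrate the difference between a rank position and an appearance rate, here is a minimal Python sketch. It assumes each run is recorded as an ordered list of brand names. The brand lists are invented for illustration and are not drawn from the SparkToro dataset, and the function names are hypothetical rather than any tracking tool’s method.

```python
from collections import Counter

# Illustrative data only, NOT from the SparkToro report: each inner list is
# one hypothetical run of the same prompt, in the order the AI listed brands.
runs = [
    ["Bose", "Sony", "Apple"],
    ["Sony", "Sennheiser", "Bose", "Jabra"],
    ["Apple", "Bose"],
    ["Sennheiser", "Sony", "Bose", "Apple", "Anker"],
]


def appearance_rate(runs):
    """Share of runs in which each brand appears at least once."""
    # set(run) avoids double-counting a brand repeated within one response
    counts = Counter(brand for run in runs for brand in set(run))
    return {brand: n / len(runs) for brand, n in counts.items()}


def positions(runs, brand):
    """1-based position of `brand` in each run where it appears."""
    return [run.index(brand) + 1 for run in runs if brand in run]


if __name__ == "__main__":
    rates = appearance_rate(runs)
    print(rates["Bose"])             # 1.0
    print(positions(runs, "Bose"))   # [1, 3, 2, 3]
```

In this toy data, Bose appears in every run but lands anywhere from first to third, so a single “ranking position” would depend on which run happened to be sampled, while its appearance rate across the set is 1.0.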
This aligns with other reports we’ve covered. In December, Ahrefs published data showing that Google’s AI Mode and AI Overviews cite different sources 87% of the time for the same query. That report compared different features within the same platform. The SparkToro data instead looks at the same platform and the same prompt across repeated runs.
The pattern across these studies points in the same direction. AI recommendations appear to vary at every level, whether you’re comparing across platforms, across features within a platform, or across repeated queries to the same feature.
Methodology Notes
The research was conducted in partnership with Gumshoe.ai, which sells AI tracking tools. Fishkin disclosed this and noted that his starting hypothesis was that AI tracking would prove “pointless.”
The team published the full methodology and raw data on a public mini-site. Survey respondents used their normal AI tool settings without standardization, which the authors said was intentional to capture real-world variation.
The report is not peer-reviewed academic research. Fishkin acknowledged methodological limitations and called for larger-scale follow-up work.
Looking Ahead
The authors left several questions open, including how many prompt runs are needed to obtain reliable visibility data and whether API calls yield the same variation as manual prompts.
When assessing AI tracking tools, the findings suggest you should ask providers to demonstrate their methodology. Fishkin wrote:
“Before you spend a dime tracking AI visibility, make sure your provider answers the questions we’ve surfaced here and shows their math.”
Featured Image: NOMONARTS/Shutterstock