Reddit sued Perplexity and three data-scraping firms in New York federal court, alleging the companies bypassed access controls to obtain Reddit content at scale, including by scraping Google search results.
Perplexity posted a public response, saying it summarizes Reddit discussions with citations and doesn’t train AI models on Reddit content.
The position is consistent with the company’s past statements. Whether it addresses the specific allegations in Reddit’s filing remains an open question.
The complaint names Oxylabs UAB, AWMProxy, and SerpApi as intermediaries. It alleges Perplexity is a SerpApi customer and purchased and/or utilized SerpApi services to circumvent controls and copy Reddit data.
Evidence In The Complaint
Perplexity’s argument is built around a technical distinction. The company says it summarizes and cites discussions rather than training models on Reddit posts.
Perplexity wrote in its Reddit response:
“We summarize Reddit discussions, and we cite Reddit threads in answers, just like people share links to posts here all the time.”
The complaint, however, presents technical claims that call that framing into question.
According to the filing, Reddit created a test post that was only crawlable by Google’s search engine and not accessible anywhere else on the internet. Within hours, that hidden content appeared in Perplexity’s results.
The filing also says that after Reddit sent a cease-and-desist letter, Perplexity’s citations to Reddit increased roughly forty-fold.
Similar Accusations From Publishers
Forbes previously accused Perplexity of republishing an exclusive and threatened legal action.
Wired reported that Perplexity used undisclosed IPs and spoofed user-agent strings to bypass robots.txt.
Cloudflare later said Perplexity used “stealth, undeclared crawlers” that ignored no-crawl directives, based on tests it ran in August.
How Perplexity Has Responded
In previous disputes, Perplexity said issues stemmed from rough edges on new products and promised clearer attribution.
The company has also argued that some media organizations are trying to control “publicly reported facts.”
In this latest response, Perplexity frames Reddit’s lawsuit as leverage in broader training-data negotiations and writes:
“We summarize Reddit discussions… We won’t be extorted, and we won’t help Reddit extort Google.”
Why This Matters
The case concerns how AI assistants use forum content that your audiences read and that publishers frequently cite.
The legal questions go beyond training.
Courts may examine whether technical controls were bypassed, whether summarization infringes on protected expression, and whether using third-party scrapers creates legal liability for downstream products.
If courts accept Reddit’s anti-circumvention argument, it could lead to changes in how assistants cite or link Reddit threads.
On the other hand, if courts agree with Perplexity’s viewpoint, assistants might start relying more on forum discussions that are less restricted by licensing.
What We Don’t Know Yet
The filing alleges Perplexity obtained data via at least one scraping firm, but the public complaint doesn’t specify which vendor supplied which data or include transaction details.