
Perplexity Responds To Reddit Lawsuit Over Data Access

Reddit sued Perplexity in New York federal court over access to Reddit content. Perplexity says it summarizes posts with citations and does not train on them.

  • Reddit says Perplexity and partners bypassed protections to access Reddit content via Google results.
  • Perplexity says it summarizes with citations and doesn’t train on Reddit posts.
  • The complaint cites a rise in Reddit citations after a cease-and-desist.

Reddit sued Perplexity and three data-scraping firms in New York federal court, alleging the companies bypassed access controls to obtain Reddit content at scale, including by scraping Google search results.

Perplexity posted a public response, saying it summarizes Reddit discussions with citations and doesn’t train AI models on Reddit content.

The position is consistent with the company’s past statements. Whether it addresses the specific allegations in Reddit’s filing remains an open question.

The complaint names Oxylabs UAB, AWMProxy, and SerpApi as intermediaries. It alleges Perplexity is a SerpApi customer and purchased and/or utilized SerpApi services to circumvent controls and copy Reddit data.

Evidence In The Complaint

Perplexity’s argument is built around a technical distinction. The company says it summarizes and cites discussions rather than training models on Reddit posts.

Perplexity wrote in its Reddit response:

“We summarize Reddit discussions, and we cite Reddit threads in answers, just like people share links to posts here all the time.”

The complaint, however, presents technical claims that call that framework into question.

According to the filing, Reddit created a test post that was only crawlable by Google’s search engine and not accessible anywhere else on the internet. Within hours, that hidden content appeared in Perplexity’s results.

The filing also says that after Reddit sent a cease-and-desist letter, Perplexity’s citations to Reddit increased roughly forty-fold.

Similar Accusations From Publishers

Forbes previously accused Perplexity of republishing an exclusive and threatened legal action.

Wired reported that Perplexity used undisclosed IP addresses and spoofed user-agent strings to bypass robots.txt.

Cloudflare later said Perplexity used “stealth, undeclared crawlers” that ignored no-crawl directives, based on tests it ran in August.

How Perplexity Has Responded

In previous disputes, Perplexity said issues stemmed from rough edges on new products and promised clearer attribution.

The company has also argued that some media organizations are trying to control “publicly reported facts.”

In this latest response, Perplexity frames Reddit’s lawsuit as leverage in broader training-data negotiations and writes:

“We summarize Reddit discussions… We won’t be extorted, and we won’t help Reddit extort Google.”

Why This Matters

The case concerns how AI assistants use forum content that your audiences read and that publishers frequently cite.

The legal questions go beyond training.

Courts may examine whether technical controls were bypassed, whether summarization infringes protected expression, and whether using third-party scrapers could create legal liability for downstream products.

If courts accept Reddit’s anti-circumvention argument, assistants could change how they cite or link to Reddit threads.

On the other hand, if courts side with Perplexity, assistants might rely more heavily on forum discussions with fewer licensing restrictions.

What We Don’t Know Yet

The filing alleges Perplexity obtained data via at least one scraping firm, but the public complaint doesn’t specify which vendor supplied which data or include transaction details.

Matt G. Southern, Senior News Writer at Search Engine Journal

Matt G. Southern, Senior News Writer, has been with Search Engine Journal since 2013. With a bachelor’s degree in communications, ...