Perplexity AI: The $20 Billion Plagiarism Machine That Can't Find Its Own Sources

The pitch is perfect. "Where knowledge begins." Accurate answers, real-time citations, no ads, no SEO garbage, no Google telling you to eat rocks. Type a question, get a clean answer with sources right there in the UI. Finally, an AI that respects your intelligence.

The actual product: a search wrapper that hallucinates citations, scrapes content it doesn't own, got caught disguising its bots as Google Chrome to evade websites that said no, and is now being sued by — deep breath — The New York Times, Dow Jones, News Corp, Encyclopaedia Britannica, Merriam-Webster, and Reddit. All at once. While sitting on a $20 billion valuation.

"Where knowledge begins." Sure. Just maybe double-check that the knowledge actually exists first.

What Perplexity told you it was

Perplexity launched in 2022 as a "conversational search engine" — or more precisely, an answer engine. The premise was clean: instead of ten blue links, you get a direct answer with citations. Instead of opening five tabs, you open zero. The AI synthesizes, you consume.

By early 2024, it was processing over 10 million queries a day 1 and the press was fully in the tank. "The $14 Billion AI Google Killer," headlined Gizmodo. Tech media wrote breathless profiles of founder Aravind Srinivas — the middle-class Chennai kid who failed the IIT computer science admissions, went to Berkeley, worked at OpenAI and DeepMind, and built what everyone was calling Google's first real challenger in a decade.

The valuation trajectory was genuinely staggering: $520 million in January 2024, then $3 billion by June, $9 billion by December, $14 billion in early 2025, $20 billion by September 2025 2. That's a 38x increase in valuation in under two years. Jeff Bezos backed it. SoftBank backed it. Nvidia backed it. The product hadn't changed much — but the hype infrastructure had fully assembled.
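The arithmetic behind that multiple is easy to check. Here is a minimal Python sketch that uses only the valuation figures quoted above; the variable and function names are illustrative, and the round labels simply repeat the article's own wording.

```python
# Valuations as quoted in the paragraph above, in millions of USD.
# Labels repeat the article's wording; nothing here comes from another source.
valuations = [
    ("January 2024", 520),
    ("June", 3_000),
    ("December", 9_000),
    ("early 2025", 14_000),
    ("September 2025", 20_000),
]


def growth_multiple(first: float, last: float) -> float:
    """Return how many times larger `last` is than `first`."""
    return last / first


# First quoted valuation versus the last one.
overall = growth_multiple(valuations[0][1], valuations[-1][1])
print(f"Overall: {overall:.1f}x")  # Overall: 38.5x

# Multiples between consecutive quoted valuations.
step_multiples = []
for (prev_label, prev), (label, cur) in zip(valuations, valuations[1:]):
    multiple = growth_multiple(prev, cur)
    step_multiples.append(multiple)
    print(f"{prev_label} -> {label}: {multiple:.1f}x")
```

The step-by-step multiples show where the growth was concentrated: the biggest jump in this list is the first one, and each later step is smaller than the one before it.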

Revenue grew too, to be fair: about $34 million in 2024, roughly $232 million annualized by end of 2025, on track for their $656 million target for 2026 2. Forty-five million monthly active users. A $20 monthly Pro subscription. The browser Comet. The Perplexity API.

The story had all the ingredients: underpromised, overdelivered, disrupting a trillion-dollar incumbent. One problem: the product has a deeply uncomfortable relationship with truth.

The citations are the product. The citations are also broken.

Here's what Perplexity promises: "accurate, trusted, and real-time answers to any question." The entire value proposition rests on citations. It's not just a chatbot making stuff up — it shows you its work. Little numbered sources, right there. You can check.

Except a meaningful number of those sources either don't say what Perplexity claims they say, or don't exist at all.

One Reddit user doing research on PRC land ownership policy used Perplexity's "Deep Research" feature, posted the findings elsewhere, and got called out when their sources couldn't be found. They went back to Perplexity and asked for direct links. The response: Perplexity said it couldn't find the sources it had just cited. The dates on the fabricated papers: July 2025 3.

"Be extra careful if you use it for any kind of research, and verify all sources personally. Make sure they exist and they say what the AI claims they say. Not merely the accuracy of the claim, but the existence of the source at all."

A different user, paying for the Education Pro plan, documented Perplexity being "consistently wrong": not occasionally, consistently. Legal questions. School comparison questions. When the user pushed back on an incorrect answer, Perplexity called their correction "catching a subtle but important distinction that many people miss" while simultaneously admitting it had "overstated" its original position 4. The product's defense mechanism against being wrong is to compliment you for being right.

The Dow Jones/New York Post lawsuit got specific about what "hallucination plus wrong attribution" actually looks like in practice. In one documented case, Perplexity copied two paragraphs from a New York Post article verbatim, then invented five additional paragraphs on free speech and internet regulation — and attributed the entire thing to the New York Post 5. Not a hallucinated fact in isolation. A fabricated article, dressed in a real publication's name, served to users who reasonably trusted the citation.

Medical professionals have reported fabricated drug dosages. Legal researchers have found non-existent case citations. The product's confidence UI — numbered sources, clean answers — is doing real damage because it trains users to stop verifying. The answer looks sourced. It often isn't.

The actual business model: steal now, pay lawyers later

While users were debating citation quality, Perplexity was having a different problem: it was getting caught stealing the content it cites.

In August 2025, Cloudflare published a detailed technical investigation that is worth reading if you want to understand just how brazen this was 6.

Here's what they found: Perplexity has two crawlers. One is declared: it identifies itself as Perplexity-User/1.0 and makes 20-25 million requests per day. When websites block that crawler through robots.txt (the standard, voluntary convention that says "please don't scrape us"), something else starts showing up: a different crawler, undeclared, unregistered, making no mention of Perplexity, disguised as a regular Chrome browser on a Mac.
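To make the robots.txt mechanism concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URL are hypothetical, written for illustration; only the two user-agent strings come from the article. The thing to notice is that the rules are matched against whatever name the crawler chooses to give.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site: block the declared
# Perplexity crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "https://example.com/article"  # placeholder URL

# The declared crawler, asking under its own name, is told to stay out.
declared_allowed = parser.can_fetch("Perplexity-User/1.0", URL)

# The same question asked under a generic browser user agent falls through
# to the wildcard rule and is allowed.
browser_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)
browser_allowed = parser.can_fetch(browser_ua, URL)

print(declared_allowed)  # False
print(browser_allowed)   # True
```

Nothing in this check is enforced by the website. robots.txt only works when the crawler identifies itself honestly and chooses to obey the answer.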

The fake Chrome UA string: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36. Judged by its user agent alone, it is indistinguishable from a human browsing. 3-6 million requests per day. When even that gets blocked, it rotates IPs across different ASNs (the separately operated networks that IP ranges belong to), so there is no fixed set of addresses to blocklist in advance. Cloudflare set up a test domain with robots.txt explicitly blocking all crawlers, announced nowhere, never indexed. Perplexity scraped it anyway, then told users who asked about the site that it had "no publicly accessible robots.txt file."
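A sketch of why the disguise matters on the server side: a filter that blocks crawlers by matching on the user-agent header catches the declared bot and lets the spoofed one through. The blocklist and the `is_blocked` helper below are hypothetical, written only to illustrate the point. They do not describe Cloudflare's detection or any real firewall's API.

```python
# User-agent strings as quoted in the article.
DECLARED_UA = "Perplexity-User/1.0"
STEALTH_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

# Hypothetical blocklist a site operator might configure.
BLOCKED_TOKENS = ("perplexity",)


def is_blocked(user_agent: str) -> bool:
    """Naive filter: block a request if its user agent contains a listed token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)


print(is_blocked(DECLARED_UA))  # True: the honest crawler is stopped
print(is_blocked(STEALTH_UA))   # False: the disguised one looks like Chrome
```

A filter like this has nothing to key on once the crawler's name is gone, so telling the two apart takes signals beyond the header the client writes for itself.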

Cloudflare's summary: Perplexity was "repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites."

The lawsuit count since: The New York Times filed in December 2025 7. Encyclopaedia Britannica and Merriam-Webster filed in September 2025, with evidence that included Perplexity's output nearly matching the dictionary's own definition of "plagiarize" word for word 8. Reddit filed in October 2025, accusing Perplexity of scraping billions of user comments 9. News Corp's Dow Jones and the New York Post had already filed in October 2024. Forbes and Wired had been reporting the scraping practices since mid-2024.

Independent analysis by Copyleaks found that one Perplexity summary paraphrased 48% of a Forbes article 10. Another contained 28% paraphrased content. This is not accidental model behavior — it's the product working as designed, summarizing content it vacuumed up without permission.

The business model is a simple arbitrage: ingest everyone's content for free, sell subscriptions to access curated summaries of that content, pocket the margin. Publishers get nothing. Perplexity gets a $20 billion valuation. The writers and journalists and researchers whose work powers every answer get exactly what the process diagram in the Downfall analysis shows: a trash can icon labeled "publishers get nothing." 11

What you're actually paying for

Let's talk about what Perplexity actually is under the hood.

The "AI" doing the answering is not a proprietary Perplexity model. It's a thin interface layer sitting on top of GPT-4 and other publicly available LLMs, with retrieval from Bing and Google's search APIs — search results you could already access, repackaged through someone else's model. Perplexity built a better UI for a thing that already existed, then claimed to be killing Google.

That's not inherently wrong. Good UX is valuable. But it means the core differentiator — "accurate, trusted, real-time answers" — depends entirely on two things Perplexity doesn't control: the quality of the underlying model, and the integrity of the sources it cites. On both dimensions, Perplexity has severe, documented problems.

The Pro tier costs $20/month. For $20/month you can also have Claude 3.7 Sonnet with extended thinking, which scores in the top tier on every major benchmark and won't make up citations. Or you can keep your existing Google search and use the free tier of Claude or ChatGPT. Perplexity's product advantage is largely cosmetic.

Meanwhile, the company has a revenue problem that the valuation papers over. At $232 million annualized revenue against a $20 billion valuation, that's an 86x revenue multiple. The path to $656 million in 2026 requires either converting a large fraction of 45 million users to paid subscriptions, expanding enterprise deals amid an active plagiarism scandal, or introducing advertising — which would immediately destroy the "no ads, no SEO garbage" brand positioning that attracted users in the first place 12.
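The numbers in that paragraph can be worked through directly. The sketch below uses only figures quoted in this article. The subscriber estimate rests on a loud simplifying assumption that is not in any source: that the entire 2026 target would come from Pro subscriptions at the $20 list price, paid for a full twelve months.

```python
# Figures as quoted in the article, in USD.
VALUATION = 20_000_000_000
ANNUALIZED_REVENUE = 232_000_000
TARGET_2026 = 656_000_000
MONTHLY_ACTIVE_USERS = 45_000_000
PRO_PRICE_PER_MONTH = 20

# Valuation divided by annualized revenue.
revenue_multiple = VALUATION / ANNUALIZED_REVENUE

# How much revenue has to grow to reach the 2026 target.
required_growth = TARGET_2026 / ANNUALIZED_REVENUE

# ASSUMPTION (illustrative only): all target revenue comes from Pro
# subscribers paying list price for twelve months.
subscribers_needed = TARGET_2026 / (PRO_PRICE_PER_MONTH * 12)
share_of_users = subscribers_needed / MONTHLY_ACTIVE_USERS

print(f"Revenue multiple: {revenue_multiple:.1f}x")          # 86.2x
print(f"Required revenue growth: {required_growth:.2f}x")    # 2.83x
print(f"Pro subscribers needed: {subscribers_needed:,.0f}")  # 2,733,333
print(f"Share of monthly users: {share_of_users:.1%}")       # 6.1%
```

Under that assumption, roughly one in sixteen monthly users would need to be paying full price all year. Enterprise and API revenue would lower that figure, which is why the sketch is an upper-bound illustration and not a forecast.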

Perplexity is aware of the tension. Their publisher revenue-sharing pool as of mid-2025 was $42.5 million total 13. That's the amount earmarked to compensate the entire publishing industry for the content that powers every answer Perplexity sells. Forty-two and a half million dollars for every journalist, every encyclopedia editor, every forum moderator whose work Perplexity monetizes. Britannica and Merriam-Webster looked at that number and hired lawyers instead.

Verdict

Perplexity is not a Google killer. It's a Google repackager that built a beautiful frontend, raised money at fantasy multiples, and funded its growth by treating everyone else's content as a free commodity.

The product genuinely works better than Google for some queries — quick factual questions, brief synthesis tasks, looking something up without wading through SEO slop. That's real value. It's also not worth $20 billion, and it's not what was sold.

What was sold: a trusted knowledge engine. What you're getting: an answer machine that hallucinates citations with such confidence that a lawyer, a doctor, and a researcher each walked away trusting output that was fabricated. An infrastructure that disguises itself as a Chrome browser to scrape sites that said no. A company that can't afford to pay creators fairly, can't survive adding ads, and is currently being sued by six major content organizations simultaneously.

The citations are the product. That's the whole pitch. And the citations are routinely made up.

That's not a bug Perplexity is working on fixing. That's an architecture problem baked into every large language model that has ever existed. Perplexity didn't solve it — they just made the hallucinated output look more trustworthy than it actually is.

The $20 billion bet is that nobody will notice the difference between looking authoritative and being accurate. For a while, it was working. The lawsuits suggest the bill is coming due.