Key Takeaways
- Controversy Around Perplexity’s Use of Paywalled Content: Perplexity AI has faced criticism for potentially accessing paywalled content through its handling of robots.txt files, as reported by Wired.
- Inconsistent Claims About Paywall Access: While Perplexity denies accessing paywalled content, its detailed summaries raise questions about the methods used to generate such accurate responses.
- Evidence of Paraphrasing and Plagiarism: A Copyleaks analysis revealed instances of paraphrasing and plagiarism in Perplexity’s summaries of Forbes articles, but not in summaries of other sources, such as The Information.
- Comparison with Other Generative AI Platforms: Unlike Perplexity, other platforms like ChatGPT explicitly refuse to summarize paywalled content, demonstrating greater adherence to content protection.
- Call for Ethical AI Practices: This case underscores the need for stricter guidelines governing the interactions of AI platforms with copyrighted content to safeguard intellectual property rights.
Perplexity Sued by Dow Jones & Company, Inc. and NYP Holdings, Inc. for Accessing Paywalled Content
Recently, Perplexity has been embroiled in controversy for using publisher content in its search results, a practice covered by Forbes and Wired. These publications have accused Perplexity of “lifting” their content, with Wired specifically analyzing how Perplexity’s handling of robots.txt files enables access to paywalled materials. On October 17, 2024, TechCrunch reported that The New York Times sent a cease-and-desist letter to OpenAI and Perplexity.
According to The National Law Review, the lawsuit brought by Dow Jones & Company, Inc. and NYP Holdings, Inc. was amended in December 2024. The amendment asserts “copyright infringement, false designation of origin, and trademark dilution claims against Perplexity.” Additionally, the plaintiffs are demanding a jury trial.
In its response to the lawsuit, Perplexity said, “The lawsuit reflects an adversarial posture between media and tech that is… fundamentally shortsighted, unnecessary, and self-defeating.” The company went on to say it is willing to work with the Post and the Journal in good faith.
What Is Paywalled Content?
Paywalled content is web content that requires payment to access. It is most common on large news sites, which require a subscription and a login before users can read entire articles. In some cases, subscription holders can “gift” paywalled content to friends.
How Does Perplexity Bypass Paywalls?
It is not known how Perplexity bypasses paywalls. However, TechCrunch notes that websites use robots.txt files to protect their content, and that “Perplexity appears to be willingly circumventing these blocks by changing its bots’ ‘user agent…’”
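To make the mechanism concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content, the ExampleAIBot name, and the publisher.example URL are all hypothetical. The sketch only illustrates that robots.txt rules are matched against whatever user agent a crawler declares; it says nothing about how Perplexity's crawlers actually behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher (not taken from any real site).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /subscribers/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse in memory, no network request
    return parser.can_fetch(user_agent, url)


article = "https://publisher.example/news/story.html"

# The same URL is evaluated under different rules depending on the declared agent.
print(is_allowed("ExampleAIBot", article))  # blocked by the "Disallow: /" rule
print(is_allowed("Mozilla/5.0", article))   # falls through to the "*" rules
```

Because robots.txt is advisory and keyed to a self-declared name, a crawler that presents a different user agent string is checked against a different set of rules. Compliance depends on the crawler identifying itself honestly.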
Copyleaks AI Paywall Bypass Case Study
At Copyleaks, we undertook a brief analysis to explore how generative AI platforms handle paywalled articles, submitting the following links to Perplexity for summary:
Perplexity AI Claims it Didn’t Bypass Paywalled Content to Summarize Articles
Perplexity provided detailed summaries for each paywalled article.
Here are the responses from Perplexity:
When asked how it accessed the paywalled content, Perplexity actually apologized, claiming it did not read the articles. See this example:
And this longer response:
Perplexity may have said that it didn’t read the paywalled content, but the specific references in its summaries potentially contradict that claim, as the following example shows.
Perplexity’s Summaries Are Remarkably Similar to Paywalled Content
The Forbes Benchmark paywalled article, for example, reads: “All of the firm’s partners are expected to look at AI companies within their typical areas of concentration, such as consumer tech, cloud computing or crypto, a source with knowledge of the firm’s thinking told Forbes.”
Meanwhile, Perplexity’s summary reads: “All partners will explore investment opportunities in AI companies within their respective areas of expertise, such as consumer tech, cloud computing, and crypto.”
This suggests that Perplexity may, indeed, bypass paywalls using undisclosed or unclear techniques.
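As a rough illustration of how close the two sentences are, the following Python sketch computes a simple word-overlap (Jaccard) score between them and compares it with the score for an unrelated sentence. This is not the method Copyleaks uses. The function names and the unrelated sentence are made up for the example, and word overlap is a far cruder signal than real plagiarism or paraphrase detection.

```python
import re


def word_set(text: str) -> set:
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    words_a, words_b = word_set(a), word_set(b)
    return len(words_a & words_b) / len(words_a | words_b)


forbes = (
    "All of the firm's partners are expected to look at AI companies within "
    "their typical areas of concentration, such as consumer tech, cloud "
    "computing or crypto, a source with knowledge of the firm's thinking "
    "told Forbes."
)
perplexity = (
    "All partners will explore investment opportunities in AI companies "
    "within their respective areas of expertise, such as consumer tech, "
    "cloud computing, and crypto."
)
# Hypothetical control sentence with no connection to the article.
unrelated = "The weather in Lisbon was mild and sunny for most of the week."

print(round(jaccard(forbes, perplexity), 2))
print(round(jaccard(forbes, unrelated), 2))
```

The summary sentence scores far higher against the Forbes sentence than the unrelated control does. That fits the pattern described above, but it does not by itself show how the summary was produced.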
Using Copyleaks Plagiarism Detector to Compare Summaries
We took our investigation a step further and used the Copyleaks Plagiarism Detector to determine whether Perplexity’s summaries plagiarized or paraphrased any content from the original articles. The results from the Forbes example were quite striking.
One Perplexity summary paraphrased 48% of the article, while the other summary plagiarized 7% of the article and paraphrased 28% of it. It’s important to note, however, that the summary of The Information article contained no plagiarized or paraphrased content.
Gemini Refuses to Summarize Paywalled Content
The inconsistency in responses from Perplexity highlights the need to explore the appropriate use of AI-generated copy and content and how AI platforms interact with protected content.
How to Safeguard Your IP with Copyleaks
As AI continues to grow exponentially, it’s essential to consider how to protect your content. With Copyleaks, you can safeguard your intellectual property and proprietary content from LLMs. Our AI and plagiarism detectors help you analyze AI content, enforce copyright, and collect evidence of unauthorized usage.