How GenAI Platforms Bypass Paywalled Content

Perplexity Sued by Dow Jones & Company, Inc. and NYP Holdings, Inc. for Accessing Paywalled Content

Recently, Perplexity has been embroiled in controversy for using publisher content in its search results, a practice covered by Forbes and Wired. Both publications have accused Perplexity of “lifting” their content, with Wired specifically analyzing how Perplexity’s handling of robots.txt files enables access to paywalled materials. On October 17, 2024, TechCrunch reported that The New York Times sent a cease-and-desist letter to OpenAI and Perplexity.

According to The National Law Review, the complaint in the Dow Jones and NYP Holdings case was amended in December 2024. The amendment asserts “copyright infringement, false designation of origin, and trademark dilution claims against Perplexity.” The plaintiffs are also demanding a jury trial.

In its response to the lawsuit, Perplexity said, “The lawsuit reflects an adversarial posture between media and tech that is… fundamentally shortsighted, unnecessary, and self-defeating.” The company went on to say it is willing to work with the Post and the Journal in good faith.

What Is Paywalled Content?

Paywalled content is web content that requires payment to access. It is most common on large news sites, which require a subscription and a login before users can read entire articles. In some cases, subscribers can “gift” paywalled articles to friends.

How Does Perplexity Bypass Paywalls?

It is not known exactly how Perplexity bypasses paywalls. However, TechCrunch notes that websites use robots.txt files to protect their content, and that “Perplexity appears to be willingly circumventing these blocks by changing its bots’ ‘user agent…’”
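
To make the robots.txt mechanism concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules, the bot name ExampleAIBot, and the news.example.com URLs are all hypothetical, and nothing here describes how Perplexity or any publisher actually operates. It only illustrates that robots.txt rules are keyed to the user agent a crawler declares, so the same URL can be disallowed for one declared agent and allowed for another.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named AI crawler everywhere,
# and keep every other crawler out of the /premium/ section only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /premium/
"""

# Hypothetical URLs used purely for illustration.
ARTICLE = "https://news.example.com/articles/story"
PREMIUM = "https://news.example.com/premium/story"


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "Mozilla/5.0"):
        for url in (ARTICLE, PREMIUM):
            print(agent, url, is_allowed(agent, url))
```

In this sketch the named bot is refused for the article URL, while a generic browser-style agent is allowed to fetch it. Because robots.txt is a voluntary convention rather than an access control, it only works when crawlers identify themselves consistently and choose to honor the rules.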

Copyleaks AI Paywall Bypass Case Study

At Copyleaks, we undertook a brief analysis to explore how generative AI platforms handle paywalled articles, submitting the following links to Perplexity for summary:

Perplexity AI Claims it Didn’t Bypass Paywalled Content to Summarize Articles

Perplexity provided detailed summaries for each paywalled article. 

Here are the responses from Perplexity:

Perplexity Screenshot
Perplexity Screenshot
Perplexity Screenshot

When asked how it accessed the paywalled content, Perplexity actually apologized, claiming it did not read the articles. See this example:

Perplexity Screenshot

And this longer response:

Perplexity Screenshot

However, the following example shows that the specific references in its summaries potentially contradict these responses:

Perplexity’s Summaries Are Remarkably Similar to Paywalled Content

The Forbes Benchmark paywalled article, for example, reads: “All of the firm’s partners are expected to look at AI companies within their typical areas of concentration, such as consumer tech, cloud computing or crypto, a source with knowledge of the firm’s thinking told Forbes.”

Meanwhile, Perplexity’s summary reads: “All partners will explore investment opportunities in AI companies within their respective areas of expertise, such as consumer tech, cloud computing, and crypto.” 

This suggests that Perplexity may, indeed, bypass paywalls using undisclosed or unclear techniques.
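
As a rough illustration of how this kind of overlap can be quantified, the sketch below compares the two sentences quoted above using simple word overlap. It is not the Copyleaks detection method, and the function names are illustrative. It assumes that the share of distinct words two texts have in common, plus any three-word phrases they share verbatim, is a reasonable first signal of paraphrasing.

```python
import re

FORBES_SENTENCE = (
    "All of the firm’s partners are expected to look at AI companies "
    "within their typical areas of concentration, such as consumer tech, "
    "cloud computing or crypto, a source with knowledge of the firm’s "
    "thinking told Forbes."
)
PERPLEXITY_SENTENCE = (
    "All partners will explore investment opportunities in AI companies "
    "within their respective areas of expertise, such as consumer tech, "
    "cloud computing, and crypto."
)


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def jaccard_similarity(a, b):
    """Fraction of distinct words shared by both texts (0.0 to 1.0)."""
    words_a, words_b = set(tokenize(a)), set(tokenize(b))
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def shared_phrases(a, b, n=3):
    """Return the n-word phrases that appear verbatim in both texts."""
    def ngrams(tokens):
        return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return ngrams(tokenize(a)) & ngrams(tokenize(b))


if __name__ == "__main__":
    score = jaccard_similarity(FORBES_SENTENCE, PERPLEXITY_SENTENCE)
    print(f"Word overlap: {score:.2f}")
    print(sorted(shared_phrases(FORBES_SENTENCE, PERPLEXITY_SENTENCE)))
```

A high word overlap alongside few shared verbatim phrases is consistent with paraphrasing rather than copying. A measure this simple cannot establish how the text was obtained.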

Using Copyleaks Plagiarism Detector to Compare Summaries

We took our investigation a step further and used a plagiarism detector to determine whether any content from the original articles was plagiarized or paraphrased in Perplexity’s summaries. The results from the Forbes example were quite striking.

One Perplexity summary paraphrased 48% of the article, while the other summary plagiarized 7% of the article and paraphrased 28% of it. It’s important to note, however, that The Information summary contained no plagiarized or paraphrased content.

Copyleaks Screenshot
Copyleaks Screenshot
Copyleaks Screenshot

Gemini Refuses to Summarize Paywalled Content

The other generative AI platforms we tested refused outright to bypass paywalls in order to provide a summary.

ChatGPT screenshot

The inconsistency in responses from Perplexity highlights the need to explore the appropriate use of AI-generated copy and content and how AI platforms interact with protected content.

How to Safeguard Your IP with Copyleaks

As AI continues to grow exponentially, it’s essential to consider how to protect your content. With Copyleaks, you can safeguard your intellectual and proprietary content from LLMs. Our AI and plagiarism detectors help you analyze AI content, enforce your copyright, and collect evidence of unauthorized usage.

Build trust, protect your brand, and stay ahead in the age of AI.

Request a custom Copyleaks demo and see how the world’s top enterprises ensure trust and transparency.
