Access Denied? Not Quite. Exploring AI’s Interaction with Paywalled Content


Introduction

Recently, Perplexity has been embroiled in controversy for using publisher content in its search results, a practice covered by Forbes and Wired. These publications have accused Perplexity of “lifting” their content, with Wired specifically analyzing how Perplexity’s handling of robots.txt files enables access to paywalled materials.


We ran a brief analysis of how generative AI platforms handle paywalled articles. We submitted the following paywalled links to Perplexity and asked it to summarize each one:

Perplexity Screenshot

Perplexity Summaries

Perplexity provided summaries for each article.

Perplexity Screenshot
Perplexity Screenshot
Perplexity Screenshot

When asked how it had accessed the paywalled content, Perplexity apologized and claimed it had not read the articles, as in this example:

Perplexity Screenshot

And this longer example:

Perplexity Screenshot

However, the specific details in its summaries appear to contradict these responses.

For example, the Forbes article about Benchmark reads: “All of the firm’s partners are expected to look at AI companies within their typical areas of concentration such as consumer tech, cloud computing or crypto, a source with knowledge of the firm’s thinking told Forbes.” Perplexity’s summary reads: “All partners will explore investment opportunities in AI companies within their respective areas of expertise, such as consumer tech, cloud computing, and crypto.” This close match in detail suggests that Perplexity may indeed bypass paywalls using undisclosed or unclear techniques.

Copyleaks Scan Results

Next, we used Copyleaks to check whether Perplexity’s summaries plagiarized or paraphrased content from the original articles. The results for the two Forbes examples were striking: one summary paraphrased 48% of the article, while the other plagiarized 7% of the article and paraphrased 28% of it. Note, however, that the summary of The Information article contained no plagiarized or paraphrased content.

Copyleaks Screenshot
Copyleaks Screenshot
Copyleaks Screenshot

The other generative AI platforms we tested refused more directly to summarize the paywalled content.

ChatGPT Screenshot

Perplexity’s inconsistent responses highlight the need to examine what counts as appropriate use of AI-generated copy and content, and how AI platforms interact with protected content.
