Introduction
Recently, Perplexity has been embroiled in controversy for using publisher content in its search results, a practice covered by Forbes and Wired. These publications have accused Perplexity of “lifting” their content, with Wired specifically analyzing how Perplexity’s handling of robots.txt files enables access to paywalled materials.
We undertook a brief analysis to explore how generative AI platforms handle paywalled articles, submitting the following paywalled links to Perplexity for summary:
Perplexity Summaries
Perplexity provided summaries for each article.
When asked how it accessed the paywalled content, Perplexity apologized and claimed that it had not read the articles. See this example:
And this longer example:
However, these responses are potentially contradicted by the specific references in its summaries.
The Forbes Benchmark article, for example, reads: “All of the firm’s partners are expected to look at AI companies within their typical areas of concentration such as consumer tech, cloud computing or crypto, a source with knowledge of the firm’s thinking told Forbes.” Meanwhile, Perplexity’s summary reads: “All partners will explore investment opportunities in AI companies within their respective areas of expertise, such as consumer tech, cloud computing, and crypto.” This suggests that Perplexity may, indeed, bypass paywalls using undisclosed or unclear techniques.
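To make the comparison above a little more concrete, here is a minimal sketch that measures how many distinct words the two quoted sentences share. It is only an illustration: the helper names (`word_set`, `jaccard`) and the unrelated control sentence are our own assumptions, and this simple word-overlap score is not the method Copyleaks or any plagiarism detector uses.

```python
import re

def word_set(text):
    """Lowercase the text and return the set of alphabetic words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Share of distinct words that two texts have in common (0.0 to 1.0)."""
    words_a, words_b = word_set(a), word_set(b)
    return len(words_a & words_b) / len(words_a | words_b)

# The two sentences quoted above.
forbes = (
    "All of the firm’s partners are expected to look at AI companies within "
    "their typical areas of concentration such as consumer tech, cloud "
    "computing or crypto, a source with knowledge of the firm’s thinking "
    "told Forbes."
)
perplexity = (
    "All partners will explore investment opportunities in AI companies "
    "within their respective areas of expertise, such as consumer tech, "
    "cloud computing, and crypto."
)
# Hypothetical control sentence, added only for contrast.
unrelated = "The weather in Lisbon was mild this weekend."

score = jaccard(forbes, perplexity)
baseline = jaccard(forbes, unrelated)
shared = sorted(word_set(forbes) & word_set(perplexity))

print(f"Forbes vs. Perplexity summary: {score:.2f}")
print(f"Forbes vs. unrelated sentence: {baseline:.2f}")
print("Shared words:", ", ".join(shared))
```

A high overlap on distinctive terms (“consumer tech,” “cloud computing,” “crypto”) is consistent with the summary drawing on the article text, but a score like this cannot by itself show how the text was obtained.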
Copyleaks Scan Results
We then used Copyleaks to determine whether Perplexity’s summaries plagiarized or paraphrased any content from the original articles. The results for the two Forbes examples were striking: one Perplexity summary paraphrased 48% of the article, while the other plagiarized 7% of the article and paraphrased 28% of it. It is important to note, however, that the summary of the Information article contained no plagiarized or paraphrased content.
of AI-generated copy and content, and how AI platforms interact with protected content.