Copyleaks Research Finds That Plagiarism Among Students Continues As AI Use Increases from January 2023 to January 2024

One Year Later: ChatGPT and Education

In March of 2023, four months after the industry-disrupting debut of ChatGPT, Copyleaks released a study to answer the question: How prevalent is AI-generated content in education? Compiling anonymized data from tens of thousands of college and high school students worldwide using Copyleaks from January and February 2023, we found that 11.21% of all assignments, from high school to college, contained some form of AI-generated content with a 95.30% increase in usage from January to February.  

These findings, along with several high-profile plagiarism cases within higher education at the end of 2023 and the start of 2024, led us to ask: what is the current trend of plagiarism rates within education more than a year after the release of generative AI? Has AI impacted or altogether removed the need for plagiarism among students? 

Over a year after the release of ChatGPT, we decided it was time to determine the overall effect that generative AI has had on education, specifically the rates of AI use in student assignments and how AI has affected rates of plagiarism.

Here is what we found.

To Conduct This Analysis

To study the trends of plagiarism and AI rates among students from January 2023 to January 2024, we compiled 13 months' worth of anonymized data from tens of thousands of college and high school students from the same educational institutions as our prior study, all of which have been using the AI Content Detector and Plagiarism Detector since January 2023.


Average Rate of AI and Plagiarism Over 13 Months

While generative AI may have had some impact on plagiarism rates among students, it certainly did not eliminate plagiarism. Over the 13 months, plagiarism rates decreased as AI use gradually increased, but plagiarism remained prevalent.


January 2023 vs January 2024: Rate of AI and Rate of Plagiarism


Average Rate of AI and Plagiarism Quarterly Breakdown

The first quarter of 2023 saw a spike in both AI and plagiarism among students in February. However, by March, plagiarism declined while AI use continued to climb, despite most educational institutions banning AI use. During this time, the next iteration of ChatGPT, GPT-4, was released on March 14, 2023, and Google’s Bard (later rebranded as Gemini) was released on March 21, 2023.

The second quarter reflects educational seasonality, with a spike in AI and plagiarism in April and May, coinciding with finals, but then a gradual decrease through June.

The third quarter of the year did not see a steep decline in AI use among year-round students and those attending summer sessions. In July 2023, the percentage of scanned papers and assignments that contained AI jumped to 23.03% from 11.18% in June, and it remained around 3% higher than in Quarter 2. The percentage of assignments and papers that contained plagiarism decreased in August, but only by 4% compared with Quarter 1. At the end of July, a new AI model, Claude, was released onto the market. GrammarlyGO from Grammarly, powered by Azure OpenAI, the same LLM that powers ChatGPT, was released on August 25, 2023, with a marketing rollout targeted primarily at students.

The fourth quarter saw the most significant average increase of AI within student assignments, reaching a new high of 25.39% in December, compared to 11.92% at the start of the year. Plagiarism averaged around 27% at the beginning of the quarter but took a significant dip in December to 10.54%, potentially attributable to a heightened focus on plagiarism within education following a high-profile plagiarism case at a leading US university at the start of December. In January 2024, the rate of AI among students had risen over 9% compared to January 2023, and the rate of plagiarism had fallen by almost 18% compared to the previous year.
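Changes like these can be read either as percentage-point differences or as relative changes, and the two readings give very different numbers. The short Python sketch below works through both for the AI rates quoted above (11.92% at the start of the year and 25.39% in December). The function names are illustrative only, and the sketch is not part of the Copyleaks methodology.

```python
def point_change(start_pct: float, end_pct: float) -> float:
    """Difference between two rates, in percentage points."""
    return end_pct - start_pct


def relative_change(start_pct: float, end_pct: float) -> float:
    """Change relative to the starting rate, expressed as a percentage."""
    return (end_pct - start_pct) / start_pct * 100


# AI rates quoted in the text: start of 2023 vs. December 2023.
start, end = 11.92, 25.39

print(f"Percentage-point change: {point_change(start, end):.2f}")
print(f"Relative change: {relative_change(start, end):.1f}%")
```

For these two figures, the rise is about 13.5 percentage points, which corresponds to a relative increase of roughly 113%.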


Average Rate of AI and Plagiarism By Country

January 2023 – January 2024

Across the 13 months of data, considerable geographic differences exist among the percentage of student papers and assignments containing AI content and plagiarism.

Percentage of Papers That Contained AI-Generated and Plagiarized Content


Average Rate of AI and Plagiarism By School Type

January 2023 – January 2024

Similarly, there were considerable variances across different institution types regarding the percentage of student papers and assignments containing AI content and plagiarism.


Average Rate of AI and Plagiarism By College Type

January 2023 – January 2024

Finally, there were also considerable variances across types of colleges in the percentage of student papers and assignments containing AI content and plagiarism.


Key Takeaways

The data continues to underscore the need for organizations to adopt multi-pronged solutions that detect both AI-generated content and potential plagiarism. Such solutions provide transparency around potential plagiarism, including plagiarism introduced through generative AI. As the data shows, AI and plagiarism began to coalesce somewhat in the latter half of the year, highlighting the importance of knowing whether content was human-written or AI-generated and where it originated. That’s why full-spectrum protection, including AI and plagiarism detection, helps uphold academic integrity while empowering authenticity and originality within all content.


Nearly 60% of GPT-3.5 Outputs Contained Some Form of Plagiarized Content

Copyleaks Research Finds Nearly 60% of GPT-3.5 Outputs Contained Some Form of Plagiarized Content

There’s an unprecedented amount of AI-generated content now saturating the internet. According to a 2023 report, by 2026, nearly 90% of all online content will be AI-generated. As a result of this saturation, data pollution and the prospect of model collapse raise concerns about the overall quality and reliability of AI-generated text.

Furthermore, broader concerns about originality have also emerged. In the wake of several lawsuits regarding AI infringing on copyright and potentially plagiarizing, educational institutions and enterprises across the globe are questioning the authenticity of AI text: Where did it originate? Is it safe to use as original content?

Ultimately, does AI plagiarize?

To find out, Copyleaks conducted an analysis to determine the degree to which AI-generated content is original and free of potential plagiarism.

Number of Papers Tested for Each Subject



To conduct this analysis:

We asked GPT-3.5 to write 1,045 outputs, averaging 412 words across all outputs, in 26 subjects.


59.7% of GPT-3.5 Outputs Contained Some Form of Plagiarized Content


Physics: 83.7%
Chemistry: 68.0%
Science: 67.3%
Psychology: 63.3%
Law: 57.5%
Economics: 57.1%
Biology: 55.1%
Business Studies: 51.4%
Engineering: 51.4%
Accounting: 50.0%
Geography: 49.0%
Mathematics: 49.0%
Computer Science: 47.5%
Sports: 42.1%
World History: 39.6%
Philosophy: 37.5%
English Language: 37.1%
Art: 35.0%
Physical Education: 35.0%
Statistics: 32.5%
Social Science: 28.6%
Nature: 25.0%
Music: 22.9%
Sociology: 22.9%
Humanities: 15.0%
Theater: 14.3%


Mathematics: 67.4%
Physics: 57.1%
Psychology: 53.1%
Science: 51.0%
Biology: 49.0%
Chemistry: 46.0%
Economics: 38.8%
Business Studies: 37.1%
Computer Science: 35.0%
Law: 30.0%
Statistics: 30.0%
Physical Education: 22.5%
Sports: 21.1%
Accounting: 20.0%
Art: 20.0%
Engineering: 20.0%
Philosophy: 17.5%
Geography: 16.3%
Nature: 15.0%
World History: 12.5%
Sociology: 11.4%
English Language: 8.6%
Social Science: 8.6%
Music: 5.7%
Theater: 5.7%
Humanities: 0.0%


Physics: 79.6%
Psychology: 79.6%
Chemistry: 66.0%
Science: 65.3%
Biology: 63.3%
Computer Science: 62.5%
Economics: 59.2%
Business Studies: 57.1%
Mathematics: 49.0%
Philosophy: 47.5%
Statistics: 47.5%
Sports: 47.4%
World History: 45.8%
Accounting: 42.5%
Law: 42.5%
Nature: 40.0%
Physical Education: 40.0%
Art: 35.0%
Engineering: 34.3%
Geography: 32.7%
Sociology: 31.4%
English Language: 28.6%
Music: 25.7%
Social Science: 20.0%
Humanities: 15.0%
Theater: 5.7%


*Identical Text: A one-for-one copying of someone else’s text that is passed off as your own

**Minor Changes: Content with minor alterations to the source material, such as altering a verb within a sentence (e.g., slow to slowly)

***Paraphrased Text: Putting someone else’s idea into your own words without crediting the original source
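To make the first two categories concrete, here is a minimal Python sketch that labels a candidate sentence against a single source sentence. It is a toy illustration under stated assumptions, not how Copyleaks detects plagiarism: the function names and the 0.8 cutoff are hypothetical, and token overlap cannot recognize paraphrasing, so anything below the cutoff is only marked as needing a semantic check.

```python
import re
from difflib import SequenceMatcher


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def classify(source: str, candidate: str, minor_threshold: float = 0.8) -> str:
    """Label a candidate sentence relative to a source sentence.

    minor_threshold is an assumed cutoff chosen for illustration.
    """
    src, cand = tokens(source), tokens(candidate)
    if src == cand:
        return "identical"
    # ratio() is 2 * matched tokens / total tokens; 1.0 means the same sequence.
    overlap = SequenceMatcher(None, src, cand).ratio()
    if overlap >= minor_threshold:
        return "minor changes"
    # Paraphrase needs semantic comparison, which token overlap cannot provide.
    return "needs semantic check"


print(classify("The car moves slow down the road.",
               "The car moves slowly down the road."))
```

The example call mirrors the "slow to slowly" case above: six of the seven tokens line up, so the pair clears the assumed cutoff and is labeled as minor changes.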


Copyleaks then conducted an in-depth analysis to gauge the specific outputs with the highest levels of identical text, minor changes, and paraphrasing across all 26 subjects.

Identical Text

Our analysis found that the individual GPT-3.5 output with the highest percentage of plagiarism was in Physics, where 27.0% of the text was identical. This was followed by an individual Chemistry output where 24.7% of the text was identical.

Outputs With the Highest Percentages of Identical Text for Each Subject


Minor Changes

The individual GPT-3.5 outputs with the highest percentages of minor changes were from Physics and Psychology, where 25.2% of each respective output contained minor changes.

Outputs With the Highest Percentages of Minor Changes for Each Subject


Paraphrased

The individual GPT-3.5 output with the highest percentage of paraphrasing was in Computer Science, where a surprising 80.7% of the text was paraphrased. This was followed by an individual Physics output where 76.3% of the text was paraphrased.

Outputs With the Highest Percentage of Paraphrasing for Each Subject


Similarity Score

The Similarity Score is a Copyleaks-specific scoring method aggregating the rate of identical text, minor changes, paraphrased text, and more. A score of 0% signifies that all of the content is original, whereas a score of 100% means that none of the content is original.

Subjects With the Highest and Lowest Average Similarity Scores

The subject with the highest average Similarity Score is Physics at 31.3%, followed closely by Psychology at 27.7% and Science at 26.7%. The subjects with the lowest average Similarity Score are Theater at 0.9%, Humanities at 2.8%, and English Language at 5.4%.


Outputs With the Highest Similarity Score for Each Subject

Across all subjects, our analysis found that the individual GPT-3.5 output with the highest Similarity Score was in Computer Science, with an astounding 100%, followed by Physics with 92% and Psychology with 88%.


Key Takeaways

With AI-generated content continuing to expand and saturate the internet, having detection solutions in place is critical. As the Copyleaks data shows, nearly 60% of the GPT-3.5 outputs analyzed contained some form of plagiarism.

The insights provided by the analysis can help educational institutions and organizations emphasize certain subjects when checking for plagiarism, allowing them to tailor their approach as needed to ensure all potential risks and concerns are addressed. For example, Physics, Chemistry, Mathematics, and Psychology might require a more in-depth look to identify plagiarized text, while other subjects, including Theater and Humanities, may require less scrutiny.
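One way an institution might act on this is to sort subjects into review tiers by their average Similarity Score. The Python sketch below uses only the six average scores quoted earlier in this study. The 20% cutoff, the tier names, and the function name are hypothetical choices made for illustration, not a Copyleaks recommendation.

```python
# Average Similarity Scores quoted in the study (percent).
average_similarity = {
    "Physics": 31.3,
    "Psychology": 27.7,
    "Science": 26.7,
    "English Language": 5.4,
    "Humanities": 2.8,
    "Theater": 0.9,
}

# Hypothetical cutoff; an institution would choose its own.
REVIEW_THRESHOLD = 20.0


def review_tier(score: float, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return a review tier for a subject's average Similarity Score."""
    return "in-depth review" if score >= threshold else "standard review"


# List subjects from highest to lowest average score with their tier.
ranked = sorted(average_similarity.items(), key=lambda item: item[1], reverse=True)
for subject, score in ranked:
    print(f"{subject}: {score}% -> {review_tier(score)}")
```

With this assumed cutoff, Physics, Psychology, and Science land in the in-depth tier, while English Language, Humanities, and Theater fall into the standard tier.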

Furthermore, the data underscores the need for organizations to adopt solutions that detect the presence of AI-generated content and provide the necessary transparency surrounding potential plagiarism within the AI content. Full-spectrum protection that includes AI and plagiarism detection ensures compliance with copyright and licensing and empowers authenticity and originality within all content.