Copyleaks Research Finds Nearly 60% of GPT-3.5 Outputs Contained Some Form of Plagiarized Content

New York, NY – February 22, 2024 – According to a 2023 report, by 2026 roughly 90% of all online content will be AI-generated. As a result of this AI content saturation, worries about data pollution and inevitable model collapse raise concerns about AI-generated text’s overall quality and reliability.


In addition, widespread concerns about originality have emerged. With multiple lawsuits regarding AI infringing on copyright and potentially plagiarizing, educational institutions and enterprises across the globe are questioning the authenticity of AI text: Where did it originate from? Is it safe to use as original content?


Ultimately, does AI plagiarize?


To find out, Copyleaks, the leader in plagiarism identification, AI-content detection, and GenAI governance, conducted an analysis to determine the degree to which AI-generated content is original and free of potential plagiarism.


To conduct this analysis:


Copyleaks asked GPT-3.5 to write 1,045 outputs, averaging 412 words per output, across 26 subjects: Physics, Chemistry, Science, Psychology, Law, Economics, Biology, Business Studies, Engineering, Accounting, Geography, Mathematics, Computer Science, Sports, World History, Philosophy, English Language, Art, Physical Education, Statistics, Social Science, Nature, Music, Sociology, Humanities, Theater.


Copyleaks gauged the specific outputs with the highest levels of identical text (a one-for-one copying of someone else’s text that is passed off as your own), minor changes (content with minor alterations to the source material, such as altering a verb within a sentence, e.g., slow to slowly), and paraphrasing (putting someone else’s idea into your own words without crediting the original source) across all 26 subjects.


Key findings from the analysis include:


  • 59.7% of GPT-3.5 outputs contained some form of plagiarized content. 45.7% of all outputs contained identical text, 27.4% contained minor changes, and 46.5% had paraphrased text. This also highlights that GPT-3.5 isn’t manufacturing “brand new” text; most of the content hails from a previous source, raising issues around plagiarism, copyright, and intellectual property.


  • The individual GPT-3.5 output with the highest percentage of plagiarism was in Physics, where 27.0% of the text was identical. This was followed by an individual Chemistry output where 24.7% of the text was identical.


  • The analysis also examined Similarity Scores. The Similarity Score is a Copyleaks-specific scoring method aggregating the rate of identical text, minor changes, paraphrased text, and more. A score of 0% signifies that all of the content is original, whereas a score of 100% means that none of the content is original.
  • The subject with the highest average Similarity Score was Physics at 31.3%, followed closely by Psychology at 27.7% and Science at 26.7%. The subjects with the lowest average Similarity Score were Theater at 0.9%, Humanities at 2.8%, and English Language at 5.4%.
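
To put the headline percentages in context, the short Python sketch below converts the rates quoted above into approximate output counts and checks whether the three categories overlap. The counts are estimates derived here by rounding against the 1,045 outputs; they are not figures published by Copyleaks, and the names `reported_rates` and `approx_count` are illustrative only.

```python
# Figures quoted in the release. The derived counts are estimates,
# not numbers published by Copyleaks.
TOTAL_OUTPUTS = 1045

reported_rates = {
    "any_plagiarism": 59.7,    # outputs with some form of plagiarized content (%)
    "identical_text": 45.7,    # outputs containing identical text (%)
    "minor_changes": 27.4,     # outputs containing minor changes (%)
    "paraphrased_text": 46.5,  # outputs containing paraphrased text (%)
}


def approx_count(rate_percent, total=TOTAL_OUTPUTS):
    """Convert a reported percentage into an approximate number of outputs."""
    return round(total * rate_percent / 100)


approx_counts = {name: approx_count(rate) for name, rate in reported_rates.items()}

# If the three categories were mutually exclusive, their rates would add up
# to the "any plagiarism" rate. A larger sum means some outputs fall into
# more than one category.
category_sum = sum(
    rate for name, rate in reported_rates.items() if name != "any_plagiarism"
)
categories_overlap = category_sum > reported_rates["any_plagiarism"]

for name, count in approx_counts.items():
    print(f"{name}: ~{count} of {TOTAL_OUTPUTS} outputs")
print(f"sum of category rates: {category_sum:.1f}% (overlap: {categories_overlap})")
```

Because the three category rates sum to well over 59.7%, a single output can evidently be counted under more than one category, which is why the per-category figures should not be added together.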


“The insights provided by the analysis can help educational institutions and organizations put emphasis on certain subjects when checking for plagiarism, allowing them to tailor their approach as needed to ensure all potential risks and concerns are addressed,” said Alon Yamin, CEO and Co-founder of Copyleaks. “For example, Physics, Chemistry, Mathematics, and Psychology might require a more in-depth look to identify plagiarized text, while other subjects, including Theater and Humanities, may require less scrutiny.”


Yamin added: “Furthermore, the data underscores the need for organizations to adopt solutions that detect the presence of AI-generated content and provide the necessary transparency surrounding potential plagiarism within the AI content. Full-spectrum protection that includes AI and plagiarism detection ensures compliance with copyright and licensing and empowers authenticity and originality within all content.”


About Copyleaks

Dedicated to creating secure environments to share ideas and learn with confidence, Copyleaks is an AI-based text analysis company used by businesses, educational institutions, and millions of individuals around the world to identify potential plagiarism across more than 100 languages, uncover AI-generated content, ensure responsible generative AI adoption, verify authenticity and ownership, and empower error-free writing.

For additional information, visit our website or follow us on LinkedIn.