We believe it is more important than ever to be fully transparent about the AI Image Detector’s accuracy, including rates of false positives and false negatives, as well as areas for improvement, to ensure responsible use and adoption. This comprehensive analysis aims to provide complete transparency around our AI Image Detector V1.1 model testing methodology.
Our model is designed to detect AI-manipulated portions of an image by producing an overlay of the detected areas. Testing verifies that the AI Image Detector achieves high detection accuracy in distinguishing between authentic human photos and AI-generated or AI-manipulated images, while maintaining an extremely low false positive rate.
Test date: February 1, 2026
Publish date: February 15, 2026
Model tested: V1.1
We have designed our evaluation process around a dual-team system to ensure top-level quality, standards, and reliability. Two independent departments evaluate the model: the Data Science team and the QA team. Each team works independently with its own evaluation data and tools and does not have access to the other team’s evaluation process. This separation ensures the evaluation results are unbiased, objective, and accurate. It is also essential to note that all testing data is kept strictly separate from the training data; we test our models only on new images that have never interacted with our AI Image Detector. To keep our testing relevant and challenging, we continuously update our evaluation datasets to include images generated by the latest GenAI models.
For every model release, the Copyleaks QA and Data Science teams independently gather and create a variety of testing datasets. Each dataset consists of a finite number of images with an expected label indicating its origin. The datasets are divided into two categories:
AI-generated images were created using a wide variety of generative AI models. The tests were executed against the Copyleaks API, and we aggregated the scores to calculate the model’s performance.
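As an illustration only (not the actual Copyleaks tooling or API schema), aggregating per-image verdicts into an accuracy score might look like the following sketch, where the label names and result pairing are assumptions:

```python
# Hypothetical sketch: aggregate per-image detector verdicts into accuracy.
# Each result pairs an image's expected label with the detector's verdict;
# the labels ("ai" / "human") are illustrative, not the real API schema.
def aggregate_accuracy(results):
    """results: list of (expected_label, predicted_label) tuples."""
    if not results:
        raise ValueError("no results to aggregate")
    correct = sum(1 for expected, predicted in results if expected == predicted)
    return correct / len(results)

sample = [("ai", "ai"), ("ai", "human"), ("human", "human"), ("ai", "ai")]
print(f"{aggregate_accuracy(sample):.0%}")  # 75%
```

In practice, each category (human vs. AI) would be aggregated separately, which is how the per-category accuracy figures in the tables below arise.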
The evaluation was conducted exclusively on images that meet these technical requirements: a minimum dimension of 512×512 pixels, a file size under 32 MB, and a resolution under 16 megapixels, as defined in the documentation.
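The stated technical requirements can be expressed as a simple pre-filter. This is a minimal sketch based only on the thresholds above (512×512 minimum dimension, under 32 MB, under 16 megapixels); the function name and interface are illustrative:

```python
# Hypothetical pre-filter mirroring the stated technical requirements.
MIN_SIDE = 512                      # minimum dimension: 512x512 pixels
MAX_BYTES = 32 * 1024 * 1024        # file size under 32 MB
MAX_PIXELS = 16_000_000             # resolution under 16 megapixels

def meets_requirements(width: int, height: int, size_bytes: int) -> bool:
    return (
        min(width, height) >= MIN_SIDE
        and size_bytes < MAX_BYTES
        and width * height < MAX_PIXELS
    )

print(meets_requirements(1024, 768, 2_000_000))   # True
print(meets_requirements(400, 1080, 2_000_000))   # False: narrower than 512 px
print(meets_requirements(5000, 4000, 2_000_000))  # False: 20 MP exceeds 16 MP
```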
The product makes a prediction in the form of an overlay of the AI-generated segments. The overall performance is then evaluated based on how accurately the model classifies the images, according to their ground-truth category.
To provide a clear and robust measure of accuracy, we use different pixel-level metrics depending on the type of image being tested:
The overall accuracy figures presented in the Results tables, such as TNR (Human Accuracy) and TPR (AI), are aggregated from these pixel-level success criteria. For example, the TNR is the percentage of all tested human images that successfully met the <5% false positive pixel threshold.
The Data Science team conducted the following independent test on a large, diverse dataset containing images of varying resolutions, capturing devices, image generators, and content types.
| Metric | Human Images (n=31,374) | AI Images (n=33,947) |
|---|---|---|
| Accuracy | 98.6% | 97.6% |
The QA team conducted an independent test using images created explicitly for evaluation after the model was trained. The test dataset comprises images of varying resolutions, captured by different devices, generated by various image generators, and featuring diverse content types.
| Metric | Human Images (n=10,000) | AI Images (n=10,000) |
|---|---|---|
| Accuracy | 99.3% | 98.0% |
During the evaluation process, we identify and analyze incorrect assessments so the Data Science team can correct their underlying causes. All errors are systematically logged and categorized by nature as part of a root-cause analysis process. This process aims to uncover the underlying causes of errors and identify repeated patterns, ensuring the ongoing improvement and adaptability of our model. These insights are used to refine future versions of the model.
While our model achieves state-of-the-art results, no detection system is perfect, and our model can make mistakes, such as misclassifying a specific pixel set.
The AI Image Detector is specifically trained to identify manipulations from the latest generative AI tools. The system does not currently detect other common image alterations, including: