AI Text Detection as a Signal, Not a Verdict: Accuracy Limitations and Ethical Implications

Author: Fathin Difa Robbani
Editor: Ayom Mratita Purbandani
Introduction
The use of generative artificial intelligence (genAI) in academic and professional writing has grown rapidly. Large language models are now employed not only for language correction or paraphrasing, but also for argument construction, literature summarization, and drafting early versions of scholarly work. Several studies have even shown productivity gains and, under certain conditions, improvements in writing quality across higher education and knowledge-based professions.[1][2] As a result, many academic institutions have raised concerns about originality and academic integrity, and have increasingly adopted AI text detection tools to distinguish texts produced by humans from those generated by AI, in some cases using these tools as a basis for decisions that directly affect academic evaluation and sanctions.
Problems arise when these tools are treated as objective and reliable, despite empirical evidence showing that their accuracy remains limited, often in the range of 50–70%, that it declines with newer generative models, and that the tools exhibit bias against non-native writers and against texts that have undergone human editing.[3] Overreliance on such detectors under these conditions poses significant ethical and social risks. This paper therefore argues that AI text detection tools should not serve as the sole basis for decision-making; rather, they should be treated as probabilistic, preliminary indicators. Without a clear ethical and regulatory framework, their use risks creating new forms of injustice instead of safeguarding academic and professional integrity.
Why AI Text Detection Is Difficult in Principle
AI text detection operates on a probabilistic rather than a deterministic basis. Detection tools cannot ascertain the true origin of a text; instead, they estimate the likelihood that a given text was generated by a human or by a language model. These estimates are typically based on metrics such as perplexity, patterns of linguistic variation (stylometry), and classification models trained on samples of human- and AI-generated text.[4][5] Such approaches are inherently uncertain and rely heavily on the assumption that stable and distinguishable characteristics exist between human-written and AI-generated text.
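To make this concrete, the sketch below scores a text by perplexity under a small open language model, the kind of signal many detectors build on. It is a minimal illustration only, assuming the Hugging Face transformers library and the public GPT-2 model; it does not reproduce any particular commercial detector.

```python
# Minimal perplexity-scoring sketch (illustrative, not a real detector).
# Assumes: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return the model's perplexity on `text` (lower = more predictable)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels equal to the inputs, the model returns the mean
        # cross-entropy loss over the sequence.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Low perplexity is often read as "AI-like", but it is only a weak,
# probabilistic cue: formulaic human prose can be just as predictable.
print(perplexity("The results were consistent with those of prior studies."))
```

Because formulaic human writing can also score low, a perplexity value alone cannot separate authorship reliably, which is one concrete mechanism behind the errors discussed below.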
This assumption becomes increasingly fragile as generative language models continue to advance. Newer models produce text that is more diverse, context-aware, and stylistically similar to human writing, while human authors themselves increasingly edit or modify texts with the assistance of AI. As a result, the boundary between human and AI text becomes blurred, and the statistical patterns relied upon by detectors tend to become obsolete quickly and generalize poorly to real-world conditions. A study by Weber-Wulff et al.[6] shows that most AI text detection tools achieve accuracies of only around 50–70 percent, with performance declining significantly when tested on texts generated by more recent models such as GPT-4.
On the other hand, reports of high detection accuracy in specific contexts must be interpreted with caution. Pitriani et al.[7], for example, demonstrate strong detection performance using IndoBERT for Indonesian-language texts. However, these results were obtained from controlled datasets with restricted domains and narrowly defined usage scenarios. Such conditions do not fully reflect the complexity of real-world settings, where texts are often hybrid, edited, and shaped by the social context of the writer. Consequently, high accuracy figures in laboratory evaluations do not automatically translate into social validity or reliable performance of AI text detectors in practical use.
Real Risks: Errors, Bias, and Hybrid Texts
The technical limitations of AI text detection have direct consequences in academic and professional practice, primarily through two types of errors: false positives and false negatives. False positives—cases in which human-written text is classified as AI-generated—can lead to academic sanctions, reputational damage, and psychological distress, whereas false negatives—cases in which AI-generated text is accepted as human-written—often carry few immediate consequences. In this context, the cost of error is asymmetrical and ethically weighs more heavily on individuals than on institutions, rendering the use of detection tools as a decision-making basis deeply problematic.
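A short, hypothetical Bayes calculation makes this asymmetry concrete. All numbers below are assumptions chosen for illustration, not measurements of any specific tool.

```python
# Hypothetical numbers, for illustration only: what fraction of flagged
# texts are actually AI-generated, given a detector's error rates?
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              base_rate: float) -> float:
    """P(text is AI-generated | detector flags it), via Bayes' theorem."""
    true_positives = sensitivity * base_rate
    false_positives = (1 - specificity) * (1 - base_rate)
    return true_positives / (true_positives + false_positives)

# Suppose the detector catches 70% of AI text, wrongly flags 10% of human
# text, and 5% of submissions actually involve AI generation.
print(positive_predictive_value(0.70, 0.90, 0.05))  # ~0.27
```

Under these assumed rates, roughly three of every four flagged texts would in fact be human-written, which is exactly why a flag should prompt review rather than sanction.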
These risks are further exacerbated by bias against non-native writers. Linguistic features such as simpler phrasing, more direct sentence structures, or limited lexical variation are frequently misinterpreted as indicators of AI-generated text, even though they reflect the writer’s linguistic background rather than dishonest academic behavior. Weber-Wulff et al.[8] explicitly demonstrate a tendency for detection tools to classify texts written by ESL (English as a Second Language) authors as AI-generated, raising the risk of language-based discrimination rather than the fair enforcement of academic integrity. In global and multilingual contexts, including Indonesian-language settings, such bias carries increasingly serious ethical implications.
Moreover, the assumption that texts can be classified in a binary manner—either human-written or AI-generated—is becoming increasingly untenable. Contemporary writing practices often involve human–AI collaboration, whether through the use of AI for initial drafting or for subsequent editing and refinement. Research such as DAMASHA[9] shows that hybrid texts and adversarial manipulation significantly increase detection complexity, underscoring the fact that real-world writing practices extend far beyond simplistic classification assumptions. Taken together, the prevalence of errors, bias, and hybrid texts demonstrates that binary human-versus-AI detection frameworks are no longer adequate and risk amplifying injustice when used as a basis for consequential decisions.
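One practical alternative to a binary verdict, in the spirit of segmentation approaches such as DAMASHA, is to report graded per-segment scores for human review. The sketch below is a hypothetical illustration: score_ai_probability is a placeholder for any probabilistic scorer (such as a calibrated version of the perplexity signal above) and is not the API of any named tool.

```python
# Hypothetical illustration of per-sentence, graded reporting instead of a
# single binary human/AI label for a whole document.
from typing import Callable

def segment_report(sentences: list[str],
                   score_ai_probability: Callable[[str], float]) -> None:
    """Print a graded per-sentence report, leaving judgment to a human."""
    for sentence in sentences:
        p = score_ai_probability(sentence)  # estimated P(AI-generated)
        if p >= 0.8:
            note = "strong AI-like signal; worth discussing with the author"
        elif p >= 0.5:
            note = "ambiguous; plausibly edited or hybrid text"
        else:
            note = "no meaningful signal"
        print(f"{p:.2f}  {note:<55}  {sentence[:60]}")

if __name__ == "__main__":
    # Dummy length-based scorer, purely to demonstrate the report format.
    demo_scorer = lambda s: min(1.0, len(s) / 120)
    segment_report(
        ["A short, choppy sentence.",
         "A much longer and more uniformly flowing sentence of the kind "
         "that a detector might, rightly or wrongly, rate as machine-like."],
        demo_scorer,
    )
```

Even this graded report remains an estimate; its value lies in prompting a conversation with the author, not in automating a verdict.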
Ethical and Regulatory Implications
The use of AI text detection tools raises ethical and regulatory concerns that extend beyond purely technical considerations. One of the central issues is the lack of transparency. Many institutions fail to clearly explain how detection systems operate, their accuracy levels, or their inherent limitations. As a result, detection outputs are often treated as objective facts rather than as probabilistic estimates. This practice undermines accountability and creates room for disproportionate decision-making.
Another critical ethical concern is the shifting burden of proof. In many cases, when a detector flags a text as AI-generated, the author is required to prove that their work is original. This approach is problematic because it presumes fault on the part of the writer rather than acknowledging the documented error rates of the detection tools themselves. Moreover, AI text detection tools are frequently treated as if they were forensic instruments, even though they are not designed to provide certainty, only probabilistic indications.
For these reasons, a set of minimum principles is needed to guide the use of AI text detection. First, detection results should not be used as the sole evidence in decisions with significant consequences. Second, human evaluation must be an integral part of the process, particularly to account for writing context and the author’s background. Third, authors should have the right to seek clarification and to contest detection results, including access to a general explanation of the criteria underlying the assessment.
This approach does not aim to prohibit the use of AI text detection, but rather to constrain and frame it responsibly. Without a clear ethical and regulatory framework, detection tools risk becoming instruments of unfair control. When applied cautiously and transparently, however, they can function as supportive tools that help uphold integrity without sacrificing fairness.
Conclusion
AI text detection tools still suffer from fundamental limitations that render them insufficiently reliable as a basis for high-stakes decisions. Their probabilistic nature, unstable performance against newer generative models, and the risks of false positives, false negatives, and bias against non-native writers all indicate that detection outputs should be understood as preliminary signals rather than final decision-making instruments. When detectors are treated as sole evidence, the risk of injustice increases, manifesting in erroneous academic sanctions and the erosion of trust between institutions and authors. For these reasons, the use of AI text detection must be ethically framed through human evaluation, transparency regarding tool limitations, and fair mechanisms for clarification and appeal.
Looking ahead, safeguarding academic integrity cannot rely solely on detection technologies. Instead, it requires improved AI literacy, clear guidelines for responsible use, and writing ethics that are adaptive to human–AI collaboration. Academic integrity is ultimately not about policing technology, but about cultivating fair, reflective, and context-aware scholarly practices.

  1. Usdan, J. et al. (2025) ‘Generative AI’s Impact on Graduate Student Professional Writing Productivity and Quality’, International Journal of Artificial Intelligence in Education. Available at: https://link.springer.com/article/10.1007/s40593-025-00528-z
  2. Forster, R.T. et al. (2025) ‘From digital divide to equity-enhancing diffusion: Generative AI and writing quality’, AI & Society. Available at: https://link.springer.com/article/10.1007/s00146-025-02739-3
  3. Weber-Wulff, D., Anohina-Naumeca, A., Bjelobaba, S., Foltýnek, T., Guerrero-Dib, J., Popoola, O. and Šigut, P. (2023) ‘Testing of detection tools for AI-generated text’, International Journal for Educational Integrity, 19, 26. doi:10.1007/s40979-023-00146-z. Available at: https://link.springer.com/article/10.1007/s40979-023-00146-z
  4. Hayawi, K., Shahriar, S. and Mathew, S.S. (2023) ‘The Imitation Game: Detecting Human and AI-Generated Texts in the Era of ChatGPT and BARD’, arXiv preprint. Available at: https://arxiv.org/abs/2307.12166
  5. Opara, C. (2025) ‘Distinguishing AI-Generated and Human-Written Text Through Psycholinguistic Analysis’, arXiv preprint. Available at: https://arxiv.org/abs/2505.01800
  6. Weber-Wulff, D., Anohina-Naumeca, A., Bjelobaba, S., Foltýnek, T., Guerrero-Dib, J., Popoola, O. and Šigut, P. (2023) ‘Testing of detection tools for AI-generated text’, International Journal for Educational Integrity, 19, 26. doi:10.1007/s40979-023-00146-z. Available at: https://link.springer.com/article/10.1007/s40979-023-00146-z
  7. Pitriani, P., Maylawati, D.S. and Gerhana, Y.A. (2023) ‘Deteksi Generatif Teks pada Penilaian Otomatis Tes Esai Berbahasa Indonesia Menggunakan IndoBERT’ [Generative text detection in automated scoring of Indonesian-language essay tests using IndoBERT], Jurnal Edukasi dan Penelitian Informatika (JEPIN), 11(2), pp. 170–190. Available at: https://jurnal.untan.ac.id/index.php/jepin/article/view/93221
  8. Weber-Wulff, D., Anohina-Naumeca, A., Bjelobaba, S., Foltýnek, T., Guerrero-Dib, J., Popoola, O. and Šigut, P. (2023) ‘Testing of detection tools for AI-generated text’, International Journal for Educational Integrity, 19, 26. doi:10.1007/s40979-023-00146-z. Available at: https://link.springer.com/article/10.1007/s40979-023-00146-z
  9. Sai Teja, L.D.M.S., Gopala Krishna, N.S., Khan, U., Khan, M.H., Pakray, P. and Mishra, A. (2025) ‘DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution’, arXiv preprint. Available at: https://arxiv.org/abs/2512.04838