AI-generated answers achieve higher grades and evade detection in university exams
- AI-generated answers scored higher than those submitted by real students in undergraduate exams.
- 94% of AI essays went undetected by markers.
- Study suggests current AI technology struggles with abstract reasoning in advanced coursework.
A recent study conducted by the University of Reading has revealed that artificial intelligence (AI) can outperform real students in university exams, raising significant concerns about academic integrity and the future of educational assessments. The study used the AI tool ChatGPT to generate exam answers, which were then compared against those of real students.
AI vs. Human Performance
The researchers created 33 fictitious student identities and used AI to generate answers to undergraduate psychology exams. On average, the AI-generated responses scored half a grade boundary higher than those submitted by actual students. Additionally, 94% of the AI essays went undetected by the markers, indicating that they appeared sufficiently authentic to pass for human work. The findings were published in the journal PLOS ONE.
“This is particularly worrying as AI submissions robustly gained higher grades than real student submissions,” the study noted. “Thus, students could cheat undetected using AI – and in doing so, attain a better grade than those who did not cheat.”
Implications for Education
Associate Professor Peter Scarfe and Professor Etienne Roesch, who led the study, emphasized the urgent need for the global education sector to address the challenges posed by AI. Dr. Scarfe highlighted that many institutions have moved away from traditional exams to create more inclusive assessments, but these changes now face a new threat from AI.
“Our research shows it is of international importance to understand how AI will affect the integrity of educational assessments,” Dr. Scarfe said. “We won’t necessarily go back fully to handwritten exams – but the global education sector will need to evolve in the face of AI.”
Abstract Reasoning and AI Limitations
The study also found that while AI performed well in first- and second-year exams, it struggled with third-year exams that required more abstract reasoning, suggesting that current AI technology still has limitations when it comes to more complex cognitive tasks.
Broader Concerns
This study adds to growing concerns within academia about the impact of AI on education. For instance, Glasgow University recently reintroduced in-person exams for one course in response to similar concerns. Earlier this year, a Guardian report highlighted that most undergraduates admitted to using AI programs to assist with their essays, though only 5% admitted to submitting unedited AI-generated text.
The University of Reading’s study serves as a wake-up call for educators worldwide. It underscores the need for new strategies and technologies to preserve the integrity of educational assessments in the age of AI. As AI continues to evolve, the education sector must adapt to ensure that academic standards are upheld and that the value of human learning and critical thinking remains paramount.