AI exam answers ‘virtually undetectable’ by university examiners

University exam answers written by AI are “virtually undetectable” when assessed by human markers, a study has found.

In the first “real-world” analysis of AI use in official exams, researchers at the University of Reading found that 94 per cent of assessments written by chatbots went unspotted.

The study also found that the fake AI-written answers earned higher grades than those written by real students: in 83 per cent of cases, the AI submissions received better marks than their human counterparts.

Prof Peter Scarfe, the lead author of the report, told The Telegraph the results should serve as a “wake-up call” to universities that “AI is a problem now, it’s not a hypothetical future problem”.

It comes amid widespread concerns that students are using AI to cheat at scale, with current checking systems inadequate to spot malpractice. It has prompted calls by some to overhaul assessments such as coursework and essays and usher in a complete return to in-person exams. Others have urged universities to embrace AI in an ethical way.

First-class grades

For the study, completed last summer and published in the journal Plos One on Wednesday, Prof Scarfe and his team created 33 fake student identities, then used ChatGPT, the AI chatbot, to answer official “at-home exams” on the university’s BSc degree in psychology.

The team submitted the AI-written assessment answers under the false identities to Reading’s official exam markers, alongside those written by real students. Markers were unaware of the study, as were the real students, though the team received the green light for the experiment from the university.

Submissions were graded by an initial marker on the university’s official assessment team, and those grades were then reviewed by an independent moderator.

On average, the AI-written answers were just over half a grade boundary higher than those submitted by real students.

In one module, answers written by ChatGPT achieved first-class grades – making them a whole grade boundary higher than exams written by the real Reading students.

Modules included a mixture of short answer questions, which required 200-word answers, and essay-based questions, where students were asked to write 1,500 words on topics such as the design constraints of revolving doors.

Researchers copied and pasted the answers produced by ChatGPT completely unedited into the university’s online submission platform.

They said the results showed that “AI renders unsupervised assessments, branded ‘authentic’ or not, dangerously susceptible to academic misconduct”.

“From a perspective of academic integrity, 100 per cent AI-written exam submissions being virtually undetectable is extremely concerning,” they added.

Prof Scarfe, an associate professor at Reading’s School of Psychology and Clinical Language Sciences, said the results suggested that “the current systems for detecting AI will have to change”.

“I do think it’s a wake-up call for the sector,” he told The Telegraph. “I think the global educational sector will have to change, and this shows that detectability is probably not going to work.”

While students still sit in-person exams at most universities, the steady creep of coursework over the past few decades rocketed when “take-home exams” became commonplace during the Covid pandemic.

‘The new normal’

The sector has been slow to shake off remote learning and assessments in the wake of the pandemic.

At the same time, chatbots have exploded in popularity as they have become increasingly sophisticated. ChatGPT is a form of generative AI that can respond to questions in a human-like manner in a matter of seconds.

A survey of 1,250 students by the Ucas admissions service in January found that 53 per cent had used generative AI to help prepare for exams, while 5 per cent had incorporated it unedited into their assessments.

Despite this, AI detection software has proved largely ineffective, leaving human markers to act as the main safety net in spotting students’ use of chatbots.

Experts have warned that the risk of “false positives” from AI detection software leaves universities unwilling to accuse students of cheating.

It has prompted universities to rapidly rewrite their policies to spell out what constitutes cheating in the absence of a watertight solution to AI use.

Last year the Russell Group, including Oxford, Cambridge, University College London and other top universities, pledged to allow ethical use of AI in teaching and assessments.

Reading University said its study showed it was prepared to turn “the microscope on ourselves to lead in this”.

Prof Elizabeth McCrum, pro-vice-chancellor for education at the university, said it should prompt universities to “focus on working out how to embrace the ‘new normal’ of AI in order to enhance education”.

Experts have suggested ChatGPT could help students save time on usually laborious bibliographies, acknowledgements and references. Others have said it could be used as a launchpad for more interactive assessments such as presentations, and could help non-native English speakers write essays.
