Can you tell an AI voice from a real human?

People are unable to tell if a voice speaking to them is that of a real person or an AI clone, scientists have found.

AI audio cloning is now so advanced that it can generate entire paragraphs of speech from small snippets of recordings, in a voice indistinguishable from a real human's.

Experts fear realistic computer voices could be used by scammers to imitate banks and to spread fake news.

Researchers at University College London played audio clips out loud to 100 people and found they could not tell which was human and which was created using artificial intelligence (AI).

People were played a phrase twice, once read by a real person and once by an AI clone of that person, and asked which they thought was authentic.


Participants answered correctly only 48 per cent of the time, no better than chance.

People were better at recognising AI when it was impersonating someone they knew, correctly identifying the clone 88 per cent of the time when it was the voice of a friend.

Prof Carolyn McGettigan, the study’s author and chairman of speech and hearing sciences at UCL, presented her findings at the British Science Festival ahead of publication in a scientific journal.

“What we’ve found is that for people who know the original voice, they are actually quite sensitive to whether what they’re hearing is a clone or an authentic recording. But when it comes to a stranger’s voice, they’re basically guessing,” Prof McGettigan said.

“What we’re seeing now is that the technology is good enough to mean that listeners may be unable to tell if what they’re listening to is the voice of a real person or not.”

Ad-hoc experiment

A recording of Aesop’s fable, The North Wind and the Sun, was played aloud to journalists at the festival. They were asked to say if the audio clip was real or fake.

Every person in the ad-hoc experiment believed the recording to be genuine, but Prof McGettigan revealed it was a chimera, with both AI and human voices interwoven in one clip.

“You were hard-pressed to think that this was a computer-generated voice in any part, and you probably wouldn’t have thought there were two different sources of speech in there,” Prof McGettigan said.

“Synthetic voices can sound very, very human-like. You were all pretty convinced that it was human all the way through.”

The technology is now so readily available and capable that companies are contemplating allowing people to use AI clones of a specific voice for smart assistants such as Siri and Alexa, or to read audiobooks.

Ethical questions

Prof McGettigan added that there are serious ethical questions about how to deploy and regulate the technology, and how to protect people from deception. There is also the possibility that it could be used to recreate the voice of a deceased loved one, in the style of the sci-fi TV show Black Mirror.

“This has serious ethical implications that we all need to consider – the technology already exists, so it’s up to us to decide how we best make use of it,” Prof McGettigan said.

“I think it is realistic to say that any kind of technology will always be apt to be abused, regardless of its benefits.

“It seems like there are probably ways in which, as a whole society, we need to think about the ways in which we evaluate information.

“I think there are lots of possibilities for harm in these kinds of technologies that might seek to replicate a person’s identity if they were used for nefarious purposes, but I suppose the question is to what extent would minimising the harms also interrupt the potential benefit.”
