Why voice assistants will be this autumn's new AI battleground

Jenny Blackburn, Vice President of UX for Gemini experiences and Google Assistant, presents different voices for Gemini during Made By Google on Tuesday, Aug. 13, 2024, in Mountain View, Calif. Google's Gemini Live will be able to hold conversations like a human being. (AP Photo/Juliana Yamada)

The next phase of the artificial intelligence (AI) arms race looks set to centre on spoken-word assistants.

Google launched its Gemini Live voice service last week, allowing Android users to have live voice chats with Google’s Gemini AI chatbot: "human-like" conversations in which users can interrupt and ask follow-up questions.

OpenAI’s ChatGPT already has a similar feature, Advanced Voice Mode, set to fully launch this autumn - and both Apple’s Siri and Amazon’s Alexa are in line for a generative AI upgrade.

So what will be different about this new generation of voice assistants? And why is voice suddenly so important to delivering AI?

The big difference with ChatGPT’s Advanced Voice Mode and Gemini Live is that these assistants can hold ‘flowing’ conversations, something that Siri, Alexa and Google Assistant cannot.

Instead, the newly released voice assistants can reply, remember what they are talking about, and fetch more information as needed.

They are also capable of finding information online or within other apps - and work like a ‘digital butler’ which can expand on topics and even perform actions within other apps.

With Gemini Live, Google promises that you can hold a flowing conversation, change topics and even interrupt the assistant, and it will continue the conversation: Google hopes it will become more of a ‘sidekick’ than a simple voice app.

Google is also planning for its ‘killer app’ (and the reason people will pay the monthly subscription) to be integration with other Google apps such as Gmail, Calendar and Tasks.

Users (speaking via either headsets such as Google’s Pixel Buds 2, or direct to the phone) will be able to find invitations or meeting details and query the assistant directly. The extensions to enable users to access these apps will be added soon, Google promises.

Google hopes that the assistant will become a normal part of using apps on an Android phone. An ‘ask about this screen’ item will allow users to pull up more information on a location in Maps or even a YouTube video.

In a blog post, Google wrote: "Let’s say you’re preparing for a trip abroad and have just watched a travel vlog - tap “Ask about this video” and ask for a list of all the restaurants mentioned in the video - and for Gemini to add them to Google Maps."

Likewise, OpenAI’s Advanced Voice Mode can conduct back-and-forth conversations and human-like interactions (it can even sing and do impressions, although the musical ability has been removed from the current version over copyright fears).

OpenAI CEO Sam Altman speaks during the Microsoft Build conference at Microsoft headquarters in Redmond, Washington, on May 21, 2024. ChatGPT's Advanced Voice Mode is currently in testing. (Photo by Jason Redmond/AFP via Getty Images)

Advanced Voice Mode launched to a small group of invited subscribers in July, with OpenAI promising a full roll-out to paying ChatGPT subscribers this autumn.

OpenAI wrote in a blog post: "We are planning for all Plus users to have access in the fall. Exact timelines depend on meeting our high safety and reliability bar."

To try Gemini Live, you need to be an Android user with a subscription to Gemini Advanced ($20 per month) as part of the Google One AI Premium Plan. To use Advanced Voice Mode on ChatGPT, you’ll need an invitation to the alpha test.

Apple’s upcoming iOS 18, currently in beta test, will launch this autumn alongside the new iPhones at an event usually scheduled for early September - and will feature a new focus on AI.

The company brands this as ‘Apple Intelligence’ and part of this will be a newly upgraded version of Siri, powered by generative AI, which Apple promises will be more tightly integrated into iOS.

Apple wrote earlier this year: "With richer language-understanding capabilities, Siri is more natural, more contextually relevant, and more personal, with the ability to simplify and accelerate everyday tasks. It can follow along if users stumble over words and maintain context from one request to the next."

Amazon is also said to be planning a new, upgraded version of its Alexa assistant for launch this autumn, powered by generative AI.

CNBC has reported that sources within Amazon suggested the new chatbot would be positioned to compete with Google’s and OpenAI’s offerings, and would require a monthly subscription rather than being included in Amazon Prime.
