OpenAI’s Voice Engine Can Clone Human Voices From a 15-Second Sample

OpenAI has state-of-the-art models for text and image generation, and most recently it also introduced Sora, an impressive text-to-video model. Now, the company has announced Voice Engine, a model that can generate speech from a single 15-second audio sample. It’s essentially a text-to-audio model: you supply a 15-second recording of the speaker as a reference, enter your text, and the model generates natural-sounding speech in that voice.

OpenAI says that even though the model is small, Voice Engine can generate realistic and emotive voices, very close to the original speaker. According to the company, the model was created in late 2022 and has been powering the ChatGPT Voice Chat feature.

OpenAI acknowledges the “serious risks” associated with the technology and the “potential for synthetic voice misuse”. For that reason, the company is not releasing the model to the public at this time. Instead, it is previewing the model to start a conversation about voice synthesis and how society can adapt to these new capabilities.

As for the model, it can translate audio into different languages and produce realistic speech with a nuanced accent. HeyGen, a popular AI video and audio generation platform, has been using OpenAI’s Voice Engine to create custom voices. In this space, ElevenLabs has built its own speech synthesis model that can clone voices and generate speech in multiple languages.

While the technology is quite powerful, it can be used to deceive and may put users at risk in various situations. OpenAI admits that voice-based authentication is used for accessing bank accounts and other sensitive information, and the company hopes that such authentication systems are phased out. Apart from that, social media is filled with people cloning popular voices to sell their products.

Another ad using @MKBHD’s voice pic.twitter.com/9z2c0ifYxg (Max Weinbach, @MaxWinebach, March 29, 2024)

In India, particularly, AI voice cloning scams are on the rise. Cybercriminals are cloning kids’ voices to threaten parents and extort money. In such a scenario, OpenAI is not well-positioned to release the model widely. As we move towards the AI era, more caution and resilience are needed from society at large.

What do you think about OpenAI’s voice cloning engine? Should the company release the model to the public? Let us know your thoughts in the comments below.

Arjun Sha

Passionate about Windows, ChromeOS, Android, security and privacy issues. Has a penchant for solving everyday computing problems.
