How come AI-synthesized voices sound so bad?


As you’ve seen in the past year, AI is growing extremely quickly. From DALL-E 2 and Stable Diffusion, which can create insanely photorealistic images, to ChatGPT, which can have fairly natural conversations, understand your questions, and answer them correctly, all of these AIs are improving at an alarming rate. So how come AI voices like Google Assistant and Alexa, voices that people interact with every day, still sound so artificial and unnatural?


3 Answers

Anonymous 0 Comments

The reality is, they’re not bad; it’s just really hard to get natural sound and inflection that matches a human’s patterns a hundred percent of the time.

Vocaloid, voice synthesis, and the like work by having a bank of sounds (vowels, consonants, diphthongs, etc.) and concatenating them; that concatenation gives you the pronunciation.

The hard part is the tone and how the sounds join together. Because of how mouths work, the word “hello” is three or four parts, he- ell- llo- ouu, and all of them need to flow into each other in a very specific order and with a specific tone for it to work.
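
For a rough idea of what that concatenation looks like, here’s a minimal sketch in Python. It’s not any real engine’s code: `tone` is just a stand-in for one stored unit (a real bank holds recorded speech clips), and `crossfade_join` blends the seams with a simple crossfade, where real systems choose units and smooth the joins far more carefully.

```python
import numpy as np

SR = 16_000  # sample rate in Hz

def tone(freq, dur):
    """Stand-in for one stored unit; a real bank holds recorded speech clips."""
    t = np.arange(int(SR * dur)) / SR
    return np.sin(2 * np.pi * freq * t)

def crossfade_join(units, overlap=0.02):
    """Concatenate units, blending each seam over `overlap` seconds so the
    waveform doesn't click where one sound hands off to the next."""
    n = int(SR * overlap)
    fade_out = np.linspace(1.0, 0.0, n)
    fade_in = 1.0 - fade_out
    out = units[0]
    for u in units[1:]:
        seam = out[-n:] * fade_out + u[:n] * fade_in
        out = np.concatenate([out[:-n], seam, u[n:]])
    return out

# Four fake units standing in for the pieces of "hello" (he- ell- llo- ouu).
hello = crossfade_join([tone(220, 0.15), tone(180, 0.15),
                        tone(200, 0.15), tone(160, 0.20)])
```

Getting the order right is the easy part; it’s the tone and the blending at those seams that’s hard to automate.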

Your brain has been learning this order, tone, and inflection since birth, and you have a physical mouth to replicate it with.

Machines don’t. They need someone to tell them how to do it, and it’s hard to tell a computer how to do human things in the general case.

In the case of vocaloids and singing bots, however, they get the exact tone, breath response, etc. from a human constantly listening and tweaking the tone, inflection, and overall result to make it sound more natural; that’s how you get convincing singing bots.
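
As a toy illustration of that hand tuning (not how Vocaloid actually works internally), here’s a sketch where a pitch contour is written out by hand and a note is bent to follow it, the way a person might nudge each syllable’s inflection by ear until it sounds right.

```python
import numpy as np

SR = 16_000  # sample rate in Hz

def tone_with_contour(base_freq, contour, dur):
    """Generate a note whose pitch follows a hand-drawn contour
    (multipliers of base_freq), standing in for a human adjusting
    the inflection of each syllable by ear."""
    n = int(SR * dur)
    # stretch the contour points across the whole note
    freqs = base_freq * np.interp(np.arange(n),
                                  np.linspace(0, n - 1, len(contour)),
                                  contour)
    phase = 2 * np.pi * np.cumsum(freqs) / SR
    return np.sin(phase)

# A rise at the end, like the questioning inflection on "hello?"
questioning = tone_with_contour(200, [1.0, 0.95, 1.0, 1.15, 1.3], dur=0.6)
```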

However, those synthesized waves are in perfect harmony, and a human voice is more like a mishmash of tones that scraggle and twist, so it’s very hard to get a natural voice unless you record something and then map the robot voice over it.
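
To hear what “perfect harmony” versus “mishmash” means, here’s one more toy sketch: a pure sine wave sounds machine-perfect, while layering harmonics, a little pitch drift, and faint breath noise on top gets it slightly closer to the messy signal a real throat produces. This is just an illustration, not a recipe any real voice engine uses.

```python
import numpy as np

SR = 16_000  # sample rate in Hz

def rough_voice(freq, dur, jitter=0.01, breath=0.02):
    """A sine plus harmonics, slow pitch drift, and breath noise:
    a crude stand-in for the messy mix of tones in a real voice."""
    n = int(SR * dur)
    # slow random drift: a real larynx never holds a note perfectly
    drift = np.convolve(np.random.randn(n), np.ones(400) / 400, mode="same")
    freqs = freq * (1 + jitter * drift)
    phase = 2 * np.pi * np.cumsum(freqs) / SR
    # a few harmonics of decreasing strength, plus faint breath noise
    voiced = sum(np.sin(k * phase) / k for k in (1, 2, 3, 4))
    return voiced + breath * np.random.randn(n)

pure = np.sin(2 * np.pi * 200 * np.arange(int(SR * 0.5)) / SR)  # "perfect" tone
messy = rough_voice(200, 0.5)                                   # slightly more organic
```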
