There are a lot of characteristics to a voice:
* The tone/timbre of the voice
* The rhythm/cadence at which someone speaks
* Their prosody, i.e. how their inflexion changes throughout a sentence (or even a word)
* Their accent; particularly how they pronounce certain vowel sounds
* Their dialect, i.e. what kind of words they use

That’s a lot of different features for identifying a voice; it’s not just a sound.
Example: look at how well Tom Hiddleston impersonates Graham Norton: [https://www.youtube.com/watch?v=zzqWPDnYEik](https://www.youtube.com/watch?v=zzqWPDnYEik). He’s paying attention to each of the points I mentioned above (except dialect; he used fairly standard language).
Humans evolved as social animals, so we are typically very good at distinguishing human features, including faces and voices. Physically, our faces and voices are as similar to each other as the faces and voices of sheep, but we can’t distinguish sheep as easily as we can distinguish humans. The difference is in how our brains are tuned, not in the faces and voices themselves.
When you hear someone speak vowel sounds in particular, the timbre (resonant characteristics) is determined by the physical dimensions of a person’s body, especially their throat, mouth and nose. Your brain learns to decode this to learn about the speaker, just as reverberations provide clues about the room you’re in.
You can tell the difference between different instruments. From hearing a single note, you can tell whether the instrument is a piano or a guitar. This is because the sound isn’t just made of the fundamental pitch of that note, but also of other pitches (overtones) in ratios that are characteristic of each instrument. Human voice boxes work the same way; every voice box is different, so if you’re familiar with the sound of a voice you can identify it when you hear it.
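If it helps to see the "same note, different mix of overtones" idea in numbers, here is a rough Python sketch. Everything in it is made up for illustration: the names `VOICE_A`, `VOICE_B`, `synthesize` and `harmonic_profile` are hypothetical, and the overtone ratios are arbitrary rather than measured from real voices or instruments. Real voices are far messier than a handful of sine waves, so treat this as a toy model only.

```python
import math

SAMPLE_RATE = 8000    # samples per second (assumed for this toy example)
FUNDAMENTAL = 200.0   # Hz: the one "note" both sources produce
NUM_SAMPLES = 800     # 0.1 s of sound, a whole number of cycles at 200 Hz

# Hypothetical overtone recipes: relative loudness of harmonics 1..4.
# Both share the same fundamental; only the mix of overtones differs.
VOICE_A = [1.0, 0.5, 0.25, 0.125]
VOICE_B = [1.0, 0.1, 0.6, 0.05]


def synthesize(harmonic_amps):
    """Build a waveform as a sum of sines at multiples of the fundamental."""
    return [
        sum(
            amp * math.sin(2 * math.pi * FUNDAMENTAL * (k + 1) * t / SAMPLE_RATE)
            for k, amp in enumerate(harmonic_amps)
        )
        for t in range(NUM_SAMPLES)
    ]


def harmonic_profile(samples, count):
    """Estimate the amplitude of the first `count` harmonics.

    Each harmonic is measured by correlating the signal with a sine and a
    cosine at that frequency (a single-bin discrete Fourier transform).
    """
    n = len(samples)
    profile = []
    for k in range(1, count + 1):
        w = 2 * math.pi * FUNDAMENTAL * k / SAMPLE_RATE
        re = sum(s * math.cos(w * t) for t, s in enumerate(samples))
        im = sum(s * math.sin(w * t) for t, s in enumerate(samples))
        profile.append(2 * math.hypot(re, im) / n)
    return profile


profile_a = harmonic_profile(synthesize(VOICE_A), 4)
profile_b = harmonic_profile(synthesize(VOICE_B), 4)

print("A:", [round(x, 3) for x in profile_a])
print("B:", [round(x, 3) for x in profile_b])
```

Both waveforms have the same pitch, but the recovered overtone profiles come out clearly different, which is roughly the kind of fingerprint your ear and brain pick up on.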
Our ears are very fancy. They are not like a single microphone, but rather a set of thousands of microphones called [“hair cells”](https://iiif.wellcomecollection.org/image/B0000114.jpg/full/880%2C/0/default.jpg), which allow us to distinguish not just a single sound, but the “whole bouquet” of sound frequencies that reach our ear.
Someone talking to us might have one sound that dominates their voice, but it will always be a combination of many different sounds and reflections. That’s also how, even with our eyes closed, we can tell whether a person is standing in front of us or behind us.
Same with faces, and lots of other things. Your brain is amazingly capable of distinguishing the smallest differences in nearly everything, but it only does so for what it thinks is important. Recognising voices is clearly quite important, so we do. Recognising the intonation and tone of a word is very important if you’re a native Mandarin speaker, so they can do it; a native English speaker would find it difficult to do the same, as they don’t need to. And can you tell the difference between the moos of two different cows? Well, you better believe that Barry the cow can tell the difference between Maisy’s and Suzy’s moos!
They just aren’t. Sound is one of those things we humans think we are good at, but we’re not. Impressionists can fool us, and we regularly confuse parents and their children on the phone.
It’s purely in our heads that we are great at recognising voices. It’s our eyes that do much of the heavy lifting. We can be easily fooled by audio illusions and are tricked by binaural audio. The same sound can be heard as “far” or “bar” depending on the mouth movements we see (the McGurk effect). Go look up audio illusions.