How can two singers singing the same song in the same key still have distinguishable voices?



This is actually a question my daughter posed and I’m pretty stumped. She asked: if two people with (let’s say) perfect pitch sing the same song, how is it possible that we can still tell who is singing when the notes are identical?

Note: I know absolutely nothing about music, but figured this was the best place to ask for her.

Edit: Wow, many of these answers are incredible! I had no idea this would receive such in-depth and thoughtful feedback. I have learned a huge amount. I was not exaggerating above when I said I know nothing about music (I don’t even know what pitch is – I just quoted my daughter on that) and I’m grateful to those of you who took the time to help me learn.


Timbre of the voice is what makes it sound different. I’d suggest starting more research there!

So basically everyone has vocal cords, but they’re all shaped a little differently. That small difference changes the mix of frequencies each voice produces. Our bodies are also different, so the way the sound resonates in my mouth before it comes out is different from the way it resonates in yours.

The sound isn’t just the exact pure tone of the pitch they’re singing in. Every instrument and voice has a distribution of frequencies around the main pitch, known as its *timbre*. A piano, for example, is very concentrated around a specific pitch, while a drum is more spread out (which makes piano a better instrument for expressing detailed harmonies, but also makes it sound much more dissonant if you play a note that’s a bit off).

One of the components of a musical note is its *timbre* (pronounced TAM-bur). Timbre is all the sounds associated with the source that *aren’t* part of the pure tone.

Instruments (and the human voice – hereafter I’ll just say instrument, but it works the same either way) don’t produce a pure tone. The instrument creates the root frequency, the pitch you’re trying to make, and also overtones. Take a guitar string: it will vibrate at a particular frequency, and it will also vibrate at exactly twice that, and exactly three times, and exactly four times, and so on. The shape of the instrument, what it’s made of, and the size, shape and material of the main source of vibrations (lips, reeds, vocal cords, etc.) all change which overtones get amplified and which get diminished. Your ears can hear the differences in these overtones, although your brain filters them from your conscious perception of the sound unless you focus on them.

With a human voice, this includes the size and shape of your mouth, lungs, sinuses, jaw and tongue, the thickness of your skull, and so on and so forth. All of these things change the overtones in subtle ways, so that even when the root pitch is the same, the relative strengths of the overtones above it won’t be.

Timbre also includes all the unique sounds that come from the instrument: things like key clicks or valve movements or breath noises or little scratchy bits in your voice, etc.

Edit: “That’s not how you pronounce ‘timbre’!”

It is in American English. It is at the very least *one* correct pronunciation in English. Yes, I know it’s borrowed from French but this comment isn’t in French, it’s in English. I don’t expect everyone on the internet to understand English, but if you’re reading this in the original that means you understand English. A large share of the English lexicon comes from French, so if you’re gonna get upset every time someone pronounces a French word “wrong” in English you’re not going to get very far.

A voice is an instrument. While some sound similar (some even sound almost identical), subtle differences like size, materials, shape, make them sound different. Those are odd terms to use when describing people, but a 110 lb woman is going to sound different than a 300 lb woman singing the same note. The shape of their mouth, the way they push the air out, all make a difference.

It’s like if I have a trumpet and a flute play the exact same melody. They’re both wind instruments, but they sound different enough that you can differentiate them.

Something nobody else has really touched on in depth: Waveforms.

So in really basic electronic music, you’ve got sine waves, sawtooth waves, and square waves. Every note is literally just a pulse of air at a given frequency. It’s why car engines, which are literally just exploding aerosolized gasoline, make audible notes.

There are videos you can search (I’d link one but I can’t right now) that show the relationship between frequency and pitch.

**So what does that have to do with a sine wave?**

Well, the waveform is what the sound wave actually looks like. A square wave is completely silent, then immediately at 100% energy, then back to complete silence. A sawtooth wave jumps up like a square wave at first, but instead of staying at 100% energy it trails off to zero over time. A sine wave is like a very rounded square wave, so the energy changes are smooth.

And all of those waves have different timbres, or tones.

But if we layer a sawtooth wave with a sine wave, or we decide to cut a huge divot in the top of a sine wave, you’ll get different tones still. Playing with these waveforms is precisely how electric keyboards attempt to synthesize other instruments.
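To make the waveform idea a bit more concrete, here is a rough Python sketch (standard library only) that measures how strong the first few overtones are in one cycle of each shape. The function names are made up for illustration, and the waves swing between -1 and +1 rather than between silence and 100% energy, which shifts the wave up or down but does not change the overtone mix.

```python
import math

def sine(phase):
    # phase is the position within one cycle, from 0.0 to 1.0
    return math.sin(2 * math.pi * phase)

def square(phase):
    return 1.0 if (phase % 1.0) < 0.5 else -1.0

def sawtooth(phase):
    return 2.0 * (phase % 1.0) - 1.0

def layered(phase):
    # a sawtooth layered with a sine at the same pitch
    return sawtooth(phase) + sine(phase)

def harmonic_amplitudes(wave, n_harmonics=5, samples=4096):
    """Strength of harmonics 1..n in one cycle of `wave` (direct Fourier sum)."""
    amps = []
    for k in range(1, n_harmonics + 1):
        re = sum(wave(i / samples) * math.cos(2 * math.pi * k * i / samples)
                 for i in range(samples))
        im = sum(wave(i / samples) * math.sin(2 * math.pi * k * i / samples)
                 for i in range(samples))
        amps.append(2 * math.hypot(re, im) / samples)
    return amps

for name, wave in [("sine", sine), ("square", square),
                   ("sawtooth", sawtooth), ("layered", layered)]:
    print(name, [round(a, 2) for a in harmonic_amplitudes(wave)])
```

All four shapes repeat at the same rate, so they would be the same note, but the sine has only its first harmonic, the square has only odd ones, and the sawtooth has all of them, which is one way to see why they sound different.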

Okay so now we can step away from the electronic sounds, and go back to the natural world. Horns, car exhausts, and the human throat all have characteristics that make their own wave form. There are so many things that can affect which frequencies are highlighted and which frequencies are subdued. You can choose to manipulate those with tongue placement and mouth shape, or bell shape and pipe length or construction material.

Timbre is the key thing here along with the overarching concept of tone. It’s also why any two different instruments (a violin and a saxophone, for example) can play the same note at the same pitch and be easily distinguishable.

The architectural and performance variables of the instrument play an intrinsic part in the “sound”. A saxophone — being made of metal, having a reed, and requiring air flow and key fingering — will undoubtedly create different tones than a violin — being made of wood, having strings, and requiring vibration via bow and manual input on the finger board.

All good answers.

For a five year old I’d say that a saxophone and a flute can play the same note, but they have unique shapes to their bodies causing a difference in sound. Humans also have different shapes to their bodies causing them to sound different when singing the same note.

Edit for a more complete answer to address harmonics and overtones:

Imagine having a palette of only red paints. They are all the same color (or note) but are different shades (or spectrums) of the red paint note. You can mix the lighter red shade with the darker red shade and you’ll still get a red. The color of red that a person can sing is based on their unique blending of red shades. They sing these shades based on how their body is built.

Another thing I marvel at about this subject is how it takes special instruments and shapes to make music and sounds, yet even a tiny speaker can recreate that special timbre.

When someone sings a note at a certain frequency (let’s say 400 Hz) it’s not just that frequency playing, it’s actually a bunch of frequencies which are whole number multiples of 400 Hz (which is called the *fundamental frequency*). So in addition to 400 Hz, you also have 800, 1200, 1600, etc, which are called *overtones*. The reason that this happens has to do with the fact that the ends of a string (or vocal cord, etc) that vibrates have to be still, a condition which is satisfied by whole number multiples of the fundamental frequency. In a diagram of a string’s standing waves, notice how for all of the depicted frequencies the ends of the “string” do not vibrate, meaning that each one is a valid frequency for that string.
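As a small sketch of that idea in Python: the first function lists the whole number multiples of a fundamental, and the second gives the shape of an idealized string fixed at both ends. Both function names and the ideal-string formula sin(nπx/L) are my own illustration, not something from a specific library.

```python
import math

def harmonic_series(fundamental_hz, count):
    """The fundamental followed by its whole number multiples."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def string_displacement(multiple, x, length=1.0):
    """Displacement at position x for a wave at `multiple` times the
    fundamental on an ideal string pinned at x = 0."""
    return math.sin(multiple * math.pi * x / length)

print(harmonic_series(400, 4))  # [400, 800, 1200, 1600]

for multiple in (1, 2, 3, 1.5):
    far_end = string_displacement(multiple, 1.0)
    still = abs(far_end) < 1e-9
    print(multiple, "far end stays still:", still)
```

The whole number multiples leave the far end of the string motionless, while a multiple like 1.5 would need the end to move, so the string cannot sustain it.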

These overtone frequencies tend to get quieter and quieter the higher you go relative to the fundamental frequency, but how loud a particular overtone is relative to the other frequencies is determined by the shape and composition of the thing that is vibrating. Each person’s vocal cords and voicebox and mouth are going to be shaped a bit differently, and so different overtones will be emphasized, leading to a different sound.
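Here is a hedged sketch of what “different overtones emphasized” can mean in practice. The two weight lists are invented numbers standing in for two hypothetical singers, not measurements of real voices.

```python
import math

def voice_sample(t, fundamental_hz, harmonic_weights):
    """Value of the sound wave at time t (seconds), built by adding up
    the fundamental and its overtones with the given loudness weights."""
    return sum(weight * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
               for k, weight in enumerate(harmonic_weights))

# Hypothetical overtone loudness for two singers on the same 400 Hz note.
SINGER_A = [1.0, 0.5, 0.25, 0.125]
SINGER_B = [1.0, 0.1, 0.6, 0.3]

f0 = 400.0
period = 1 / f0
t = 0.0003

for name, weights in [("A", SINGER_A), ("B", SINGER_B)]:
    now = voice_sample(t, f0, weights)
    one_cycle_later = voice_sample(t + period, f0, weights)
    print(name, round(now, 3), round(one_cycle_later, 3))
```

Each wave repeats every 1/400 of a second, so both singers are on the same note, but the values within the cycle differ, which is the difference in timbre.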

Pitch is only one element of sound. The human voice has many components, only one of which is pitch. Another poster mentioned timbre; that’s another component. Tonal quality also includes things like how steadily you hold a note. Perfect pitch only tells you when you’re off the note. It doesn’t grant you the ability to sing it as perfectly as you can hear it. Some people can tell you what note it is, others can only tell whether it’s flat or sharp. If they sing, they may still warble like a cockatoo.

Vibrato is another part of tonal quality. Sometimes it can lend warmth to the music. Other times, it’s annoying. Barbershop music should never be sung with vibrato — you want the chord to ring pure, and it can’t do that if each singer is vibrating differently from the others. Choir music can get away with vibrato, especially with lead or solo singers, and opera is almost defined by it.

Two people can be singing the same note straight tone, no vibrato, and they’re still distinct because one resonates the tone in their head, while the other resonates it in their chest. The former sounds nasal, the latter richer and fuller, but it’s still the same note.

There’s lots to music that isn’t about pitch.


Oh cool something I know a little about from my past in audio recording.

The top answer is totally right. But an interesting thing happens when you’re recording vocals, or any other instrument for that matter. You can duplicate tracks so you have two sound files playing the exact same pitch and timbre. Everything is exactly the same. To the listener, it will just sound as if the original track got louder. But take the exact same singer or instrument and record a brand new take of the same thing, and the minute differences, even from the exact same instrument/player/singer, are enough to give the listener the perception of layers rather than just more volume.

Also fun fact: if you simply shift the second duplicate track by a few milliseconds, it doesn’t give the same “layered” sound as a new take, but instead creates the “chime-y” sound effect called “chorus” (or the swirly sound called “phaser”/“flange”, depending on how many milliseconds of delay).
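A rough back-of-the-envelope version of both effects in Python, assuming an idealized mix of a track with a perfect copy of itself. The function names are made up for this example. Real chorus and flanger effects usually also sweep the delay over time, which this sketch leaves out.

```python
import math

def duplicate_gain_db():
    """Level change when a track is mixed with an identical, perfectly
    aligned copy: the amplitude doubles."""
    return 20 * math.log10(2)

def delayed_copy_gain(freq_hz, delay_s):
    """Amplitude gain at one frequency when a track is mixed with a copy
    of itself delayed by delay_s seconds: |1 + e^(-i*2*pi*f*delay)|."""
    return abs(2 * math.cos(math.pi * freq_hz * delay_s))

print(round(duplicate_gain_db(), 2))  # 6.02

delay = 0.001  # 1 millisecond
for freq in (250, 500, 750, 1000):
    print(freq, "Hz ->", round(delayed_copy_gain(freq, delay), 3))
```

With no delay every frequency simply doubles, so the track only gets louder. With a 1 ms offset, some frequencies cancel (500 Hz here) while others still double (1000 Hz), and that uneven pattern is what colors the sound.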

TL;DR – In theory, if two voices were identical in timing, pitch, timbre, and everything else, you definitely couldn’t tell them apart. But only computers or recordings can be that precise. In anything performed by humans, there are so many small imperfections in performance that your brain can tell the difference.

I think there’s a better and more interesting answer than the ones posted here, even though they’re all good explanations.

The “note” a singer, or any other instrument makes, is a frequency. Literally “how **frequently** does the sound oscillate?”

With a guitar, it’s “how frequently does the guitar string oscillate?” Meaning vibrate. If you watched a guitar string in slow-motion, you’d be able to see it vibrating after it was plucked. You can kinda see it even without slo-mo, it’s just a blur.

With your voice, it’s flaps of skin in your throat that are vibrating.

If someone sings an A#, that means their vocal cords are vibrating about 466 times per second. Everyone singing that A# at the same time is vibrating their vocal cords about 466 times per second.
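If you’re curious where 466 comes from, here is a minimal sketch. It assumes standard modern tuning (the A above middle C at 440 Hz, twelve equal steps per octave), and the function name is just for illustration.

```python
def note_frequency(semitones_from_a4, a4_hz=440.0):
    """Frequency of the note a given number of semitones above (or below,
    if negative) the A above middle C, in equal temperament."""
    return a4_hz * 2 ** (semitones_from_a4 / 12)

print(round(note_frequency(1), 2))    # A# just above A440: 466.16
print(round(note_frequency(-11), 2))  # the A# one octave lower: 233.08
```

So the 466 figure is the A# one semitone above A440. The same note name an octave lower vibrates half as fast.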

But sound is MORE than just a frequency, which you know if you think about it. It’s also an “amplitude.” Which means “loudness.” We could both be singing A#, but I might sing louder than you. Same note, two different volumes.

But sound also has a SHAPE! Which is SUPER COOL! Let’s look at the “purest” tone, which is called a Sine Wave.

That is a real simple wave and because it’s so simple it would make a very pure tone if you listened to it. But **pitch** is **just** frequency. A wave with a different **shape** but the same **frequency** would be the same pitch, but could sound very different.

Let’s look at a different kind of wave. What’s called a Saw Wave.

You can see why it’s called a saw wave, right? Looks like the teeth of a saw!

Well, this makes a VERY different sound. It sounds…actually it sorta sounds the way it looks! It has an *edge*. It’s not as pure as the sine wave. When you listen to any bowed instrument, the sound you’re hearing is close to a Saw Wave, because that’s roughly the actual physical motion of the string!

You can see this in slow-motion video of a bowed string. The bow is pulled across the string. At first, the friction of the bow catches the string and pulls it smoothly back. That’s the “ramp up” of the saw wave. Eventually the tension in the string overcomes the bow’s friction, and the string ‘snaps’ back. Which is the sharp, straight-down line of the saw wave. But the bow is still pulling, so the string gets caught again and the cycle repeats.

Saw Waves and Sine Waves are still pretty simple though. The waves produced by the human voice look *weird* and *messy.*

In a graph of a more complex wave, ONE cycle can span many wiggles (in the example I have in mind, everything from the 1 hash to the 8 mark). That is a complex wave and it’s still way simpler than the human voice. The human voice looks more like a waveform showing extreme aperiodicity, such as the phrase “finally” spoken by a female English speaker.

THAT is why two people singing the same note are recognizably different. Their vocal cords are vibrating VERY complexly. So complex, it’s almost unique! When you recognize someone’s voice, you’re recognizing the unique properties of the SHAPE of the wave their vocal cords make. That shape is based on the physical shape of their vocal cords and their throat and even their mouth, which is helping shape the sound as it comes out.

The **rate** at which their skin flaps vibrate might be the same, but because their skin is floppy and weird shaped, it doesn’t just go smoothly up and down like a guitar string. It waggles all over WHILE going up and down and that is what singers and musicians call “timbre.” Timbre means “The way your skin flaps waggle around while you vibrate them.”

The difference in our voices is created by the differences in the shape, size and tilt of our voice box, the individual shape, strength and movement of the vocal cords inside it, and the physical differences in the shape and size of our airway, tongue, teeth, mouth and nasal cavities. In other words, subtle differences in physical anatomy generate the difference, because sound travels differently in different bodies from the vocal cords all the way through the mouth and nose.

(I’m a speech therapist)

A fun example—

If you take recordings of instruments playing the same sound, and you cut off the beginning and end for each instrument, you’ll have trouble identifying the difference.

It’s not just about the pitch but the timbre of your voice.

When we think of Bob Dylan’s gravelly voice, we’re talking about timbre, not pitch. When you hear his duets with Johnny Cash, you can immediately tell who’s who.

I’m not an expert by any means. But I can sing the same note in different ways. Once using a chesty voice, then using a softer, breathier voice. I can also go more nasally, or apply some distortion. Just changing the shape of your mouth can also impact the sound.

A piano, guitar, marimba, glockenspiel, flute, harpsichord, harp, and…you get my point, can all play for the most part a set of identical notes, and yet you could easily distinguish them from one another.

Human voices are all different in the same way. We all have differences in our voices that contribute to how our singing voice sounds. Though I will say, the better trained the singers are and the more accurate their pitch, the harder you’d find it to distinguish 2 female sopranos singing an E6 or a similarly high note. But down in the mid range of the singing voice, which comes from a combination of chest and head voice, you’ll start to hear the differences between 2 singers quite clearly.

Music teacher here.

The same way that you can have dark green and light green and green stripes and green spots and shiny green and matte green (importantly, all without changing the color towards red or blue), you can have a note come out in many different ways without changing the pitch (high-ness or low-ness of a note).

In music, this is called timbre (pronounced TAM-BER for some reason), or “Tone Color”.

As an interesting exercise, have them hit a single note, and move their mouth through the vowels.

Compare the “O” sound with the “EEEE” sound. The O sounds lower, while the EEEE sounds higher, even though the pitch stays the same.

When we make sounds with our mouths, there is one main pitch, and lots of little “sub-pitches” called harmonics, that change the way the main pitch sounds.
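For anyone who wants to poke at the vowel exercise with numbers, here is a loose Python sketch. Everything in it is an assumption for illustration: the resonance frequencies for “ee” and “oh” are rough ballpark values that vary a lot between people, the bell-shaped boost and the 1/k roll-off are simplifications, and the names are made up.

```python
import math

F0 = 200.0  # sung pitch in Hz, identical for both vowels (illustrative)

# Rough, illustrative mouth resonances in Hz; real values vary by speaker.
VOWEL_RESONANCES = {"ee": (270.0, 2290.0), "oh": (570.0, 840.0)}

def harmonic_levels(resonances, f0=F0, max_hz=4000.0, width_hz=100.0):
    """(frequency, level) for each harmonic of f0 after the mouth shape
    boosts the harmonics that fall near its resonances."""
    levels = []
    k = 1
    while k * f0 <= max_hz:
        freq = k * f0
        source = 1.0 / k  # assumed: higher harmonics start out quieter
        boost = sum(math.exp(-0.5 * ((freq - r) / width_hz) ** 2)
                    for r in resonances)
        levels.append((freq, source * boost))
        k += 1
    return levels

def high_band_share(resonances, cutoff_hz=1500.0):
    """Fraction of the total harmonic level that sits above cutoff_hz."""
    levels = harmonic_levels(resonances)
    total = sum(level for _, level in levels)
    return sum(level for freq, level in levels if freq >= cutoff_hz) / total

for vowel, resonances in VOWEL_RESONANCES.items():
    print(vowel, round(high_band_share(resonances), 3))
```

Both vowels use exactly the same set of harmonics (200, 400, 600 Hz and so on), so the pitch is unchanged. In this toy model only “ee” leaves a noticeable share of its sound above 1500 Hz, which fits with it sounding “higher” than “oh”.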

There exists a type of note with no extra harmonics, called a Sine Wave, which is only the main pitch and nothing else. YouTube can play it for you.

Interestingly, it’s the beginning and end of notes that hold most of the key to differentiating between instruments. The middle sounds pretty similar.

You might also get good answers at /r/musictheory