Crowds take a single note or sequence of notes that might not sound that great on its own and mix it together with a whole bunch of other tones, both high and low, as everyone sings along. Everything blends together in your ear, and the end result is something very close to the right notes, but on a much larger scale. Crowds are also able to “self-correct” as they go: individuals are more likely to sing on key if they hear others around them singing on key.
Think about walking into a busy convention hall or theater where everyone is talking at once. That “buzz” that you’re picturing is the median tone of all of the conversations in the room. Of course you can still pick out people who are higher or lower if you try, but overall there tends to be an even “hum” at a certain arbitrary pitch. Now picture all of those people trying to say the same words or sing the same song. The ones who are higher or lower are still there, but most people are close to or on pitch, and the overall effect is that the crowd sounds good because, again, high and low are present in equal measure.
EDIT: since I’ve gotten a few people telling me I don’t know Jack about this, I clarified this answer. I also removed the autotune analogy. It was confusing and probably not the best thing to use, and I may have been incorrect in the mechanics.
Short answer to your question: Your ear can be fooled. If a pitch (or especially a large group of pitches) gets close enough to what we expect to hear then our brain will process the pitch as being correct. In the case of a large group of pitches our brain will simply muffle out the wrong ones, but even with a single pitch we are able to “snap” the pitch to the correct position if it is close enough. The fact that your ear can be fooled is in fact a core principle of western music.
I’m going to ramble on a bit about your ear getting fooled being a core part of western music which isn’t actually related to your question all that much but I like talking about it. This next part will not be ELi5 and I won’t fault anyone for getting out while they can or just checking the tl;dr and then moving on with their life. Trigger warning for people who are afraid of math.
**tl;dr of what is to come: On any twelve tone instrument eleven out of the twelve tones are fundamentally out of tune. For example in the key of A (A=440 Hz. This 440 is arbitrary, it is simply what we have chosen as a reference pitch) the note E is supposed to be 660 Hz. But we instead tune it to 659.26 Hz. The reason we do this is that any twelve tone system can only possibly be in tune for one single key, so instead we compromise and make every single key equally “out of tune” in a way that makes them all sound so close to the real thing that our ear deceives us into accepting them as the real thing**
You know how an octave on a piano or guitar has 12 notes? Would you like to know how many of them are actually in tune? It’s only one of them. Only the note that determines the key that the song is in is actually in tune (because it’s impossible not to be, that note decides what the reference pitch even is).
And I’m not even talking about accidental, minute tuning differences. I’m saying that every note except for the root and its octaves is fundamentally out of tune. It has to be, otherwise each piano would only be able to play in one single key. (That is how instruments were tuned in medieval times, but let’s not get into that for now).
The twelve notes of our equal-tempered system are essentially metaphors for the notes that they actually represent. In order for our twelve tone system to work we need all twelve notes to be exactly an equal distance to each other. Making 6 steps of two notes should come out at a perfect octave (double the frequency), and so should 4 leaps of three notes, and three leaps of four notes, and two leaps of six notes. Or any combination thereof: 3+4+5=12 therefore if I move upward by three notes, then four notes, then five then the resulting frequency should be double the frequency (an “octave”) of where I started. The same goes for 8+4 or 11+1 or any other combination you can think of.
Because of this we have tuned each note on a twelve-tone instrument to be the twelfth root of 2 higher than the previous one, so if my A = 440 Hz then my A# needs to be 440*2^(1/12) and my B needs to be 440*2^(2/12) and the next note has to be 440*2^(3/12) etc. Because that’s the only possible way that we can cover the same distance 12 times and then end up on double the frequency (“octave”).
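Here is a minimal sketch of that formula, assuming A = 440 Hz as the reference. The function name `equal_tempered` is just something I made up for illustration:

```python
A4 = 440.0  # reference pitch in Hz (an arbitrary standard, as noted above)

def equal_tempered(semitones, reference=A4):
    """Frequency of the note `semitones` equal-tempered steps above the reference."""
    return reference * 2 ** (semitones / 12)

# Twelve equal steps land exactly on the octave (double the frequency).
octave = equal_tempered(12)

# Any combination of leaps that adds up to 12 also lands on the octave,
# for example 3 + 4 + 5 steps taken one after another.
step = 2 ** (1 / 12)
combined = A4 * step**3 * step**4 * step**5

print(round(equal_tempered(1), 2))  # A#
print(round(equal_tempered(7), 2))  # E
print(octave, round(combined, 6))
```

The key point is that every step multiplies the frequency by the same factor, so the order and size of the leaps doesn't matter as long as they add up to twelve.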
But that isn’t the “real” frequency of the sounds that these notes represent, because the actual sounds that make up music are derived from what are called “overtones”. When a string vibrates, it is not only vibrating at its root frequency, it is also vibrating at these overtones. [Wikipedia has a nice graphical representation of what those look like if they were all isolated](https://upload.wikimedia.org/wikipedia/commons/a/aa/Vibration_corde_trois_modes_petit.gif).
I know that it’s a little difficult, but you need to imagine that a vibrating string will produce all three of these motions at the same time. And not just the motions pictured in the Wikipedia gif: it is actually overtones all the way down. Each new overtone vibrates at a higher frequency but with a smaller intensity, so the higher your overtones get, the harder it becomes to hear them. If you go deep enough you could argue that the intensity of the vibration is so low that the wave might as well not exist at all. As far as the human ear goes, this happens somewhere in the region of overtone ~10 (meaning 10 times the original frequency), but that is a very rough approximation and it depends on the sound and the listener.
In other words: If I’m playing a song in the key of A and I play the note E, what I am actually doing is resonating with the third overtone of my root note (A). My root note vibrates at 440 Hz, its second overtone (octave) vibrates at 2*440=880 Hz, and its third overtone (dominant) vibrates at 3*440=1320 Hz. So the note E played in the key of A should in theory have a frequency of 1320 Hz, or any doubling or halving thereof. So 660 Hz when we reduce it back to the octave of our original A.
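To make the arithmetic concrete, here is a small sketch that takes whole-number multiples of the root (what I'm calling overtones here) and folds them back into the octave that starts at the root. The helper name `fold_into_octave` is my own, purely for illustration:

```python
def fold_into_octave(freq, root):
    """Halve or double `freq` until it lies in the range [root, 2 * root)."""
    while freq >= 2 * root:
        freq /= 2
    while freq < root:
        freq *= 2
    return freq

root = 440.0  # A

# The first five multiples of the root: 440, 880, 1320, 1760, 2200 Hz.
multiples = [n * root for n in range(1, 6)]

# Fold each one back into the octave starting at the root.
folded = [fold_into_octave(f, root) for f in multiples]

for n, (raw, low) in enumerate(zip(multiples, folded), start=1):
    print(f"{n} x {root:.0f} Hz = {raw:.0f} Hz, folded to {low:.0f} Hz")
```

The multiples 1, 2 and 4 all fold back to A itself, 3 folds to 660 Hz (the E), and 5 folds to 550 Hz, which is the just C# that shows up in the tables further down.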
But the E in our equal-tempered twelve-tone system isn’t 660 Hz. It’s 440*2^(7/12)=659.26 Hz. This may seem like a small difference, but that is because the “dominant” note has the frequency that is closest to the real deal. It only gets worse from here. And in any case this still means that we have fundamentally tuned our instrument to the wrong pitch, but our ear does not mind. Our ear will gladly snap it into the right position for us. This is the mechanic that we exploit to be able to build twelve-tone instruments that can use all twelve of their tones as root notes: By creating a system of compromise where every note we play is slightly out of tune, but never so much that it becomes offensive to the ear.
In a perfectly tuned system every note should be an elegant multiplication of prime numbers 2, 3 and 5 based on this overtone sequence discussed above. (Some music also uses frequencies based on the prime 7, but I’ve never heard music going as deep as the prime 11 though I have no doubt that somebody has tried). Below I will include a table of how all twelve notes should have been tuned if we wanted a perfect key of A versus how we tune them in our equal temperament. I will start with the formulas and then add a second table with absolute values and how much % they are “out of tune”. (Using linear % here is actually not good because it’s a logarithmic scale but let’s settle for it for now):
Keep in mind here that the 440 Hz that we base this off is completely arbitrary. It is simply the global standard for reference pitch, and any other number would be just as valid.
Formulas:
Note | Formula in equal-tempered intonation | Formula in just intonation
---|---|---
A | 440*2^(0/12) | 440*(1)
Bb | 440*2^(1/12) | 440*((2*2*2*2)/(3*5))
B | 440*2^(2/12) | 440*((3*3)/(2*2*2))
C | 440*2^(3/12) | 440*((2*3)/5)
C# | 440*2^(4/12) | 440*(5/(2*2))
D | 440*2^(5/12) | 440*((2*2)/3)
D# | 440*2^(6/12) | 440*((3*3*5)/(2*2*2*2*2))
E | 440*2^(7/12) | 440*(3/2)
F | 440*2^(8/12) | 440*((2*2*2)/5)
F# | 440*2^(9/12) | 440*(5/3)
G | 440*2^(10/12) | 440*((3*3)/5)
G# | 440*2^(11/12) | 440*((3*5)/(2*2*2))
A | 440*2^(12/12) | 440*2
Absolute frequencies (in Hz) and percentage differences:
Note | Frequency in equal-tempered intonation | Frequency in just intonation | % difference
---|---|---|---
A |440.00 |440.00 |0.00%
Bb |466.16 |469.33 |-0.68%
B |493.88 |495.00 |-0.23%
C |523.25 |528.00 |-0.90%
C# |554.37 |550.00 |0.79%
D |587.33 |586.67 |0.11%
D# |622.25 |618.75 |0.57%
E |659.26 |660.00 |-0.11%
F |698.46 |704.00 |-0.79%
F# |739.99 |733.33 |0.91%
G |783.99 |792.00 |-1.01%
G# |830.61 |825.00 |0.68%
A |880.00 |880.00 |0.00%
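If you want to check these numbers yourself, here is a rough sketch that recomputes both columns from the formulas. The just-intonation ratios are copied from the formula table. The extra "cents" column is my own addition: a cent is 1/100 of an equal-tempered semitone, so it measures the gap on the logarithmic scale that the linear % glosses over. The names `JUST_RATIOS` and `compare` are made up for this example:

```python
import math

A4 = 440.0  # reference pitch in Hz

# (note, numerator, denominator) of each just-intonation ratio, in semitone order
JUST_RATIOS = [
    ("A", 1, 1), ("Bb", 16, 15), ("B", 9, 8), ("C", 6, 5),
    ("C#", 5, 4), ("D", 4, 3), ("D#", 45, 32), ("E", 3, 2),
    ("F", 8, 5), ("F#", 5, 3), ("G", 9, 5), ("G#", 15, 8), ("A", 2, 1),
]

def compare(reference=A4):
    """Return (note, equal-tempered Hz, just Hz, % difference, cents difference) rows."""
    rows = []
    for step, (note, num, den) in enumerate(JUST_RATIOS):
        equal = reference * 2 ** (step / 12)
        just = reference * num / den
        percent = (equal / just - 1) * 100
        cents = 1200 * math.log2(equal / just)
        rows.append((note, equal, just, percent, cents))
    return rows

for note, equal, just, percent, cents in compare():
    print(f"{note:<3}{equal:8.2f}{just:8.2f}{percent:7.2f}%{cents:8.2f} cents")
```

Passing a different `reference` shifts every frequency but leaves the % and cents columns unchanged, which is another way of seeing that the 440 Hz is arbitrary.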
So as you can see, every single note on a piano (except for the root note of our scale) is out of tune by up to about 1% in frequency. If we didn’t do this then we would have to tune our piano to, for example, a “just A” tuning, where all twelve notes sound absolutely perfect in the key of A, but they are going to sound atrocious in the key of D#. So instead we compromise: We make all twelve keys sound equally “out of tune”, but by little enough that our mind can trick us into thinking that they are perfect.
If somebody actually made it this far then I am proud of you for spending your time listening to some passionate guy on the internet talk about something that they know a lot about.
One final bonus round for musicians: You know how we often say things like “C# and Db are the same note”? Well they actually aren’t. That same black key represents both the C# and the Db, but a just C# and a just Db have different frequencies, even within the same key. In the key of A=440 for example a C# would have a frequency of 440*(5/(2*2))=550 Hz, whereas a Db would have a frequency of 440*((2*2*2*2*2)/(5*5))=563.2 Hz. So on our piano we play the frequency 554.37 Hz, but that frequency is actually a metaphor for either the frequency 550 Hz or 563.2 Hz depending on context. That is why we call them “enharmonic equivalents” and not “the same note”.
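The same arithmetic as a quick sketch, using exact fractions so that no rounding sneaks in. The variable names are mine, and the two ratios are the ones given above:

```python
from fractions import Fraction

A4 = 440  # reference pitch in Hz

c_sharp = A4 * Fraction(5, 4)     # just C# in the key of A: 5/(2*2)
d_flat = A4 * Fraction(32, 25)    # just Db in the key of A: (2*2*2*2*2)/(5*5)
piano_key = A4 * 2 ** (4 / 12)    # the one equal-tempered key that stands in for both

print(float(c_sharp), float(d_flat), round(piano_key, 2))
```

The equal-tempered key lands between the two just frequencies, a bit closer to the C#.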
I disagree with the common answers here. No, sound and pitch do not “average out”. That’s not at all how music and tonality works.
The truth is that the people who *can* sing WILL sing, and, here is the important bit, even those who aren’t great singers can still match pitch. Tone deafness doesn’t actually exist; it’s just a general term for someone who is bad at singing. It isn’t a scientific term, because literally every person (as long as you aren’t genuinely deaf or mute) can sing and match a pitch with their voice.
People who are not good solo singers sound bad because they aren’t using the right techniques or mechanics. We sing differently when we sing solo, or when we sing the leading part in a group. When we sing as a chorus, our mechanics and techniques are different, and all the “bad” parts of people’s voices go away. We can blend our sound, even if we aren’t trained musicians. Even if you rarely ever sing, you have this innate ability.