A crowd takes a single note or sequence of notes that might not sound that great from any one person and mixes it with a whole bunch of other voices, some a bit high and some a bit low, as everyone sings along. It all blends together in your ear, and the end result is something very close to the right notes, just on a much larger scale. Crowds are also able to “self-correct” as they go: individuals are more likely to sing on key if they hear others around them singing on key.
Think about walking into a busy convention hall or theater where everyone is talking at once. That “buzz” you’re picturing is the median tone of all the conversations in the room. Of course you can still pick out people who are higher or lower if you try, but overall there tends to be an even “hum” at some arbitrary pitch. Now picture all of those people trying to say the same words or sing the same song. The ones who are higher or lower are still there, but most people are on or close to pitch, and the overall effect is that the crowd sounds good because, again, the high and low voices are present in roughly equal measure.
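If you want to see the “high and low cancel out” idea in numbers, here is a rough sketch. It assumes each singer misses the note by a random amount centered on the correct pitch, that singers miss independently of each other, and that what you hear is roughly the average of everyone. Those are all simplifications I'm making for illustration, and the function names and the 50-cent spread are made up (100 cents is one semitone).

```python
import random
import statistics


def crowd_offset_cents(n_singers, spread_cents, rng):
    """Average pitch offset (in cents) of one simulated crowd.

    Assumption: each singer's error is an independent Gaussian draw
    centered on the correct note (0 cents).
    """
    offsets = [rng.gauss(0.0, spread_cents) for _ in range(n_singers)]
    return statistics.fmean(offsets)


def typical_error_cents(n_singers, spread_cents=50.0, trials=200, seed=0):
    """Mean absolute offset of the crowd average over many simulated crowds."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    errors = [
        abs(crowd_offset_cents(n_singers, spread_cents, rng))
        for _ in range(trials)
    ]
    return statistics.fmean(errors)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        err = typical_error_cents(n)
        print(f"{n:>5} singers: crowd average is off by about {err:.1f} cents")
```

Under those assumptions the crowd's average error shrinks roughly with the square root of the number of singers, so a thousand so-so singers land much closer to the note than any one of them would alone. It doesn't model the self-correcting part at all.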
EDIT: Since a few people have told me I don’t know jack about this, I’ve clarified this answer. I also removed the autotune analogy. It was confusing and probably not the best thing to use, and I may have gotten the mechanics wrong.