The sounds are already muddled together. Sound waves add up: the guitar, the singer, and the drums each produce their own waves, which combine into a single complicated sound wave. Our brains are just really good at separating that complicated wave back into the guitar, the singer, and the drums.
If you replicate that complicated sound wave with a speaker, our brains will do the hard work of interpreting it as many sounds instead of just one.
Multiple [waves add up](https://s3-us-west-2.amazonaws.com/courses-images/wp-content/uploads/sites/1989/2017/06/13225738/figure-17-10-04a.jpeg) to a single “compound” wave.
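To make “waves add up” concrete, here is a minimal Python sketch that models each sound as a pure sine wave and mixes them by adding the samples together. Pure sine waves are an assumption (real instruments produce richer waves with many overtones), and the helper names `tone` and `mix`, the 8000 Hz sample rate, and the three frequencies are all made up for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an arbitrary choice for this sketch)

def tone(freq_hz, amplitude, n_samples, sample_rate=SAMPLE_RATE):
    """Return the samples of a pure sine wave (a hypothetical stand-in for an instrument)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def mix(*waves):
    """Superpose waves by adding them together, sample by sample."""
    return [sum(samples) for samples in zip(*waves)]

n = 800  # 0.1 s of audio at 8000 samples per second
bass = tone(110, 0.5, n)     # hypothetical low "bass" note
voice = tone(440, 0.3, n)    # hypothetical "singer" note
cymbal = tone(3000, 0.1, n)  # hypothetical high "cymbal" component

compound = mix(bass, voice, cymbal)

print(len(compound))  # 800: one list of numbers, not three
print(compound[10] == bass[10] + voice[10] + cymbal[10])  # True
```

The result is a single list of numbers, which is all a recording or a speaker ever deals with; nothing in it labels which part came from which instrument.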
A microphone is just a membrane that picks up this compound wave from multiple voices and instruments and creates an electrical signal that matches that exact “compound” wave. Amplifiers simply amplify the compound wave without changing (or understanding) it in any way. Speakers are just another membrane that uses magnets to convert the electrical compound wave back into sound (vibrations in the air).
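An idealized amplifier can be sketched as multiplying every sample of the compound wave by the same gain. This is a simplification (real amplifiers only behave this way within their limits and distort when pushed too hard), and the function names below are hypothetical. Because the operation is linear, amplifying the mix gives the same result as amplifying each instrument separately and then mixing, so the balance between instruments is preserved without the amplifier ever “knowing” they are there.

```python
import math

def tone(freq_hz, amplitude, n_samples, sample_rate=8000):
    """Samples of a pure sine wave (a hypothetical stand-in for an instrument)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def mix(*waves):
    """Add waves together sample by sample."""
    return [sum(samples) for samples in zip(*waves)]

def amplify(wave, gain):
    """Idealized amplifier: multiply every sample by the same gain.
    It never looks at which instrument produced which part of the wave."""
    return [gain * s for s in wave]

guitar = tone(196, 0.4, 400)
singer = tone(523, 0.2, 400)

loud_mix = amplify(mix(guitar, singer), 10.0)
mix_of_loud = mix(amplify(guitar, 10.0), amplify(singer, 10.0))

# Linearity: amplifying the compound wave equals amplifying each part first.
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(loud_mix, mix_of_loud)))  # True
```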
Your eardrum is [just a membrane](https://previews.123rf.com/images/stockshoppe/stockshoppe1403/stockshoppe140300254/26566160-vector-illustration-of-diagram-of-human-ear-anatomy.jpg) that picks up vibrations in the air (very much like a microphone). These vibrations are passed through a series of tiny bones to a spiral (the cochlea) containing a liquid and hairs that vibrate when the liquid vibrates. These hairs are attached to nerves that send signals to your brain.
The sound vibrations pass through the [cochlea liquid](https://i.pinimg.com/originals/40/71/85/4071852705126baeb83cef993aaf8aca.jpg) in the direction of the arrows. The “sensor” hairs (cilia) sit along a membrane that is narrow and stiff at the entrance end and wide and floppy at the far end, so high frequencies (high pitch) are picked up near the entrance and low frequencies (bass sounds) near the far end. In this way the cochlea separates the compound sound wave by frequency, and the different frequencies are sent to your brain through different neurons. Basically, your cochlea decomposes the sound wave back into its simpler component frequencies.
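The cochlea does this separation mechanically, but the mathematical counterpart is the Fourier transform. The sketch below assumes pure sine-wave “instruments” with made-up frequencies and amplitudes, builds a compound wave, and then uses a naive discrete Fourier transform (written out by hand; the helper names are hypothetical) to recover which frequencies it contains and how strong each one is. It is an analogy for what the ear achieves, not a model of how the cochlea actually works.

```python
import cmath
import math

SAMPLE_RATE = 8000
N = 800  # 0.1 s of audio; frequency bins are spaced SAMPLE_RATE / N = 10 Hz apart

def tone(freq_hz, amplitude):
    """Samples of a pure sine wave (a hypothetical stand-in for an instrument)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(N)]

def dft_amplitudes(wave):
    """Naive discrete Fourier transform: estimated sine amplitude in each bin
    from 0 Hz up to (but not including) half the sample rate."""
    amps = []
    for k in range(N // 2):
        total = sum(x * cmath.exp(-2j * math.pi * k * n / N) for n, x in enumerate(wave))
        amps.append(2 * abs(total) / N)  # scale so a sine of amplitude A reads as A
    return amps

# The compound wave: three "instruments" added together.
compound = [a + b + c for a, b, c in zip(tone(110, 0.5), tone(440, 0.3), tone(3000, 0.1))]
amps = dft_amplitudes(compound)

# Report every bin with a noticeable amplitude.
for k, a in enumerate(amps):
    if a > 0.01:
        print(f"{k * SAMPLE_RATE / N:.0f} Hz: amplitude {a:.2f}")
```

It should print one line each for 110 Hz (0.50), 440 Hz (0.30), and 3000 Hz (0.10). The frequencies were chosen as multiples of the 10 Hz bin spacing so each one lands exactly on a bin; for real audio you would use an FFT library such as `numpy.fft` rather than this slow O(N²) loop.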
Then your brain analyzes the different frequencies and “recognizes” patterns it has heard before: violin vs. piano vs. someone’s voice, the words they’re singing, and the meaning behind those words. That’s all in your brain.
The speaker plays all the sounds at once. Your ears can tell the difference between a guitar and a piano, but the speaker doesn’t know this; it just plays back the ‘song’, everything together.
The differences you hear between the instruments aren’t physically there either. You aren’t listening to the original sources; you’re listening to a recording. Your brain distinguishes the different sounds, but in actuality there are no separate sounds: there is one single sound wave coming from the speaker.