How can the human ear (the brain, really) clearly discern more than one sound at a time?



I understand how sound is generated by pressure waves vibrating the eardrum. And this makes perfect sense to me when a single sound is generating that vibration. But when multiple sounds are vibrating the eardrum at the same time (like when listening to music with different instruments and vocals), how does the brain tease those differing vibrations apart so we can hear the individual inputs, as opposed to them all mixing together into one sound? That would be the equivalent of mixing a bunch of different paint colors together and ending up with brown.

In: Biology

Different sounds have different frequencies, meaning they vibrate at different rates per second. A pure tone at a single frequency can’t really be made to sound different; all you can change is its volume (how loud it is).

It’s not hard to pick those apart; it’s like picking apart threads of varying thickness. What your brain is really good at, though, is picking out patterns in those threads so that when the frequency of a sound does change, you know it’s from the same source, such as a rising voice or a changing drum beat.

Edit: Changed pitch to volume

I’m not an expert but I did start reading a book on music cognition, which I never finished.

I think you’re right to think about the ear, not just the brain. A single area that transmits a wave can transmit multiple waves at the same time, without those waves interacting too much. For instance imagine swinging on a swing. As you go from front to back, the chain is moving in a consistent pattern, which would be a wave except that the frame at the top of the swing is fixed and absorbs all the force. As you swung back and forth, you could also start shaking the chain, and then waves would travel up and down the chain. These waves would be separate to the waves of you going back and forth on the swing, they wouldn’t particularly interact. In the same way, from a physics point of view, it makes sense for a single eardrum to be sensitive to multiple simultaneous sonic waves.

That’s what I remember from the book anyway

The same way your brain draws boundaries between the individual objects in a pile of objects sitting on a table. The details of the process are incredibly complicated. So complicated, in fact, that our best computers running our best algorithms can’t do it nearly as well. Basically, our brains evolved to be good at pattern recognition, including patterns in sound.

ELI5: When sounds are added together a completely new sound wave is constructed that is an amalgamation of both waves.

Think of waves in a pool. If you have one person making the waves at one end of the pool using the same amount of energy every time, then you’ll see one set of waves moving across the pool in an orderly fashion.

But what if two people are say… 10 feet apart and both make the same type of wave. Do you still see one wave in the pool? No. You see places where the waves add together, and places where the waves cancel each other out. This “addition” of both the waves together is actually a unique wave in itself. This is how sound works as well. Each individual sound “adds” together to make a new, unique, sound wave. Our brains are just good at picking out one sound in that sound wave. (Computers are also very good at it.)
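The “addition” described above can be sketched in a few lines of Python. This is only an illustration: the sample rate, the two frequencies, and the helper names `sine_wave` and `mix` are made up for the example, not taken from any real audio library.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumed value for illustration)

def sine_wave(freq_hz, amplitude, n_samples, phase=0.0):
    """Return n_samples of a sine wave at freq_hz (hypothetical helper)."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE + phase)
        for n in range(n_samples)
    ]

def mix(*waves):
    """Add waves sample by sample, the way pressure waves add in air or water."""
    return [sum(samples) for samples in zip(*waves)]

n = 800                           # 0.1 seconds of sound
low = sine_wave(100, 1.0, n)      # a "bass" tone
high = sine_wave(1000, 0.3, n)    # a quieter, higher tone

# The combined wave is a single list of numbers: one new, unique wave.
combined = mix(low, high)

# Two identical waves in step add together (twice the height) ...
doubled = mix(low, low)

# ... while two identical waves half a cycle apart cancel each other out.
cancelled = mix(low, sine_wave(100, 1.0, n, phase=math.pi))
```

Note that `combined` holds just one value per instant, exactly like the position of your eardrum. Nothing in the list is labelled as “the low tone” or “the high tone”; pulling them back apart is a separate step.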

ELI’mOlder: What you want to look up is the Fourier transform. It’s a math… thing… but you don’t actually need to know the math. Watch this video from where I linked it (at 50s).

You only need to watch until ~2:00.

A Fourier transform is the way we program computers to separate sounds, or rather, the way we program the computer to be able to identify certain individual sounds in a sound amalgamation.
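As a rough sketch of that idea, here is a deliberately naive discrete Fourier transform in plain Python. It mixes two tones and then recovers which frequencies are present. The sample rate, the tone frequencies, and the function name `dft_magnitudes` are assumptions chosen for the example; real software would use a Fast Fourier Transform (FFT), which computes the same result much more quickly.

```python
import cmath
import math

SAMPLE_RATE = 400  # samples per second (assumed for illustration)
N = 400            # one second of samples, so bin k corresponds to k Hz

def dft_magnitudes(signal):
    """Naive discrete Fourier transform (hypothetical helper).

    Returns the amplitude found at each frequency bin up to half the
    sample rate. Slow, but it shows the idea in a few lines.
    """
    n = len(signal)
    mags = []
    for k in range(n // 2):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(signal)
        )
        mags.append(abs(total) * 2 / n)
    return mags

# Build one "amalgamated" wave from a 50 Hz tone and a quieter 120 Hz tone.
signal = [
    1.0 * math.sin(2 * math.pi * 50 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 120 * i / SAMPLE_RATE)
    for i in range(N)
]

mags = dft_magnitudes(signal)

# Keep only the bins that carry a noticeable amount of the signal.
found = [k for k, m in enumerate(mags) if m > 0.1]
print(found)  # the frequencies (in Hz) detected in the mix
```

The transform only sees the summed wave, yet the two original frequencies stand out as the only strong bins, along with their relative loudness.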

Our brain does the same thing, and that’s how we can pick out a “single” sound among an array of others.

Also, the sound “addition” is also how a single speaker with a single moving component can produce many different tones at once (aka music!)

I don’t understand it well enough myself, but complex waveforms are just waves added up. Since the human brain does not need to discern specific frequencies, just the sum of the data, I don’t believe there’s much signal processing going on until it gets to analyzing speech and the like.

FFT, or Fast Fourier Transform, is the algorithm computers use to do the work. But humans wouldn’t have a good reason to know that somebody’s voice is exactly between x Hz and y Hz.

Waves add up. Your ear drum is like a rubber duck floating on the ocean; it moves up and down with the big swells (bass sounds) and with the little ripples caused by the wind (high pitch sounds), at the same time. The surface of the ocean is “big swells and tiny ripples” all added up together into one “complex” wave.

But the “sound sensors” in your ear are NOT in the ear drum; they’re in the spiral-shaped structure called the Cochlea.

So what happens in the Cochlea is that the complex sound wave from your ear drum pushes through the small bones of the middle ear into the Oval Window at the top, and vibrates the liquid in that spiral. The sound waves travel through that liquid along the spiral. And the sensor hairs (Cilia) in the Organ of Corti in the middle vibrate as the sound passes, and send those impulses to the brain via neurons.

The trick though, is that the thicker hairs that detect bass sounds are at one end of that liquid spiral path, and the thin hairs that detect high pitch sounds are at the other end. So the sound frequencies (low bass vs. high pitch) get “decomposed” and picked up by separate hairs, and sent to the brain on different neurons. Your Cochlea decomposes that “complex” wave of all the instruments in an orchestra, by frequency.
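A very loose software analogy for this is a bank of detectors, each tuned to one frequency and each reporting on its own “wire”. The sketch below uses the Goertzel algorithm, a standard way to measure how strongly a single frequency is present in a signal. This is an illustration of decomposition by frequency, not a model of how the Cochlea physically works, and the sample rate, tone frequencies, and detector list are all assumptions made up for the example.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)

def goertzel_power(signal, freq_hz):
    """Goertzel algorithm: energy in the signal at one target frequency."""
    coeff = 2 * math.cos(2 * math.pi * freq_hz / SAMPLE_RATE)
    s_prev, s_prev2 = 0.0, 0.0
    for x in signal:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def detector_response(signal, freq_hz):
    """Convert the Goertzel power into an approximate tone amplitude."""
    power = max(goertzel_power(signal, freq_hz), 0.0)  # guard against rounding
    return 2 * math.sqrt(power) / len(signal)

# One "complex" wave: a loud 200 Hz bass tone plus a quieter 2000 Hz tone.
n = 800
signal = [
    1.0 * math.sin(2 * math.pi * 200 * i / SAMPLE_RATE)
    + 0.4 * math.sin(2 * math.pi * 2000 * i / SAMPLE_RATE)
    for i in range(n)
]

# Each detector listens for exactly one frequency (hypothetical tuning).
detector_freqs = [100, 200, 500, 1000, 2000, 3000]
responses = {f: detector_response(signal, f) for f in detector_freqs}

# Only the detectors tuned to frequencies actually present respond strongly.
active = [f for f, amp in responses.items() if amp > 0.1]
print(active)
```

Every detector receives the same mixed wave, but only the ones tuned to 200 Hz and 2000 Hz respond, so the mix arrives downstream already split up by frequency.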

But it’s your brain that makes sense of it all. Your brain is very good at recognizing patterns, and waves are patterns. Violin vs. drums vs. not just a person’s voice, but words and *meaning* of the words, all of that is figured out by the brain, from sound frequencies. It’s just what the brain does.

Same with vision. Sound frequency is pitch (bass vs high pitch), light frequency is color. Your brain doesn’t just see a splotch of different colors (different light frequencies), it recognizes objects, people’s faces, *emotions* on those faces, etc. Pattern recognition.