ELI5: How do spatial audio technologies like Dolby Atmos work in headphones, with only two drivers?



How does it work in humans, with only two ears? It works because the shapes of the ears and the head influence how things sound depending on the direction from which the sound waves arrive. Spatial audio simulates these effects in software.

You only have two ears. Each of your ears can pick up one sound signal, i.e. changes in air pressure over time. No matter how many sources of sound there are in different places around you, in the end this is all the information you get: two signals measured at two points on either side of your head. So, to recreate the experience of sound coming from different points in space, all you need to do is recreate the two sound waveforms that these sound sources would create in your two ears.

How do you do this? The best way is to stick two microphones in your own ear canals to record the sound. That way you don’t have to artificially recreate anything, because all the spatial information is already there. This works best if you record it in your own ears, since some of the spatial information in the sound waves comes from the way they interact with the outer part of your ear (the *pinnae*, i.e. the funny-shaped cartilage-and-skin protrusions that you picture when you think of ears). But it will also work if you record it in someone else’s ears, or on an artificial set of ears, and then play it back in your own ears. It just won’t be as accurate.

If you can’t use this method, then you have to take a recorded sound and insert the spatial information somehow. Spatial information in sound comes from a few things, but the most important ones are interaural time and loudness differences. Sound travels at about 340 m/s, and so, with your ears spaced about 20 cm apart, a sound coming from directly to your left will reach your left ear about 0.6 ms sooner than your right ear. A sound coming from right in front of you will reach both ears at the same time, and a sound coming from directly to your right will get to your right ear about 0.6 ms sooner. So, by comparing the arrival times in your two ears, your brain can tell how far to the left or right of you a sound was produced.
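As a rough check on that 0.6 ms figure, here is a minimal sketch of the arithmetic. It assumes a speed of sound of about 343 m/s, the 20 cm ear spacing mentioned above, and a simplified straight-line path (a real head makes the sound bend around it, so real delays are somewhat longer). The function name and the angle convention are made up for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at room temperature
EAR_SPACING_M = 0.20        # rough figure from the explanation above

def interaural_time_difference(azimuth_deg: float) -> float:
    """Seconds by which the sound reaches one ear before the other.

    azimuth_deg: 0 = straight ahead, +90 = directly right, -90 = directly left.
    A positive result means the right ear hears it first, negative means left.
    Simplified model: extra path length = ear spacing * sin(azimuth).
    """
    extra_path_m = EAR_SPACING_M * math.sin(math.radians(azimuth_deg))
    return extra_path_m / SPEED_OF_SOUND_M_S

for angle in (-90, 0, 45, 90):
    print(angle, round(interaural_time_difference(angle) * 1000, 2), "ms")
```

A sound directly to one side comes out just under 0.6 ms, and a sound straight ahead comes out at zero, matching the description.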

Loudness follows a similar principle. Sounds are louder in the ear that they are closer to, so by comparing the loudness of the same sound picked up in your two ears, your brain can tell how close to each ear it was produced, and thus how far to the left or right the sound source was located.

Using software, you can take a sound and recreate these timing and intensity differences, and thus introduce the desired spatial information. Of course, this only works for simulating spatial information in the left-right dimension. So how do you do front-back or up-down (i.e. how far in front of or behind you a sound source was, or how high up)?

This is where the pinnae come in. The outer parts of your ear act like a directional filter, meaning they let through (or reflect) different sound frequencies differently depending on where the sound is coming from. Your brain has learned how your particular ears do this, and so by comparing the frequency profile of the same sound in your two ears, it can figure out where the sound must have come from to have produced this particular filtered frequency signature.

However, this is not as accurate as the localization based on interaural time and loudness differences. And, as I said, it is somewhat dependent on the shape of your ears. We all have somewhat similar ears of course, but there are subtle (or not-so-subtle) differences, and these frequency distortions depend not just on the shape of your ears but also on the individual shape and anatomy of your head. So it’s much harder to recreate these spatial effects accurately, but you can at least do it to a rough approximation by using a model that captures the frequency distortion profile for the average person.
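One common way to store such a model is as a pair of short filters per direction, one for each ear, and applying it is then just a convolution. The sketch below shows only that mechanic, assuming NumPy is available. The filter values are toy placeholders invented for illustration, not measured data, and the function name is hypothetical.

```python
import numpy as np

def apply_direction_filters(mono, left_ir, right_ir):
    """Filter one mono signal with a per-ear impulse response.

    Returns an (n, 2) array: column 0 is the left ear, column 1 the right.
    """
    left = np.convolve(mono, left_ir)
    right = np.convolve(mono, right_ir)
    n = max(len(left), len(right))
    out = np.zeros((n, 2))
    out[:len(left), 0] = left    # shorter channel is padded with silence
    out[:len(right), 1] = right
    return out

# Toy placeholder filters (NOT measured): pretend the source is on the left,
# so the left ear hears it directly while the right ear hears it three
# samples later, quieter, and slightly smeared.
toy_left_ir = np.array([1.0])
toy_right_ir = np.concatenate([np.zeros(3), [0.25, 0.25]])

click = np.zeros(8)
click[0] = 1.0
stereo = apply_direction_filters(click, toy_left_ir, toy_right_ir)
```

A real system would swap the toy filters for a measured or modelled pair chosen for the direction you want to simulate.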

An ear by itself can’t really tell where a sound is coming from; it just picks up one signal, so you only need one driver per ear. The trick is knowing what sound each ear should hear, so your brain thinks the sound is coming from a specific place. So you take the “raw” unmodified sound in the recording, then modify it to simulate position.

If the sound source is to your left, then it will appear louder to your left ear than it does to your right ear. It will also arrive at your left ear a little bit earlier than at your right ear. Your head is maybe 20 cm across, so it takes sound about 0.6 ms to go from one side to the other. A tiny difference, but enough for your brain to pick up on.
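Those two cues are simple enough to fake on a mono signal. This is a minimal sketch assuming NumPy; the 48 kHz sample rate, the 0.7 gain for the far ear, and the function name are arbitrary choices for illustration, not values taken from any real spatial audio product.

```python
import numpy as np

SAMPLE_RATE_HZ = 48_000  # assumed sample rate

def pan_with_delay_and_gain(mono, delay_s, far_ear_gain, source_on_left=True):
    """Return (n, 2) stereo: the near ear gets the signal unchanged,
    the far ear gets it delayed by delay_s and scaled by far_ear_gain."""
    delay_samples = int(round(delay_s * SAMPLE_RATE_HZ))
    silence = np.zeros(delay_samples)
    near = np.concatenate([mono, silence])                 # padded to equal length
    far = far_ear_gain * np.concatenate([silence, mono])   # later and quieter
    left, right = (near, far) if source_on_left else (far, near)
    return np.stack([left, right], axis=1)

# 10 ms of a 440 Hz tone, placed to the listener's left.
tone = np.sin(2 * np.pi * 440 * np.arange(480) / SAMPLE_RATE_HZ)
stereo = pan_with_delay_and_gain(tone, delay_s=0.0006, far_ear_gain=0.7)
```

At 48 kHz, 0.6 ms works out to 29 samples, so the right channel here is the same tone shifted by 29 samples and turned down.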

You know how things sound muffled when you hear them through a hat or something like that? Your head and your ears do the same thing. Sound coming from behind you has to get past the back of your ears, and that muffles it. Sound from one side has to get through or around your whole head to reach the other ear.

Then you have reverberation: all the sound bouncing off the walls and other surfaces. That gives you a ton of cues as to the size and shape of the room.

Spatial audio technologies basically try to simulate these effects (and more) to build a nice realistic model of what it sounds like for a noise to come from a certain place.

You might want to look at [head-related transfer functions](https://en.m.wikipedia.org/wiki/Head-related_transfer_function) for more details.

Same as an FPS video game. Sounds are recorded with spatial placement information. During playback, the sounds are rendered in a virtual soundstage, with the transmission and transformation due to your head’s point of view factored in. The rendered audio stream then goes to the headphones as ordinary two-channel audio.

There are headphones with gyros inside that can transmit head movement to the renderer to simulate point of view change.
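The core of that head-tracking step is small: the renderer needs the source direction relative to where your nose is pointing right now. A minimal sketch, assuming angles in degrees measured clockwise from "straight ahead in the room" and considering only left-right head turns (yaw); the function name is hypothetical.

```python
def relative_azimuth_deg(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Direction of the source relative to the listener's nose.

    Both inputs are in degrees, measured from the same fixed room direction.
    The result is wrapped into the range [-180, 180).
    """
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

# A source fixed 30 degrees to the right in the room:
print(relative_azimuth_deg(30.0, 0.0))   # head facing forward
print(relative_azimuth_deg(30.0, 90.0))  # head turned 90 degrees to the right
```

With the head turned right past the source, the result goes negative, so the renderer would place the sound to the listener's left, which is what keeps the source feeling fixed in the room.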

[I’ve answered this one before](https://www.reddit.com/r/explainlikeimfive/comments/ldzj17/eli5_in_8d_audio_how_is_the_audio_able_to_sound/gm8vmzd?utm_medium=android_app&utm_source=share&context=3), but not for a wee while.

The short answer is that we *always* only have two inputs of sound, our left ear canal and our right ear canal. The trick is to get the sound arriving *at* those ears right.

Sounds from two or more sources in the real world will interfere (basically adding and subtracting in complicated ways) depending on where they are compared to your head. If you’ve got a decent enough computer, you can calculate how that’ll go and figure out what the sound waves will be like as they enter your ears. Then, you just play *those* sounds through the headphones instead.
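Once each source has been turned into its own left-ear and right-ear signal, the "adding and subtracting" is literal: the per-ear signals are summed sample by sample. A minimal sketch assuming NumPy and assuming each source has already been rendered to an (n, 2) stereo array; the function name is made up for illustration.

```python
import numpy as np

def mix_sources(rendered_sources):
    """Sum a list of (n, 2) stereo arrays sample by sample.

    Shorter sources are treated as silent after they end.
    """
    n = max(len(s) for s in rendered_sources)
    mix = np.zeros((n, 2))
    for s in rendered_sources:
        mix[:len(s)] += s
    return mix

# Two toy "sources": where they overlap with opposite sign they cancel,
# where only one is present it passes through unchanged.
a = np.ones((3, 2))
b = -np.ones((2, 2))
mix = mix_sources([a, b])
```

In this toy case the first two samples cancel to zero and the third is left at one in both ears.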