I cannot seem to wrap my head around this. I can understand how this would work when recording a live band, because the sound is already “mastered”. But I can’t seem to understand how it’s done when different tracks are recorded separately.
There are two main approaches: 1) pan the audio digitally, or 2) record it physically positioned in relation to the microphone.
1. In digital music editing software, the sound engineer can use a slider to make a track come mostly or entirely out of one speaker or the other. More advanced software may even offer a map of a virtual room, letting the engineer pick a precise location, and the software then simulates what the sound would be like from that spot.
2. If the microphone can record in stereo, the musician can position themselves anywhere in relation to it. For example, if the musician plays to the left of the mic, the mic picks up the sound coming from that location and records it that way. The sound engineer can then adjust the volume or other parameters in the software to blend it with the rest of the sounds.
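To make the digital-panning idea concrete, here is a minimal sketch of a common "constant-power" pan law, the kind of math a DAW's pan slider typically applies under the hood (the function name and the specific law are my assumptions for illustration, not any particular product's code):

```python
import math

def constant_power_pan(sample, pan):
    """Split a mono sample into left/right channels.

    pan: -1.0 = hard left, 0.0 = center, +1.0 = hard right.
    The cos/sin pair keeps left^2 + right^2 constant, so the
    perceived loudness stays steady as you sweep the slider.
    """
    angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] -> [0, pi/2]
    left = sample * math.cos(angle)
    right = sample * math.sin(angle)
    return left, right
```

With `pan = -1.0` the signal goes entirely to the left channel; at `pan = 0.0` both channels get the sample scaled by about 0.707, which is what keeps the total power constant across the sweep.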
The layering is just multiple tracks recorded separately and played over each other. For example, I'm a solo musician, so I'll record one track of me playing piano. Then I make a new track and record myself singing the melody. I can layer those two tracks in the editing software, quite literally: there are two tracks, and the software plays them at the same time. Then I record myself playing violin, guitar, or singing harmonies. All those tracks get stacked on top of each other in the software so everything plays at once.
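At its core, the layering described above is just sample-by-sample addition, with a gain per track so the engineer can balance them. A rough sketch, assuming each track is a simple list of mono samples (the function name is mine, not from any real DAW):

```python
def mix_tracks(tracks, gains=None):
    """Sum several mono tracks into one, sample by sample.

    tracks: list of lists of float samples.
    gains:  optional per-track volume multipliers (defaults to 1.0).
    Shorter tracks are effectively padded with silence.
    """
    if gains is None:
        gains = [1.0] * len(tracks)
    length = max(len(t) for t in tracks)
    mixed = [0.0] * length
    for track, gain in zip(tracks, gains):
        for i, sample in enumerate(track):
            mixed[i] += gain * sample
    return mixed
```

A real mixer does much more (panning, EQ, compression per track), but the "play them at the same time" part really is just this addition.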
It’s all about the timing of the soundwaves hitting each ear microseconds apart. Our brains are very good at audio-spatial recognition. But they can be fooled too, and I’m sure there are some mathematical tricks in more modern 7.1 and 9.1 surround mixes.
We get lots of neat toys like this advanced panning tool (which is free, so you can try it out yourself!). It’s like a regular pan tool that lets you place the audio left to right, but it can also alter the sound so you perceive it as coming from behind, above, etc. [https://www.dear-reality.com/products/dearvr-micro](https://www.dear-reality.com/products/dearvr-micro). As for layering sounds behind each other, it’s mostly about getting each audio element’s gain balance right, as well as creatively applying reverb to give them a sense of physical space.
Mixing engineers use tools like panning, compression, EQ, etc. to make the independent tracks gel together. Additionally, a decent speaker setup will have a subwoofer at the bottom, mid-sized speakers in the middle, and tweeters on top, letting higher-frequency sounds sit higher in space with lower frequencies down below.
You can use the [head-related transfer function](https://en.wikipedia.org/wiki/Head-related_transfer_function) (HRTF) that positions sound sources somewhere in 3D space and then calculate what parts of it go into each ear.
Even though you only have two ears, the shape of your outer ears and how your brain processes the information help tell you where a sound is coming from, front vs. back and up vs. down. The HRTF effectively simulates that for a song or for sounds in a game environment.
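In practice, applying an HRTF means convolving the mono source with a pair of measured head-related impulse responses (HRIRs), one per ear, for the desired direction. A bare-bones sketch, assuming you already have HRIRs from a dataset (the tiny example responses in the test are made up for illustration; real HRIRs are hundreds of samples long):

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def binaural_render(mono, hrir_left, hrir_right):
    """Position a mono source in 3D by filtering it through
    the left-ear and right-ear HRIRs measured for one direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

Because each HRIR bakes in the direction-dependent delays and filtering of the head and outer ear, the two output channels carry exactly the cues the brain uses to place the source in space.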