What is spatial audio?

Anonymous

Spatial audio (SA) is a more advanced technology than stereo that tries to replicate how listeners perceive where recorded sounds are coming from.

True SA consists of two parts: audio recording and playback.

Old-fashioned stereo records in two channels, left and right, but the microphones collecting the sound don’t have the same propagation properties as human ears, so stereo recordings end up with pretty vague positioning. It’s pretty difficult to tell anything other than generic left or right.

Surround sound tried to improve on that by recording with at least 4 channels, making it much more directional, at least horizontally. Modern home cinema tracks are commonly recorded with 5 or 7 channels, not counting the bass channel.

Stereo and surround sound share one problem: the recording and playback channels aren’t always spatially identical. There are standards and recommendations for playback equipment setup and calibration, and recordings are made assuming those playback requirements are met. But we all know it’s not practical for home theaters to be 100% up to spec. Compromises have to be made on playback systems, and sometimes recordings have to take that into consideration as well, which reduces the accuracy of the spatial information.

True spatial audio tries to solve that problem by calculating how sounds propagate from the source all the way to your ears, using more accurate mathematical models of how sound arrives from various directions. The model is called a “head-related transfer function” (HRTF). With that model, the renderer calculates, for each sound object, how sound coming from its direction is collected and reflected by your pinna (outer ear) and what frequency and phase changes that causes. The results are mixed together to represent what you would actually hear and played back via traditional output devices.
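To make the direction-dependent part concrete, here is a minimal sketch in Python. It is not a real HRTF: it ignores the pinna filtering entirely and only models the arrival-time difference between the two ears (Woodworth's spherical-head approximation) plus a crude level drop at the far ear. The head radius, the `min_far_gain` value, and the function names are all assumptions for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius
SPEED_OF_SOUND = 343.0    # m/s in air at roughly room temperature

def itd_seconds(azimuth_deg):
    """Interaural time difference from Woodworth's spherical-head formula.

    Positive azimuth means the source is to the listener's right.
    Intended for azimuths between -90 and +90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))

def render_binaural(mono, azimuth_deg, sample_rate=48000, min_far_gain=0.7):
    """Return (left, right) sample lists for one mono sound object.

    The far ear hears the sound later and quieter. min_far_gain is an
    assumed placeholder, not a measured value.
    """
    delay = round(abs(itd_seconds(azimuth_deg)) * sample_rate)
    far_gain = 1.0 - (1.0 - min_far_gain) * abs(math.sin(math.radians(azimuth_deg)))
    near = list(mono) + [0.0] * delay               # near ear: undelayed
    far = [0.0] * delay + [s * far_gain for s in mono]  # far ear: delayed, quieter
    # Source on the right means the left ear is the far ear.
    return (far, near) if azimuth_deg >= 0 else (near, far)
```

A real renderer would instead filter each object with measured left-ear and right-ear impulse responses for its direction, which is where the frequency and phase changes caused by the pinna come in.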

This kind of sound rendering technology is pretty common in video games: sound objects are actually placed in space and the HRTF is calculated according to the player’s point-of-view position. It is also pretty common for recording studios to produce surround sound movie tracks or stereo TV/music tracks this way. Component sound objects are recorded with separate microphones and then spatially recombined, and mastering software can even change the placement of sound objects virtually to fine-tune the final output.

Better yet, using an array of microphones with known positions relative to each other, advanced software can analyze the individual recorded tracks and use beamforming calculations to pick out individual sound objects and work out their spatial placement in relation to the microphones. That gives you spatial information to work with, which is important later on.
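The simplest form of beamforming is delay-and-sum, and a rough Python sketch of it looks like the following. It assumes a straight-line microphone array, a far-away source (so the wavefront is flat), and whole-sample delays. Production software uses far more refined methods, and every name here is made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def steering_delays(mic_x, angle_deg, sample_rate):
    """Whole-sample delays that line up a plane wave arriving from angle_deg.

    mic_x holds microphone positions in metres along a line; 0 degrees is
    straight ahead (broadside), positive angles lean toward +x.
    """
    s = math.sin(math.radians(angle_deg))
    raw = [x * s / SPEED_OF_SOUND * sample_rate for x in mic_x]
    base = min(raw)                       # keep every delay non-negative
    return [round(r - base) for r in raw]

def delay_and_sum(signals, delays):
    """Shift each microphone track by its delay and average them."""
    n = len(signals[0])
    out = [0.0] * n
    for sig, d in zip(signals, delays):
        for t in range(d, n):
            out[t] += sig[t - d]
    return [v / len(signals) for v in out]

def estimate_direction(signals, mic_x, sample_rate, candidates=range(-90, 91, 5)):
    """Try each candidate angle and return the one with the most output power."""
    def power(angle):
        out = delay_and_sum(signals, steering_delays(mic_x, angle, sample_rate))
        return sum(v * v for v in out)
    return max(candidates, key=power)
```

When the array is steered at the true direction, the copies of the sound line up and add together, while sound from other directions stays smeared out. That is how one sound object can be both located and partly separated from the rest.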

Then there’s the technology trying to bring this to consumer home theater and music.

SA recordings can be traditional surround sound, or newer formats like Dolby Atmos, which can carry dozens of “sound object” tracks, each recording how the object sounds along with information about its spatial placement.

When a device plays back an SA recording, it is basically doing sound rendering just like a 3D video game. It can use a fixed point of view and a predefined HRTF and then output via traditional surround channels. Or better yet, it can use a pair of headphones with well-known frequency response characteristics, so the HRTF calculation can factor them in. That creates a very accurate rendering of spatial audio, faithfully recreating the soundstage as if the listener were on site instead of sitting at home.
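As a toy illustration of rendering at playback time, the Python sketch below takes sound objects that carry their own position and mixes them down to two speaker channels with a constant-power pan law. This is not how Dolby Atmos stores or renders anything. The `SoundObject` class and its fields are hypothetical, and a real renderer targets many more channels or uses an HRTF for headphones.

```python
import math
from dataclasses import dataclass

@dataclass
class SoundObject:
    samples: list        # mono audio for this object
    azimuth_deg: float   # -90 = hard left, 0 = centre, +90 = hard right

def pan_gains(azimuth_deg):
    """Constant-power (left, right) gains for a position between the speakers."""
    clamped = max(-90.0, min(90.0, azimuth_deg))
    angle = (clamped + 90.0) / 180.0 * (math.pi / 2)
    return math.cos(angle), math.sin(angle)

def render_to_stereo(objects):
    """Mix every object into left/right channels based on its position metadata."""
    n = max(len(o.samples) for o in objects)
    left, right = [0.0] * n, [0.0] * n
    for obj in objects:
        gain_l, gain_r = pan_gains(obj.azimuth_deg)
        for t, s in enumerate(obj.samples):
            left[t] += gain_l * s
            right[t] += gain_r * s
    return left, right
```

The point is that the position lives in the metadata rather than being baked into the channels, so the same recording can be rendered differently for whatever playback setup is actually present.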

What’s even more magical is that if the headphones can measure the listener’s head movements, it’s literally like moving your point of view in an FPS game. The playback device can adjust to that and re-render the sound objects as you move, so you feel like the sound is really there, much more accurately than with regular stereo or surround sound.
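The head-tracking part boils down to a small piece of arithmetic, sketched here in Python for horizontal rotation only. The function name and the sign convention (positive angles to the right) are assumptions. Real head tracking also handles tilt and uses full 3D rotations.

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Direction of a world-fixed source as seen from the turned head.

    Both angles are in degrees, positive to the right. The result is
    wrapped into the range [-180, 180).
    """
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
```

For example, a singer placed straight ahead (0 degrees) ends up at -90 degrees, directly to your left, once you turn your head 90 degrees to the right. The renderer then uses that new angle to pick the HRTF for each sound object, which is what keeps the sound anchored in the room instead of turning with your head.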

Sometimes older songs, even those recorded decades ago, can be remastered as true SA. That’s because the original master files were either recorded separately for each sound component, so you know where each one was placed, or recorded with multi-microphone arrays placed at the concert, which makes reasonably accurate beamforming deduction possible.

Without these original master files, Apple devices can still “spatialize” regular stereo songs. They basically create a virtual sound stage with only two spatially placed “speaker” objects, creating the illusion of a spatially recorded track. But that “spatialize” feature sounds hollow because there is very little information in stereo tracks to work with, and that’s as much as battery-powered handheld devices can compute.
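To show why two virtual speakers give so little to work with, here is a rough Python sketch of the idea. It is not Apple's algorithm, just an assumed toy model: each stereo channel is treated as a speaker 30 degrees off centre, so each ear hears its own side directly plus a delayed, quieter copy of the opposite side. The speaker angle, head radius, and `far_gain` are all assumed values.

```python
import math

def virtual_speaker_mix(left, right, sample_rate=48000, speaker_angle_deg=30.0,
                        head_radius_m=0.0875, speed_of_sound=343.0, far_gain=0.7):
    """Render a stereo track as two virtual speakers heard over headphones.

    left and right must have the same length. Only an arrival-time delay
    and a level drop are modelled; no pinna filtering, no room reflections.
    """
    theta = math.radians(speaker_angle_deg)
    # Extra travel time to the far ear (Woodworth spherical-head approximation).
    delay = round(head_radius_m / speed_of_sound * (theta + math.sin(theta)) * sample_rate)
    out_l, out_r = [], []
    for t in range(len(left)):
        cross_to_left = right[t - delay] if t >= delay else 0.0
        cross_to_right = left[t - delay] if t >= delay else 0.0
        out_l.append(left[t] + far_gain * cross_to_left)
        out_r.append(right[t] + far_gain * cross_to_right)
    return out_l, out_r
```

However fancy the processing gets, the only spatial information going in is "left speaker" and "right speaker", so individual instruments cannot be placed anywhere new.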
