eli5: Why hasn’t the audio quality of “on hold” music improved in decades?




Sound is compressed when it is sent through the telephone network, and that compression was designed to work best with the human voice.

On-hold music uses the same phone system as your normal call, but music is different from the human voice, so it sounds bad… compared to your hi-fi Spotify playlist.

a) it doesn’t have to and b) it’s actually gotten worse.

Tom Scott has a pretty good video about it. What it boils down to is that there’s only so much room on a wire for signals, but there are lots of ways of compressing a signal so it takes up less space. Different methods of compression have different tradeoffs.

One of the tradeoffs made for telephone signals is that they can be compressed a *lot*, but the compression makes anything outside the normal speaking range sound terrible.

And then that signal reaches a company, where it goes into their internal telephone system, which uses another compression scheme. And that might go into a third, fourth, or more different systems before it ends up at the call center, each with its own compression. Some are better than others, but stacking them on top of each other cannot end well.
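
To make the stacking point concrete, here is a toy sketch in Python. It is not a real telephone codec: each "codec" is just a rounding step with a different grid size (`step_a` and `step_b` are made-up values), and the four samples are hand-picked for illustration. It only shows that a second lossy stage rounds the already-rounded output rather than the original, so the damage tends to add up.

```python
def quantize(samples, step):
    """Lossy stand-in for a codec: snap every sample to the nearest multiple of step."""
    return [round(s / step) * step for s in samples]

def mean_abs_error(original, processed):
    """Average distance between the original samples and the processed ones."""
    return sum(abs(o - p) for o, p in zip(original, processed)) / len(original)

original = [0.37, 0.70, -0.30, 0.11]   # hand-picked illustrative samples
step_a, step_b = 0.25, 0.2             # hypothetical grid sizes for two "codecs"

after_a = quantize(original, step_a)        # first system on the path
after_a_then_b = quantize(after_a, step_b)  # second system re-encodes that result

err_one_stage = mean_abs_error(original, after_a)
err_two_stages = mean_abs_error(original, after_a_then_b)
print(f"one stage:  {err_one_stage:.4f}")
print(f"two stages: {err_two_stages:.4f}")
```

With other numbers a second stage can occasionally land closer to the original by luck, but it can never recover detail the first stage already threw away.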

And on top of that, splitting and amplifying an analog signal is pretty straightforward. Give it more power, connect multiple wires together, and you’ve got an amplified and split signal. But a digital signal? You need processing power for each one, so whatever bottom-shelf, cheapest-possible system they have serving up the hold music has to simultaneously handle all the conversations that are being held as well as play hold music (which, mind you, is being compressed to hell and back in a way that’s not good for music) for everyone, all at the same time.

So between terrible compression and minimum possible hardware, you end up with truly atrocious sound quality.

All the effort of improving phone codecs has gone into making speech clearer over low-bitrate channels. This has resulted in speech codecs that are completely different from the general-purpose audio codecs you would use for music or the audio track of a movie: you are not trying to squeeze a diverse range of sounds into 100 kbit/s, you are trying to squeeze a very specific kind of sound into something like 2 kbit/s. As a result, those codecs are hyper-optimized for human voices, and any bits spent encoding non-voice sounds would hurt efficiency. Music typically includes instruments that sound nothing like a human voice, which results in them getting mangled to hell.
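
A rough back-of-the-envelope sketch of what those two bitrates mean in practice. The 20 ms frame length is an assumption for illustration (a common frame size in speech coding, not something stated above), and `bits_per_frame` is a made-up helper name.

```python
def bits_per_frame(bitrate_bps: int, frame_ms: int = 20) -> int:
    """Bits the encoder may spend on one frame of audio at a given bitrate."""
    return bitrate_bps * frame_ms // 1000

music_budget = bits_per_frame(100_000)  # roughly 100 kbit/s general-purpose codec
speech_budget = bits_per_frame(2_000)   # roughly 2 kbit/s speech codec

print(f"music codec:  {music_budget} bits per 20 ms frame")
print(f"speech codec: {speech_budget} bits per 20 ms frame")
print(f"ratio: {music_budget // speech_budget}x")
```

Under these assumptions the speech codec has about 40 bits to describe each 20 ms slice of sound, which is why it can only afford to describe things that behave like a voice.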

Back in the days of analog audio (and primitive digital audio, for that matter), there was little speech-specific processing, so while early digital phone systems had mediocre rendering for speech and hold music alike, the digitization process also did not actively mangle the music. In effect, hold music has gotten worse because we have gotten more effective at stripping speech to its bare essentials. Plus, phone services are always eager to fit more customers into the same bandwidth, so a lot of the improvements in speech compression were eaten up by that instead of going into clearer speech as well.

Since 1972, telephones have adhered to a standard that converts sound into a digital signal at 8000 samples per second, a sampling rate of 8000 hertz (Hz). A 200 Hz tone gets 40 samples per cycle; a 2000 Hz tone gets only 4. The higher the frequency of a sound, the fewer samples are collected per cycle, and with fewer samples per cycle it is harder to reproduce the sound accurately at the other end.
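
The arithmetic is easy to check with a few lines of Python. This is only a sketch of the division described above; `samples_per_cycle` is a made-up helper name, and the 4000 Hz row is added for illustration.

```python
SAMPLE_RATE_HZ = 8000  # telephone sampling rate quoted above

def samples_per_cycle(tone_hz: float, sample_rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Number of samples taken during one full cycle of a pure tone."""
    return sample_rate_hz / tone_hz

for tone in (200, 2000, 4000):
    print(f"{tone} Hz -> {samples_per_cycle(tone):g} samples per cycle")
```

At 4000 Hz there are only 2 samples per cycle, which is the theoretical limit (half the sampling rate): anything higher than that cannot be represented at 8000 samples per second at all.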