How does audio compression (mp3, etc) make sound files so much smaller?


A recent post asking about file zipping made me wonder…does audio compression do the same thing? Is it finding pieces of the sound that are identical and then saving them only once in the MP3 file? It’s one thing to identify patterns in a text file and only save one version of the repeating parts, but somehow that doesn’t seem feasible with audio since things like music have so much complexity.


So in MP3, the audio is filtered down to the range the human ear can actually hear (roughly 20 Hz to 20 kHz). A microphone can capture sound outside the range of human hearing, and the encoder removes those pitches to save some space. As for the actual compression algorithms, I’m not sure.

The encoder takes a short window of time and transforms it from the time domain into the frequency domain. This gives the intensity over a range of sound pitches, as opposed to the instantaneous amplitude. These intensity values are then rounded to reduce the number of digits required to store them.
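To make that concrete, here is a minimal Python sketch of the transform-then-round idea. It uses a plain discrete Fourier transform on a made-up 64-sample window. Real MP3 encoders use a filter bank and a modified discrete cosine transform instead, and the window contents, `STEP`, and all the names here are assumptions for illustration only.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform: time-domain samples -> complex frequency bins."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# A short window of 64 samples holding two pure tones: a loud one and a quiet one.
N = 64
window = [
    1.0 * math.sin(2 * math.pi * 4 * t / N) + 0.3 * math.sin(2 * math.pi * 10 * t / N)
    for t in range(N)
]

spectrum = dft(window)

# Intensity per frequency bin (for a real signal only the first half is unique).
intensity = [abs(c) / (N / 2) for c in spectrum[: N // 2]]

# Round each intensity to a coarse step so it needs fewer digits to store.
STEP = 0.1  # hypothetical step size, not taken from any real encoder
rounded = [round(v / STEP) for v in intensity]

# Only the bins that actually contain sound survive as non-zero small integers.
nonzero_bins = {k: q for k, q in enumerate(rounded) if q != 0}
print(nonzero_bins)
```

The 64 amplitude values of the window collapse into two small integers plus a lot of zeros, which is a far easier thing to store compactly.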

A listener usually cannot hear a quiet sound when a much louder sound of a similar pitch is playing at the same time (this is called masking). So some rounding noise is acceptable in the presence of a louder signal, within a limited band of frequencies. Rounding directly in the time domain would affect all frequency ranges equally, and the noise would become audible in those that are quiet.

A sound signal is usually quite chaotic in nature, and it is practically impossible to find exact repeating patterns in it. Compression instead takes advantage of neighboring samples having similar amplitudes, and stores the difference between them with fewer digits. This is how multimedia or delta compression works in a general-purpose archiver (RAR), and the same idea is also used on frequency domain samples.
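The "store the difference between neighbors" idea can be sketched in a few lines of Python. The sample values and function names below are made up for illustration, and this is not how any particular archiver implements it.

```python
def delta_encode(samples):
    """Keep the first sample, then store only the change from the previous sample."""
    if not samples:
        return []
    deltas = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original samples with a running sum of the stored differences."""
    samples = []
    total = 0
    for d in deltas:
        total += d
        samples.append(total)
    return samples


# Hypothetical 16-bit samples from a smooth stretch of audio.
samples = [12000, 12010, 12025, 12031, 12028, 12015, 11990]
deltas = delta_encode(samples)
print(deltas)  # [12000, 10, 15, 6, -3, -13, -25]

# Decoding gives back exactly what went in, so this step on its own loses nothing.
restored = delta_decode(deltas)
```

After the first value, every number fits in far fewer bits than the original five-digit samples did.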

So you have an audio file, which is a digital sampling of a waveform. Now the human ear is worse at hearing very high frequencies (pitches) than low and middle ones. We don’t want to throw out higher frequencies… but we could store higher frequencies at a lower fidelity while keeping high fidelity for the frequencies that your ears are good at hearing.

So what you do is you perform a Fourier Transform. This converts the audio from a waveform to a bunch of frequencies… all of the information is exactly the same, but the format is different. Now, you can store the low frequencies with high precision (more bits) and the high frequencies with low precision (fewer bits)… saving lots of space. There’s also a bonus feature that when you’re in frequency space… there are often a lot of frequencies that are just plain empty… you can throw those ones out entirely.

Then your computer or phone or whatever performs an inverse Fourier Transform, which gives it back a close copy of the original signal, just with a loss of fidelity in those higher frequencies.
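Here is a rough Python sketch of that whole round trip: transform, store low frequencies finely and high frequencies coarsely, then transform back. The cutoff at bin 8, the two step sizes, and the test signal are all arbitrary assumptions. A real codec picks its precision from a model of human hearing rather than a fixed rule like this.

```python
import cmath
import math


def dft(x):
    """Forward transform: waveform -> complex frequency bins."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(bins):
    """Inverse transform: frequency bins -> waveform (real part only)."""
    n = len(bins)
    return [sum(bins[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n for t in range(n)]


def quantize(c, step):
    """Round a complex coefficient to the nearest multiple of step."""
    return complex(round(c.real / step) * step, round(c.imag / step) * step)


def step_for_bin(k, n):
    """Hypothetical rule: fine steps for low frequencies, coarse steps for high ones."""
    freq = min(k, n - k)  # bins above n/2 mirror the ones below
    return 0.5 if freq < 8 else 4.0


N = 64
# Test signal: a strong low tone plus a weak high tone.
signal = [math.sin(2 * math.pi * 3 * t / N) + 0.2 * math.sin(2 * math.pi * 20 * t / N) for t in range(N)]

spectrum = dft(signal)
compressed = [quantize(c, step_for_bin(k, N)) for k, c in enumerate(spectrum)]

# Bins that rounded to zero are the "plain empty" ones and need no storage at all.
empty_bins = sum(1 for c in compressed if c == 0)

restored = idft(compressed)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(empty_bins, "of", N, "bins are empty; worst sample error:", round(max_error, 3))
```

The low tone comes back essentially untouched, while the high tone comes back slightly wrong in volume, which is exactly the trade being described.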

A key difference between audio compression like MP3 and data/text compression is that the audio compression is “lossy”, meaning some data is lost, but because of the nature of the data, you can still mostly get the idea of what’s going on.

If I started typing an explanation and left out a lettr frm ech wrd, yu wuld stil mstly knw wht I ws sayng, it would just look like I’m an idiot that can’t type. But it would use less space!

[Here]() is an example of an audio clip encoded at various levels of compression. You can tell it’s the same music, but it sounds muffled or scratchy because a lot of the data has been removed.

Another notable difference is that data/text compression is completely reversible. Lossy audio compression is not. Once you have lost some of the data, you can make guesses to fill in the gaps, but you can’t guarantee that you get back the full original data.

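Lossless algorithms like FLAC work on pretty much the same principle as a zip file: they pack the same information into a smaller space by squeezing out redundancy, so that every stored bit carries as much information as possible (maximizing entropy per bit). Nothing is thrown away, so decoding gives back exactly the original audio.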
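The "nothing is thrown away" part is easy to check with Python's built-in `zlib` module, which uses the same DEFLATE method commonly found in zip files. This is only a sketch of the lossless principle on some made-up repetitive bytes; it is not FLAC, which is designed around audio and generally does much better on real recordings.

```python
import zlib

# Made-up, highly repetitive input so the redundancy is easy to squeeze out.
original = b"the quick brown fox jumps over the lazy dog. " * 200

packed = zlib.compress(original)
unpacked = zlib.decompress(packed)

print(len(original), "bytes in,", len(packed), "bytes stored")

# Lossless means the round trip is bit-for-bit identical.
is_identical = unpacked == original
print("identical after decompression:", is_identical)
```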

Lossy algorithms like MP3 or Ogg Vorbis actually remove information from the sound file. If you encode the same sound file into MP3s of different bit rates, you will probably not hear a difference at the higher rates, but you might notice that at lower rates the sound gets duller, as the algorithm starts to aggressively remove higher frequencies.
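To put numbers on "so much smaller", here is some back-of-the-envelope arithmetic in Python. It assumes CD-quality source audio (44.1 kHz, 16-bit, stereo), a hypothetical 3-minute song, and a handful of common MP3 bit rates. None of these figures come from the posts above.

```python
# Assumed CD-quality source audio.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Bits needed per second of uncompressed audio.
uncompressed_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def size_megabytes(bits_per_second, seconds):
    """File size in megabytes (10^6 bytes) for a constant bit rate."""
    return bits_per_second * seconds / 8 / 1_000_000


SONG_SECONDS = 180  # hypothetical 3-minute song

print(f"uncompressed: {size_megabytes(uncompressed_bps, SONG_SECONDS):.2f} MB")
for kbps in (320, 192, 128, 64):
    mb = size_megabytes(kbps * 1000, SONG_SECONDS)
    ratio = uncompressed_bps / (kbps * 1000)
    print(f"{kbps:>3} kbps MP3: {mb:.2f} MB, about {ratio:.1f}x smaller")
```

Under these assumptions a 128 kbps MP3 is roughly 11 times smaller than the uncompressed audio, and the lower the bit rate, the more the encoder has to throw away to hit it.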