How does lossless compression work?

223 views

Files are just a bunch of bits, how is it possible to make the filesize smaller without removing any of those bits?

In: 3

5 Answers

Anonymous 0 Comments

I give you the numbers 1111111333399999 and tell you I’ll need you to give those numbers back to me tomorrow.

You could write down 1111111333399999 (16 digits). But you notice a pattern: I gave you seven 1’s, followed by four 3’s, followed by five 9’s. If you write down 714359 (6 digits) you can capture the same information in a much shorter form.

This is only possible because the digits I gave you aren’t completely random. There are patterns (in this case, runs of repeated digits) and your coding scheme exploits them. (This particular coding scheme is so simple and common it has a name, *run-length encoding*.)

Commonly used lossless compression formats (like zip files) use more complicated techniques that can find more complicated patterns, but the principle is the same.

[1] In fact, there’s an entire branch of mathematics that studies coding and compression (information theory). One of the central insights of information theory is that a totally random sequence can’t be (losslessly) compressed. You can only (losslessly) compress a sequence if it’s somewhat non-random.

No matter how a lossless compression program works internally, its outputs will on average be shorter than its inputs only if, on average, the inputs contain some kind of pattern(s) that arise more often than they “should” by random chance.

Of course all kinds of common data contains patterns; this post (like most English text) contains mostly lowercase letters, and there are lots of vowels and spaces. Some letters and many punctuation marks appear very rarely, or perhaps not at all! And it’s not just letters. Multi-letter phonemes, words, and phrases are also repeated far more often than they would be by chance.

You are viewing 1 out of 5 answers, click here to view all answers.