How does lossless compression work?

140 views

Files are just a bunch of bits, how is it possible to make the filesize smaller without removing any of those bits?

In: 3

5 Answers

Anonymous 0 Comments

If the sequence of bits is completely arbitrary, it isn’t. You can’t make one possible file smaller without, in principle, making another possible one bigger.

But most applications, whether that’s images or audio or text, tend to have specific patterns more often than other specific patterns. In text, for example, the strings “at” or “ed” are common, while the string “xq” is very rare. So if you use a compression approach that makes “at” or “ed” smaller and “xq” bigger, you will, on average, tend to make your file size smaller.

More generally, the smallest size you can compress files to (given no knowledge except the file itself) is given by their [information entropy](https://en.wikipedia.org/wiki/Entropy_(information_theory)). The “more random” the strings of bits, the higher the entropy. When each string of bits is equally likely, the information entropy is 1 bit per bit, which means you can’t compress things losslessly. But real applications tend to have much less.

English text encoded as ASCII, for example, takes up 1 byte (8 bits) per character, while the information entropy of English text is usually around 1 *bit* per character (because some strings are much more common than others). So you can losslessly compress English text by a factor of about 8.

You are viewing 1 out of 5 answers, click here to view all answers.