What are compressed and uncompressed files, how does it all work, and why do compressed files take less storage?

Anonymous

There are many different approaches to compression, but it all basically boils down to the same thing: replacing parts of a file with something that takes up less space than it originally does.

Let’s say you have the utterly true sentence “Babylon 5 is the greatest television show ever”. Now imagine that for each word, you assign a single-digit number:

Babylon=1, is=2, the=3, greatest=4, television=6, show=7, ever=8 (and 5 will just stay 5, which means no word can use that digit, or the decompressor couldn't tell them apart). With that, you can now save a ton of space by writing the file as:

15234678

Now, all you need is a program that knows the number for each word. When you decompress that file, it replaces each number with the appropriate word. That idea, generally, is dictionary-based compression (your dictionary records which word is represented by which number).
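
Here's a toy sketch of that idea in Python, assuming the text is just space-separated words that are all in the dictionary. Every entry needs its own distinct digit (including the literal 5, which stands for itself), so this sketch uses television=6, show=7, ever=8. The names `CODEBOOK`, `compress` and `decompress` are made up for illustration; real compressors don't work on whole English words like this.

```python
# Toy dictionary-based codec (illustration only): each known word maps to one digit.
# Assumes the input is space-separated words that are all in the dictionary.
CODEBOOK = {
    "Babylon": "1", "5": "5", "is": "2", "the": "3", "greatest": "4",
    "television": "6", "show": "7", "ever": "8",
}
# Reverse mapping used for decompression; this only works because every code is distinct.
DECODEBOOK = {code: word for word, code in CODEBOOK.items()}


def compress(text: str) -> str:
    """Replace each word with its one-digit code (the spaces are dropped)."""
    return "".join(CODEBOOK[word] for word in text.split(" "))


def decompress(data: str) -> str:
    """Turn each digit back into its word and put the spaces back."""
    return " ".join(DECODEBOOK[digit] for digit in data)


sentence = "Babylon 5 is the greatest television show ever"
packed = compress(sentence)
print(packed)                            # 15234678
print(len(sentence), "->", len(packed))  # 46 -> 8
assert decompress(packed) == sentence
```

Note the 46 -> 8 figure doesn't count the dictionary itself, which the decompressor also has to have.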

Now, it should be obvious that this isn't going to work on a larger scale, because you need a number for each word, and that's going to be a huge dictionary if you want to be able to compress any English-language file. You'll also soon need multi-digit numbers, and that's a problem, because in some cases you're actually going to use MORE space when you compress a file. Imagine you create numbers for 1,254 words, then realize you forgot to assign a number to the word “as”. Well, the next number available is 1255, which means that anywhere “as” appears, which normally takes up two characters, you'll be “compressing” it with a value that takes up four, twice as many.

Oops!

Now, you might make that up through repetition in some files: maybe the word “computer” gets the number 4, which means you save 7 characters (8 down to 1) every time it appears. If it appears a lot, you can make up some of your loss from words like “as”. But that's gonna be hit-or-miss at best. And either way, you're still left with a massive dictionary, which isn't going to do you any favors in terms of speed.
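
The trade-off between the “as” loss and the “computer” gain is just arithmetic: each word saves (word length minus code length) characters per occurrence. The `net_savings` helper below is hypothetical, and the counts (50 uses of “as”, 10 of “computer”) are invented for one imaginary file.

```python
def net_savings(word: str, code: str, occurrences: int) -> int:
    """Characters saved (positive) or lost (negative) by replacing every
    occurrence of `word` with `code`. Like the discussion above, this
    ignores the space needed to store the dictionary itself."""
    return (len(word) - len(code)) * occurrences


# Hypothetical counts for one imaginary file.
as_change = net_savings("as", "1255", 50)           # each "as" grows by 2 characters
computer_change = net_savings("computer", "4", 10)  # each "computer" shrinks by 7
print(as_change, computer_change, as_change + computer_change)  # -100 70 -30
```

With these made-up numbers the file actually ends up 30 characters bigger, which is the hit-or-miss part.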

Also, while that could maybe/kinda/sorta work for English-language files, what about other languages, or images? Or music? Nope, you'd need a separate dictionary for each other language, and for things like images and music it's not gonna work at all, because there aren't words like in a spoken language.

Another approach to compression is RLE, which stands for Run-Length Encoding. Here, you look for runs where the same byte value repeats over and over in a file. For example, imagine you have a picture, a headshot of yourself maybe, taken in front of a white wall. If you start looking at the actual bytes in the file, which ultimately represent the colors of pixels, you're going to find that the white wall is represented by a lot of repeating values, one set per pixel (most likely 255, 255, 255, though not necessarily, but let's say that's the case here).

So, if there are 100 white pixels in a row (that's what 255, 255, 255 means in RGB values), then you have 300 bytes in a row (a run) that all have the same value (255). With RLE, you would write something like 300-255 instead (there are many ways you might write it). Then, when the image is loaded, the decoder can look at that and say “oh, OK, there are 300 bytes with a value of 255 here” and reconstitute the image appropriately.
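
A minimal RLE sketch in Python, assuming runs are stored as (count, value) pairs, which is just one of the many possible layouts. Real formats typically store the count in a fixed-size field, which limits how long a run can be; this sketch ignores that. The function names are made up for illustration.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse each run of identical bytes into a (count, value) pair."""
    runs: list[tuple[int, int]] = []
    for byte in data:
        if runs and runs[-1][1] == byte:
            count, value = runs[-1]
            runs[-1] = (count + 1, value)  # same byte as before: extend the current run
        else:
            runs.append((1, byte))  # different byte: start a new run
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand each (count, value) pair back into `count` copies of `value`."""
    return b"".join(bytes([value]) * count for count, value in runs)


# 100 white pixels stored as RGB triples: 300 bytes, all 255.
white_wall = bytes([255, 255, 255] * 100)
print(rle_encode(white_wall))  # [(300, 255)]

# One pixel whose three channels all differ: nothing repeats.
print(rle_encode(bytes([10, 20, 30])))  # [(1, 10), (1, 20), (1, 30)]

assert rle_decode(rle_encode(white_wall)) == white_wall
```

The second case shows the flip side: data with no runs turns 3 bytes into 3 pairs, so RLE only pays off when there really are long runs, like that white wall.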

The implementation details aren't the important part, and like I said, there are many compression techniques. The important thing is what I said at the start: in both approaches, and in any others not described here, things (words, runs of repeating values, or something else) are replaced with something that takes up less space (in some cases, it can even be a mathematical formula that recreates a large piece of data). In all cases, the result is a compressed file.
