What are compressed and uncompressed files, how does it all work, and why do compressed files take less storage?

Software programmer here. Like all binary data, files are stored as a series of 1’s and 0’s. Now imagine you had a file that was just a million 1’s. If you wanted to describe this file to someone, it would be a lot smaller to write “a million 1’s” instead of actually writing out “1” a million times. That’s compression.
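
To make the “a million 1’s” idea concrete, here is a minimal sketch of run-length encoding, one simple way to write a run of identical symbols as a symbol plus a count. The function names `rle_encode` and `rle_decode` are made up for this illustration, and real compressors are more elaborate than this.

```python
from itertools import groupby


def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(char, sum(1 for _ in run)) for char, run in groupby(bits)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Rebuild the original string by repeating each character `count` times."""
    return "".join(char * count for char, count in pairs)


original = "1" * 1_000_000          # the file that is just a million 1's
encoded = rle_encode(original)

print(encoded)                      # [('1', 1000000)]
assert rle_decode(encoded) == original
```

The whole million-character string collapses to a single pair, and decoding gives back exactly what went in.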

More formally, you can think of compressing a file as writing a program that outputs the uncompressed file. The compressed size of the file is then the size of that program, and decompressing the file means running the program to rebuild the original.
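
As a toy illustration of the “compressed file is a program” view, the sketch below treats a short Python expression as the compressed form and evaluates it to get the data back. This is only a thought experiment under that assumption. Real formats like zip store data for a fixed decompressor instead of shipping arbitrary code, and the variable names here are made up.

```python
# A hypothetical "compressed file": the text of a tiny Python expression.
program = '"1" * 1_000_000'

# "Decompressing" means running the program to rebuild the data.
data = eval(program)

print(len(program))  # 15
print(len(data))     # 1000000
```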

Structured data, like a large rectangle of a single color, compresses well because it is easy to write a program that describes it. Random data, on the other hand, is not very compressible: it has no structure, so your program can’t do much better than write out all of the data verbatim, which is not going to be any smaller than the data itself.
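
You can see the difference with Python’s standard `zlib` module. This is a rough sketch: the 100,000-byte size and the all-zero “single color” buffer are stand-ins chosen for illustration, and exact byte counts will vary from run to run.

```python
import os
import zlib

size = 100_000
structured = b"\x00" * size       # stand-in for a flat, single-color area
random_data = os.urandom(size)    # random bytes with no structure to exploit

structured_packed = zlib.compress(structured)
random_packed = zlib.compress(random_data)

print(len(structured_packed))     # tiny compared to `size`
print(len(random_packed))         # roughly `size`, often slightly more
```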

This is also why compressing a compressed file does not save more space. You can think of compression like squeezing juice out of a lemon, where the more structure that exists in the file, the more juice there is to squeeze, but once you have thoroughly squeezed it, there is no more juice left. Compression turns highly structured data into low-structure data, so when you compress again you are dealing with random-ish data that doesn’t have enough structure left to take advantage of. You can also turn this around and estimate how random some data is by measuring how poorly it compresses.
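
Here is a small sketch of the lemon effect using `zlib`. The sample text (words picked from a short, made-up vocabulary) is an assumption chosen so that the first pass has something to squeeze. Other inputs will give different numbers.

```python
import random
import zlib

# Hypothetical sample: 20,000 words drawn from a small vocabulary, so the
# text has plenty of repetition for the first pass to exploit.
rng = random.Random(0)
vocabulary = ["lemon", "juice", "squeeze", "file", "data", "zip", "byte", "bit"]
text = " ".join(rng.choices(vocabulary, k=20_000)).encode("ascii")

once = zlib.compress(text)    # first squeeze: removes most of the structure
twice = zlib.compress(once)   # second squeeze: little or nothing left to remove

print(len(text), len(once), len(twice))
```

On a typical run the second number is much smaller than the first, while the third is about the same as the second or even slightly larger.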

There are two types of compression. The type I described above is lossless, where the decompressed file is exactly the same as the original file. Lossless algorithms are typically not that complicated and usually look for parts of the file that repeat or share structure, like I mentioned above. Zip files are lossless.
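
To check the “exactly the same” claim for zip, this sketch writes some bytes into an in-memory zip archive with Python’s standard `zipfile` module and reads them back. The entry name `example.txt` and the sample bytes are placeholders for illustration.

```python
import io
import zipfile

original = b"AAAAABBBBBCCCCC" * 1_000

# Write the data into a zip archive held in memory instead of on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("example.txt", original)

# Read it back out of the archive.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    restored = archive.read("example.txt")

print(restored == original)                   # True
print(len(original), len(buffer.getvalue()))  # original size vs. archive size
```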

The other type of compression is lossy, where the decompressed file does not have the same data as the original file, but has some acceptable amount of data loss built into it. In return, lossy algorithms are far better at reducing the size. Lossy algorithms can be very complicated. JPEG and MPEG files are the main examples of lossy compression. From personal experience, if you save a BMP file as a JPEG file, it will tend to be around a tenth the size of the BMP file. However, the JPEG file will not have the same pixels as the BMP file. The compression algorithm for JPEG files has been specifically tuned for photographs, so if you see a JPEG photograph you probably won’t be able to tell that some pixels have been altered. However, for something like digital art, especially pixel art, it is much more noticeable, so you should never save digital art as a JPEG.
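
The trade-off behind lossy compression can be shown with a deliberately crude sketch: keep only the top 4 bits of each 8-bit value and pack two values per byte. This is not how JPEG works (JPEG is far more sophisticated). The functions `lossy_encode` and `lossy_decode` are invented here purely to show “half the size, approximately the same values”.

```python
def lossy_encode(samples: bytes) -> bytes:
    """Keep the top 4 bits of each 8-bit sample and pack two samples per byte."""
    if len(samples) % 2:
        raise ValueError("this toy encoder expects an even number of samples")
    return bytes(
        (samples[i] & 0xF0) | (samples[i + 1] >> 4)
        for i in range(0, len(samples), 2)
    )


def lossy_decode(packed: bytes) -> bytes:
    """Rebuild approximate samples; the discarded low 4 bits are guessed as 8."""
    out = bytearray()
    for byte in packed:
        out.append((byte & 0xF0) | 0x08)          # first sample of the pair
        out.append(((byte & 0x0F) << 4) | 0x08)   # second sample of the pair
    return bytes(out)


original = bytes([12, 13, 200, 203, 90, 91])   # made-up pixel brightness values
packed = lossy_encode(original)
restored = lossy_decode(packed)

print(len(original), len(packed))   # 6 3
print(list(restored))               # [8, 8, 200, 200, 88, 88]
```

The restored values are close to the originals but not identical, and there is no way to get the exact originals back from the packed bytes.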
