What are compressed and uncompressed files, how does it all work and why compressed files take less storage?

Anonymous 0 Comments

Compression works by finding patterns in the data and then storing those patterns instead of the data itself. There are lots of different ways to do this and a lot of different theory involved, but that is the basic principle.

Compression works better when the algorithm is built for the specific file type, so a generic compression algorithm made to work on any file does not do as well on, say, image files as a dedicated image compression algorithm. Some algorithms might even opt to lose some information that is not important and does not fit into an easy pattern. This is most common in images and video, where the exact value of each pixel is not that important.

Compression algorithms also do not work if there are no patterns in the data. So random data, encrypted data or already-compressed data cannot be compressed any further.
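A quick way to see that last point in practice (my sketch, not part of the original answer), using Python’s built-in zlib module: repetitive data collapses to almost nothing, while random bytes barely shrink at all.

```python
import os
import zlib

patterned = b"abcabcabc" * 1000   # 9000 bytes of pure pattern
random_data = os.urandom(9000)    # 9000 bytes with no pattern at all

# The patterned data compresses to a tiny fraction of its size;
# the random data stays roughly the same size (or grows slightly).
print(len(zlib.compress(patterned)))    # a few dozen bytes
print(len(zlib.compress(random_data)))  # around 9000 bytes
```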

Anonymous 0 Comments

File compression saves hard drive space by removing redundant data.

For example, take a 500-page book and scan through it to find the 3 most commonly used words.

Then replace those words with placeholders, so ‘the’ becomes $, etc.

Put an index at the front of the book that translates those symbols to words.

Now the book contains exactly the same information as before, but it’s a couple dozen pages shorter. This is the basics of how file compression works: you find duplicate data in a file and replace it with pointers.

The upside is reduced space usage, the downside is your processor has to work harder to *inflate* the file when it’s needed.
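A rough sketch of that book trick in Python (the sample text and the placeholder symbols are made up for illustration; a real compressor would also make sure the symbols can’t collide with the text itself):

```python
from collections import Counter

text = "the cat sat on the mat and the dog sat on the rug"

# Find the 3 most common words and assign each a placeholder symbol.
words = text.split()
common = [w for w, _ in Counter(words).most_common(3)]
index = dict(zip(common, "$#@"))   # the "index at the front of the book"

compressed = " ".join(index.get(w, w) for w in words)
print(index)        # e.g. {'the': '$', 'sat': '#', 'on': '@'}
print(compressed)   # $ cat # @ $ mat and $ dog # @ $ rug

# Inflating: translate each symbol back using the index.
reverse = {sym: w for w, sym in index.items()}
restored = " ".join(reverse.get(w, w) for w in compressed.split())
assert restored == text   # exactly the same information as before
```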

Anonymous 0 Comments

Let’s say I have a file that contains the following:

aaaaaaaaaaaaaaaaaaaa

I could compress that like this:

a20

Obviously it is now smaller.

Real compression comes from redundancy and from the fact that most data is wasteful in the first place. A byte is 8 bits, and that’s basically the smallest unit of data that can be moved or stored. However, a message typed like this one only contains 26 different letters plus some numbers and punctuation. With 5 bits you can encode 32 different characters, so we could already compress the data a lot. The next level is to count the letters, notice that some are way more common than others, and give those shorter bit lengths per character. You can look into Huffman coding for more detailed info.
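Here is a minimal Huffman coder in Python as a sketch of that idea; a real implementation would also pack the bit strings into actual bytes and store the code table alongside the data.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # Heap entries are (frequency, tie_breaker, tree); a tree is either
    # a single character or a (left, right) pair of subtrees.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least frequent trees.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("this is an example of huffman coding")
print(codes[" "], codes["x"])  # frequent space: short code; rare 'x': long code
```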

Another form of compression is lossy compression, which is used for images, video and sound. You can easily reduce the number of colors used in an image and it will still look the same to humans. You could also merge similar pixels into the same color and say that “this 6×6 block is white”.

Anonymous 0 Comments

Compressing and uncompressing a file is like translating a book into a different language, except you make up the language based on what’s in the book. To make files smaller, you have your new language use very short words for the most common words or phrases in the original language, and longer words for the uncommon ones. Then you have to make the dictionary that translates back to the original language, or figure out rules so that you can construct the dictionary, and then the compressed file is the translated file plus the dictionary.

In most cases the compression method (or translation) is chosen to be very good for “normal” files, but bad for “uncommon” files that you generally wouldn’t encounter. Mathematically you can’t have a one-to-one translation that converts every possible combination of letters into a shorter form, because then some combinations would have the same translation and you wouldn’t know which one was the original when you translate it back. If you don’t need *exactly* the original file because it’s something like a picture, you can have a translation that is always shorter, but in general if you try to compress an already compressed file it doesn’t get smaller.
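A small experiment (my addition, using Python’s standard zlib module) makes that last point concrete: a second pass of compression gains nothing, because the first pass already removed the structure.

```python
import zlib

data = b"abracadabra " * 500      # 6000 bytes, highly repetitive
once = zlib.compress(data)
twice = zlib.compress(once)        # compressing the compressed file

print(len(data))   # 6000
print(len(once))   # small: the repetition has been squeezed out
print(len(twice))  # no smaller than `once`; often slightly larger
```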

Anonymous 0 Comments

Basically, compression makes some rules that you can use to recreate (uncompress) the file.

In the most basic case, imagine you have a text file that for some reason contains 1,000,000 ‘a’ characters. Instead of storing all 1 million of them, you can store something like ‘1000000a’, which saves a lot of space.

If you had 1000 ‘a’ characters followed by 1000 ‘b’ characters, you might compress the file by writing it as ‘1000a1000b’.

The steps you follow (in this case to count the number of same characters in a row) is called the compression algorithm. There are many different compression algorithms that have different characteristics (for example if you want to compress video or text or audio).

Now in our example, we can recreate exactly the information we started with from our compressed file (it would be pretty useless if we couldn’t read the text after we uncompressed it right?). These are called lossless algorithms.
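Here is that counting scheme sketched in Python, round-tripped to show the lossless property (this toy version assumes the input contains no digit characters):

```python
import re

def rle_encode(s):
    # Turn each run of identical characters into count-then-character.
    return "".join(f"{len(m.group(0))}{m.group(1)}"
                   for m in re.finditer(r"(.)\1*", s))

def rle_decode(s):
    return "".join(ch * int(n) for n, ch in re.findall(r"(\d+)(\D)", s))

data = "a" * 1000 + "b" * 1000
packed = rle_encode(data)
print(packed)                      # 1000a1000b
assert rle_decode(packed) == data  # exact reconstruction: lossless
```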

There are also lossy algorithms, which compress stuff but can’t give you the exact original back. For example, let’s say you have the data 123456689. We can describe that (approximately) with the rule “the nth digit is n”, and when we uncompress we get 123456789, which is almost the same as the original. An example of lossy compression is JPEG, where the compressed images are less clear than the original (maybe more pixelation, or the colours aren’t quite the same, etc.)

There are many different compression algorithms, suited to different purposes and data types (images, text, audio, video etc), and they can be quite complicated mathematically.

Anonymous 0 Comments

I like the text examples

One for movies or animations is where they only save what changes between the frames. If you have 100 frames that are all black, replace them with one black frame and set it to take up the same length of time as the 100 frames did. If you have a shot of blue sky that doesn’t change because all the action is going on in the lower half of the frame, save the blue part of the frame once and draw it out the same way as was done with the black; only once something moves do you have something new to keep. This can be done for 10,000 frames in a row, or for just 2 frames where only 10% of the screen matches the one before it.
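A rough sketch of that idea in Python, with short lists of numbers standing in for frames of pixels (this encoding is invented for illustration and is not any real video format):

```python
def delta_encode(frames):
    encoded = [frames[0]]                      # keep the first frame whole
    for prev, cur in zip(frames, frames[1:]):
        # For each later frame, keep only the pixels that changed.
        changes = {i: p for i, (q, p) in enumerate(zip(prev, cur)) if p != q}
        encoded.append(changes)                # often empty or tiny
    return encoded

def delta_decode(encoded):
    frames = [list(encoded[0])]
    for changes in encoded[1:]:
        frame = list(frames[-1])               # start from the previous frame
        for i, p in changes.items():
            frame[i] = p
        frames.append(frame)
    return frames

# A static shot, then a single "pixel" changes in frame 3.
frames = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 9, 0, 0]]
encoded = delta_encode(frames)
print(encoded)                      # [[0, 0, 0, 0], {}, {1: 9}]
assert delta_decode(encoded) == frames
```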

Anonymous 0 Comments

Suppose you’re writing a grocery list. Your list initially says this:

I need to get 6 eggs
I need to get 2 liters of soy milk
I need to get 2 liters of almond milk
I need to get 1 pound of ground beef

There’s a lot of repetition in there, right? A smart compression algorithm would recognize that, and might render it like this:

I need to get
6 eggs
2 liters of soy milk
2 liters of almond milk
1 pound of ground beef

An even better compression algorithm might be able to further improve things:

I need to get
6 eggs
2 liters of
soy milk
almond milk
1 pound of ground beef

This is basically what compressing a file does. You take information that’s repeated multiple times and remove that repetition, replacing it with instructions on how to put it back when you need to reconstruct the original content.
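As a sketch, here is that prefix trick in Python, applied to the list above (`os.path.commonprefix` happens to find the shared beginning of a list of strings):

```python
import os

lines = [
    "I need to get 6 eggs",
    "I need to get 2 liters of soy milk",
    "I need to get 2 liters of almond milk",
    "I need to get 1 pound of ground beef",
]

# Store the repeated part once, plus the unique remainders.
prefix = os.path.commonprefix(lines)               # "I need to get "
remainders = [line[len(prefix):] for line in lines]

# Reconstruction: put the repetition back where it belongs.
restored = [prefix + rest for rest in remainders]
assert restored == lines
```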

Anonymous 0 Comments

Imagine you wrote a cool code where every time there were double letters, you replaced them with a number. Now let’s say oo (two lower-case Os) = 1, and ll (two lower-case Ls) = 2. Using that code:

balloon becomes ba21n: _balloon_ is 7 characters but _ba21n_ is only 5!

Now imagine that the pattern lloo happens a lot, so you give it its own special code. We’ll use 9 for that.

Now _balloon_ becomes _ba9n_ which is only four characters!

Of course it’s not that simple, but that’s compression in a nutshell. When you get longer strings of repetitive data (there are lots of zeros in files, for example) the compression gets even better.
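That code table is simple enough to write out in a few lines of Python (a toy sketch; real compressors work on bits and raw bytes rather than printable characters):

```python
# Order matters: the longer pattern "lloo" must be tried before
# "ll" and "oo" get a chance to match inside it.
rules = [("lloo", "9"), ("ll", "2"), ("oo", "1")]

def encode(word):
    for pattern, code in rules:
        word = word.replace(pattern, code)
    return word

def decode(word):
    # Undo the substitutions in reverse order.
    for pattern, code in reversed(rules):
        word = word.replace(code, pattern)
    return word

print(encode("balloon"))                      # ba9n: 7 characters down to 4
assert decode(encode("balloon")) == "balloon"
```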

Anonymous 0 Comments

Software programmer here. Like all binary data, files are stored as a series of 1’s and 0’s. Now imagine you had a file that was just a million 1’s. If you wanted to describe this file to someone, it would be a lot smaller to write “a million 1’s” instead of actually writing out “1” a million times. That’s compression.

More formally, compressing a file is actually writing a program that can write the uncompressed file for you. The compressed size of the file is then the size of that program. Decompressing the file is actually running the program to build your uncompressed file.
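A toy illustration of that view in Python (the recipe format here is invented for the sketch): the “compressed file” is a tiny recipe, and decompressing means running it.

```python
# The entire "program" for a file of a million 1's: repeat "1" that many times.
recipe = ("1", 1_000_000)

def decompress(recipe):
    char, count = recipe
    return char * count

original = decompress(recipe)
print(len(original))   # 1000000 characters reconstructed
# The recipe itself is only a handful of bytes: that is the compressed size.
```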

More structured data like a large rectangle of a single color compresses well because it is easy to write a program that describes that data. On the other hand, random data is not very compressible because it does not have any structure and so you can’t do much other than have your program actually write out the entire number, which is not going to be any smaller than the entire number itself.

This is also why compressing a compressed file does not save more space. You can think of compression like squeezing juice out of a lemon: the more structure that exists in the file, the more juice there is to squeeze, but once you have thoroughly squeezed it, there is no more juice left. Compression turns highly structured data into low-structure data, so when you compress again you are dealing with random-ish data that doesn’t have enough structure to take advantage of. You can also turn this around and measure how random some data is by how easy it is to compress.

There are two types of compression. The type I described above is lossless where the uncompressed file is exactly the same as the original file. Lossless algorithms are typically not that complicated and usually look for large areas of the file that share structure, like I mentioned above. Zip files are lossless.

The other type of compression is lossy, where the uncompressed file does not have the same data as the original file, but has some acceptable amount of data loss built into it. In return, lossy algorithms are far better at reducing the size. Lossy algorithms can be very complicated. JPEG and MPEG files are the main examples of lossy compression. From personal experience, if you save a BMP file as a JPEG, the JPEG will tend to be around a tenth the size of the BMP. However, the JPEG will not have the same pixels as the BMP. The compression algorithm for JPEG files has been specifically tuned for photographs, so if you see a JPEG photograph you probably won’t be able to tell that some pixels have been altered. However, for something like digital art, especially pixel art, the loss is much more noticeable, so you should never save digital art as a JPEG.
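A hedged sketch of that BMP-versus-JPEG comparison, assuming the third-party Pillow imaging library is installed (`pip install Pillow`); the file names and quality setting are arbitrary.

```python
import os
from PIL import Image

# Build a synthetic image with smooth gradients, vaguely photograph-like.
img = Image.new("RGB", (1024, 768))
img.putdata([(x % 256, y % 256, (x + y) % 256)
             for y in range(768) for x in range(1024)])

img.save("test.bmp")                 # lossless: raw pixels, ~2.3 MB
img.save("test.jpg", quality=85)     # lossy: pixels will not match exactly

print(os.path.getsize("test.bmp"))
print(os.path.getsize("test.jpg"))   # typically a small fraction of the BMP
```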