How does file compression/ZIP files work?

I understand it’s to save file space, but like…how?

Follow-up: If you have to compress a photo to save on file size (when emailing, for example) couldn’t it just send essentially a text file on how to uncompress it back to its original size and quality? (ex. “Set the resolution to XxY and add X color at Y pixels”)

Thank you!

Anonymous

Compression works by using short codes for common data, and long codes for rare data.

For example, the standard text encoding uses 8 bits for every letter, which is inefficient for English. E is the most common letter in English, so we can give it a short code, say 6 bits. Z is one of the least common, so it can get a longer code, say 11 bits. On average, this substitution makes English text shorter. If a letter doesn’t appear at all, it doesn’t need a code.
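To make the short-codes-for-common-letters idea concrete, here is a minimal sketch of Huffman coding, one well-known way to build such codes. It is an illustration only: the sample sentence and the function names are made up for this example, and real compressors store the bits far more compactly than a Python string of "0" and "1".

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return {symbol: bitstring}; frequent symbols get shorter bitstrings."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tiebreak, {symbol: code built so far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two rarest groups; their codes grow by one leading bit.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    # No code is a prefix of another, so we can decode greedily.
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)

sample = "the quick breeze freezes every eager sentence between here and there"
codes = huffman_codes(sample)
bits = encode(sample, codes)
print("code for 'e':", codes["e"], "| code for 'z':", codes["z"])
print(len(sample) * 8, "bits at 8 bits per letter vs", len(bits), "bits with Huffman codes")
```

Note that the decoder needs the same code table, so a real file format also has to store or agree on that table.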

It can be compressed even further if we consider words: not every combination of letters is a word. We can make a dictionary of all English words and assign them codes according to their frequency; this can compress average text to roughly 25% of its original size.
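A toy version of the word-dictionary idea might look like the sketch below. The dictionary here is built from the message itself purely for illustration, and the cost model (1 byte for the 128 most common words, 2 bytes otherwise) is a hypothetical assumption, not how any particular format works.

```python
from collections import Counter

def build_dictionary(corpus_words):
    """Rank words so the most frequent ones get the smallest numbers."""
    return [word for word, _ in Counter(corpus_words).most_common()]

def encode_words(text, dictionary):
    index = {word: i for i, word in enumerate(dictionary)}
    return [index[word] for word in text.split(" ")]

def decode_words(codes, dictionary):
    return " ".join(dictionary[i] for i in codes)

def encoded_size_bytes(codes):
    # Hypothetical cost model: 1 byte for ranks 0-127, 2 bytes otherwise.
    return sum(1 if c < 128 else 2 for c in codes)

message = "the cat sat on the mat and the dog sat on the cat"
dictionary = build_dictionary(message.split(" "))
codes = encode_words(message, dictionary)
print(codes)
print(len(message), "bytes as text vs", encoded_size_bytes(codes), "bytes as word codes")
```

The saving only counts if both sides already share the dictionary; otherwise the dictionary has to be sent along with the message.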

ZIP uses a combination of a dictionary and frequency coding. It starts from the assumption that all letters are equally likely and that there are no words, but as it reads the file it keeps tallies and adapts the coding to the text. This means the beginning of a text is always compressed poorly, and compression improves as more of the file has been seen.
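The "gets better as it reads more" effect can be observed with Python's standard `zlib` module, which implements DEFLATE, the method ZIP files normally use. This is a rough sketch: the word list and the random text are invented for the demonstration, and it compares compressed sizes of ever longer prefixes rather than inspecting the compressor's internals.

```python
import random
import zlib

# Made-up vocabulary; any repetitive, English-like text would do.
WORDS = ["compression", "works", "by", "using", "short", "codes", "for",
         "common", "data", "and", "long", "rare", "file", "text", "letter", "word"]

rng = random.Random(0)  # fixed seed so the run is repeatable
text = " ".join(rng.choice(WORDS) for _ in range(3000)).encode("ascii")

def ratio(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)

for size in (50, 500, 5000, len(text)):
    print(f"first {size:>6} bytes -> ratio {ratio(text[:size]):.2f}")
```

Short prefixes give the compressor almost nothing to learn from, so their ratio stays close to 1 (or even above it, once the fixed overhead is counted).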

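Note that no lossless compression method can shrink every file: there are always files that actually get longer, even if only by 1 bit. There are simply fewer short files than long ones, so not every long file can be mapped to its own shorter one. A completely random stream of characters cannot be reliably compressed by **any** method; its uncompressed form is already, on average, the shortest possible.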
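This is easy to check with `zlib` from Python's standard library. The sketch below compresses pseudo-random bytes (standing in for a truly random stream) and a very repetitive text of similar size; the specific sizes and the sample sentence are arbitrary choices for the demonstration.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
random_bytes = bytes(rng.getrandbits(8) for _ in range(100_000))
repetitive_text = b"the cat sat on the mat. " * 4000

packed_random = zlib.compress(random_bytes, 9)
packed_text = zlib.compress(repetitive_text, 9)

print("random:    ", len(random_bytes), "->", len(packed_random), "bytes")
print("repetitive:", len(repetitive_text), "->", len(packed_text), "bytes")

# Lossless means we always get the exact original back.
assert zlib.decompress(packed_random) == random_bytes
assert zlib.decompress(packed_text) == repetitive_text
```

The random input typically comes out slightly larger than it went in, because the compressor still has to add its own header and bookkeeping.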

>couldn’t it just send essentially a text file on how to uncompress it back to its original size and quality? (ex. “Set the resolution to XxY and add X color at Y pixels”)

This would actually make most pictures much larger: you waste a lot of bits saying “Set the resolution” and “Add color”. Current picture formats simply require that information to be listed in a specific order, so a PNG reader doesn’t need to read “Set resolution”; it just knows that, for example, “the 5th number from the beginning is the resolution”.
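As a rough illustration of "the reader just knows where to look": in a PNG file the width and height sit at fixed byte offsets (16 to 23) inside the first chunk, called IHDR. The sketch below builds only the first 33 bytes of a PNG, not a full image, and reads the size back; the helper names are made up for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_png_header(width, height):
    """Build the first 33 bytes of a PNG: the signature plus the IHDR chunk."""
    # 8 bits per channel, colour type 2 (RGB), default compression/filter/interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    chunk = b"IHDR" + ihdr
    return (PNG_SIGNATURE
            + struct.pack(">I", len(ihdr))        # chunk data length (13)
            + chunk
            + struct.pack(">I", zlib.crc32(chunk)))  # checksum of type + data

def read_png_size(data):
    """Read width and height from their fixed positions; no labels needed."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    return struct.unpack(">II", data[16:24])

header = make_png_header(640, 480)
verbose = "Set the resolution to 640x480".encode("ascii")

print(read_png_size(header))
print("width+height in PNG: 8 bytes | same info as a sentence:", len(verbose), "bytes")
```

The binary form spends 8 bytes on the resolution because the position carries the meaning; the sentence spends most of its bytes on words the reader could have assumed.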
