What’s the logic behind compressed files like .zip and .rar?

2 Answers

Anonymous

For general-purpose compression, the idea is to find sequences that are repeated multiple times in the data you’re compressing, then replace each occurrence with a reference to a dictionary that contains the sequence exactly once.

Let’s say you start with the text of Green Eggs and Ham. There are 50 different words in that story. You make a list of all those words, give each one a number, and then replace every instance of a word in the text with its number. Now every single word is replaced with a 1- or 2-digit number.
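Here’s a rough sketch of that word-numbering idea in Python. The function names and the short sample sentence are made up for illustration, and it assumes words are separated by single spaces, which real text (and real compressors) don’t rely on.

```python
def build_dictionary(words):
    """Give each distinct word a number, in order of first appearance."""
    codes = {}
    for word in words:
        if word not in codes:
            codes[word] = len(codes)
    return codes


def compress(text):
    """Return (dictionary, encoded), where encoded is a list of word numbers."""
    words = text.split(" ")
    codes = build_dictionary(words)
    encoded = [codes[word] for word in words]
    dictionary = list(codes)  # position in this list == the word's number
    return dictionary, encoded


def decompress(dictionary, encoded):
    """Look every number back up in the dictionary and rejoin the words."""
    return " ".join(dictionary[number] for number in encoded)


# Hypothetical stand-in sample, not the full story.
sample = "I do not like them Sam I am I do not like green eggs and ham"
dictionary, encoded = compress(sample)
restored = decompress(dictionary, encoded)
```

The dictionary has to be stored next to the numbers, so this only pays off when words repeat often enough to cover that cost.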

You can then do the same thing again with those results and see if the output gets significantly smaller. You’ll likely hit the repeated word sequences next (the “I do not like…” phrases). You can shrink those down so that each one becomes a reference to a sequence of references. As long as the output (including the dictionary you have to store alongside it) keeps getting smaller each iteration, you can keep going.
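One way to picture the “reference of references” step is repeated pair replacement, similar in spirit to byte pair encoding. That’s my choice of technique for the sketch, not something zip or rar is claimed to do exactly. The names here are hypothetical, and it assumes `next_symbol` is a number not already used in the input and that every rule costs two stored symbols.

```python
from collections import Counter


def most_common_pair(seq):
    """Return the adjacent pair that shows up most often, or None."""
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return None
    return pairs.most_common(1)[0][0]


def replace_pair(seq, pair, new_symbol):
    """Replace each non-overlapping occurrence of pair with new_symbol."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def compress_pairs(seq, next_symbol):
    """Keep adding rules while sequence plus rules keeps getting smaller."""
    seq = list(seq)
    rules = {}
    while True:
        pair = most_common_pair(seq)
        if pair is None:
            break
        candidate = replace_pair(seq, pair, next_symbol)
        # A new rule costs 2 symbols, so stop unless we save more than that.
        if len(candidate) + 2 >= len(seq):
            break
        rules[next_symbol] = pair
        seq = candidate
        next_symbol += 1
    return seq, rules


def expand(seq, rules):
    """Undo the rules; a rule may point at other rules, hence the recursion."""
    out = []
    for symbol in seq:
        if symbol in rules:
            out.extend(expand(rules[symbol], rules))
        else:
            out.append(symbol)
    return out


# Made-up word numbers: the run 0, 1, 2, 3 repeats three times.
word_numbers = [0, 1, 2, 3, 4, 0, 1, 2, 3, 5, 0, 1, 2, 3, 6]
packed, rules = compress_pairs(word_numbers, next_symbol=7)
```

The later rules end up pointing at earlier rules, which is the nesting the answer describes.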

Anonymous

The logic is pretty simple: instead of storing the data in raw form, you find patterns in the data and store those.

Example:

aaaaaaaaaa

can be represented as 10a because there are 10 a’s. That takes up way less space, and it can be decompressed back to the original raw data without any data loss (lossless).
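This trick is usually called run-length encoding. A minimal Python sketch follows. The function names are mine, and it stores (count, character) pairs rather than a string like 10a, because the string form gets ambiguous once the data itself contains digits.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of identical characters into a (count, char) pair."""
    return [(len(list(group)), char) for char, group in groupby(text)]


def rle_decode(runs):
    """Expand each (count, char) pair back into the original run."""
    return "".join(char * count for count, char in runs)


runs = rle_encode("aaaaaaaaaa")
original = rle_decode(runs)
```

It only helps when the data really has long runs. On something like abcabc, every run has length 1 and the “compressed” form comes out bigger than the input.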

Each compression scheme (algorithm) has its own way of finding patterns.