What is the major difference between ZIP, RAR, 7z and other compression algorithms?


Do they use vastly different algorithms? Are any of those tools particularly well suited to one type of scenario over another?


They each have their own specific algorithm for shrinking a file.

I believe the RAR format makes the smallest compressed files, but the fewest programs can decompress it.

The zip format may produce a slightly bigger compressed file, but the most programs can decompress it.

Here are some differences

* Some are lossy (information is lost, like JPEG), some are lossless (perfect reproduction, like GIF)
* Some can bundle multiple files into one file (zip), others can’t
* Some are symmetric and are fast (or slow) at both compression and decompression, others are asymmetric and slow for compression but fast for decompression (like MPEG video)
* Some are faster but don’t compress as much, while others are slower but do better compression
* Some work best with text, or audio, or video, or still images, or random binary
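
To make a couple of those bullets concrete, here is a rough sketch using the lossless codecs in Python's standard library (`zlib`, `bz2`, `lzma`). The sample data and the helper name `compressed_sizes` are made up for illustration; exact sizes will vary by Python version, so treat it as an experiment to run rather than a benchmark.

```python
import bz2
import lzma
import os
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes for a few lossless codecs."""
    return {
        "zlib level 1": len(zlib.compress(data, 1)),  # fast, less effort
        "zlib level 9": len(zlib.compress(data, 9)),  # slower, more effort
        "bz2": len(bz2.compress(data)),
        "lzma": len(lzma.compress(data)),
    }


# Highly repetitive text: plenty of patterns for a compressor to exploit.
text = b"the cat sat on the mat. " * 2000
# Random bytes of the same length: essentially nothing to exploit.
noise = os.urandom(len(text))

text_sizes = compressed_sizes(text)
noise_sizes = compressed_sizes(noise)

print(f"original size: {len(text)} bytes")
for name in text_sizes:
    print(f"{name:>13}: text -> {text_sizes[name]:6d}   noise -> {noise_sizes[name]:6d}")

# Lossless means the round trip gives back exactly the same bytes.
assert lzma.decompress(lzma.compress(text)) == text
```

The repetitive text should shrink dramatically with every codec, while the random bytes should stay at about their original size (or grow slightly), which is the "works best with some kinds of data" point in action.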

The differences are more historical than technical.

ZIP is the oldest of the 3.
What gave it supremacy in the 1990s was that the original software shipped with a full specification for the format.
That meant that other software developers could write their own zipping/unzipping software which would be compatible with the original PKZIP.
That led to the .zip file being supported ubiquitously, on basically every type of computer around.
That was a real rarity back in the day: it was very rare for a file format to allow for real, easy, painless interoperability with different computer systems.

Due to its age, ZIP originally supported only a few compression systems, which are now considered old and not so great by today’s standards.
Newer and more advanced compression systems have been added into the file format over the years, but not universally.
Some ZIP software doesn’t support all of the newer stuff, which means that a lot of ZIP software will default to using the older compression systems, to get maximum interoperability.
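
As a small illustration of "one container, several compression systems", Python's standard `zipfile` module can write the same file into a ZIP using four different methods. This is only a sketch: the payload and the name `example.txt` are invented, and whether another unzip tool can read the bzip2 or LZMA variants depends on that tool.

```python
import io
import zipfile

# Made-up payload: repetitive text that compresses well.
payload = b"the cat sat on the mat. " * 2000

# Compression methods that the standard zipfile module knows how to write.
methods = {
    "stored (no compression)": zipfile.ZIP_STORED,
    "deflate (the classic)": zipfile.ZIP_DEFLATED,
    "bzip2": zipfile.ZIP_BZIP2,
    "lzma": zipfile.ZIP_LZMA,
}

archive_sizes = {}
for name, method in methods.items():
    buffer = io.BytesIO()  # build the archive in memory instead of on disk
    with zipfile.ZipFile(buffer, "w", compression=method) as archive:
        archive.writestr("example.txt", payload)
    archive_sizes[name] = len(buffer.getvalue())

    # Read it back to confirm the contents survive unchanged.
    with zipfile.ZipFile(buffer) as archive:
        assert archive.read("example.txt") == payload

for name, size in archive_sizes.items():
    print(f"{name:>24}: {size} bytes")
```

All four results are valid .zip files, but only the stored and deflate ones are safe to assume every unzipper understands, which is why deflate tends to stay the default.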

RAR was a competitor to ZIP in the 1990s, trying out different compression systems to get an edge on ZIP.
Unlike ZIP, RAR is a proprietary format, and never achieved very wide support because of that.
In the late 1990s and early 2000s, RAR was legitimately superior to ZIP in its compression abilities, though it still never gained a strong stranglehold (outside of certain countries and use cases like piracy) because it didn’t have wide support or work out-of-the-box on people’s computers.
These days, ZIP and RAR support more or less the same compression systems.

7z is only about 20 years old.
It was a format developed specifically for a new compression system being researched called LZMA, which beat the pants off of all the older compression systems.
Like ZIP, it was fully specified (and the source code was even released into the open).
Due to the open source nature of 7z, the LZMA code was incorporated into a lot of other file formats, like ZIP and RAR.

Because 7z had an open specification like ZIP, it immediately reached wide support on all computing platforms.
However, by the time 7z came about, operating systems were already including ZIP capabilities directly baked into the operating system (there was no longer any need to download compression software).
To use 7z, you needed to install the 7z software.
To use ZIP, you didn’t need to install anything: it was already part of your operating system.
Because people are very fond of convenience, 7z would never get as popular as ZIP, even if it was superior.

And now, it’s possible to use LZMA with ZIP, so….

Just use ZIP.

Other differences are not very interesting.
All 3 bundle multiple files into one archive.
All 3 allow splitting large archives into multiple pieces.
All 3 allow the use of the same best compression schemes.
All 3 allow secure password-based encryption.
There are very few interesting technical differences between them.

(Edit: there are a few minor differences in supported compression schemes.
For example, ZIP and RAR have compression schemes specific to compressing .wav files, whereas 7z doesn’t.
These very specific use cases very rarely come up these days.
The best general-purpose compression schemes (DEFLATE, LZW, bzip2, PPMd, LZMA), which will be used in 99.9% of cases, have equal support among all 3 formats.)

They are all pretty much the same thing with some minor tweaks to the algorithms.

The basic algorithms all follow the same basic process. It is a two-step process: a dictionary step, followed by an entropy coding step.

The dictionary step basically looks through the data and builds a dictionary with abbreviations in it. It’s quick, but not particularly efficient. For example, after dictionary coding, the phrase “the cat sat on the mat” might be coded as “!=the;*=at ! c* s* on ! m*”

The problem with this type of dictionary building step is that it goes through the data in order, and takes no account of how frequently phrases occur. Rarely used sequences might get the shortest abbreviation.
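
Here is a toy version of that “the cat sat on the mat” example, just to show the mechanics. It is a deliberately naive sketch: the `encode` and `decode` helpers and the hand-picked table are hypothetical, the abbreviation characters must not appear in the text, and phrases must not contain spaces. Real dictionary coders build their dictionary automatically while scanning the data and typically use back-references to earlier data rather than an explicit table like this.

```python
def encode(text: str, table: dict) -> str:
    """Replace each phrase with its abbreviation and prepend the table."""
    header = ";".join(f"{abbr}={phrase}" for abbr, phrase in table.items())
    body = text
    for abbr, phrase in table.items():
        body = body.replace(phrase, abbr)
    return f"{header} {body}"


def decode(encoded: str) -> str:
    """Read the table back from the header and expand the abbreviations."""
    header, body = encoded.split(" ", 1)
    for entry in header.split(";"):
        abbr, phrase = entry.split("=", 1)
        body = body.replace(abbr, phrase)
    return body


# Hand-picked table for illustration; a real coder would discover it itself.
table = {"!": "the", "*": "at"}
original = "the cat sat on the mat"

encoded = encode(original, table)
print(encoded)          # !=the;*=at ! c* s* on ! m*
print(decode(encoded))  # the cat sat on the mat
print(len(original), len(encoded))  # 22 26
```

Note that on an input this short the "compressed" version is actually longer, because the table itself has to be stored. The savings only show up once the same phrases repeat many more times.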

Entropy coding techniques vary. However, the commonest forms look at the structure of the data to find parts which repeat or almost repeat. They are a bit like dictionary techniques, but they look at how much patterns are repeated or repeated with minor changes, and put shorter abbreviations for the most common ones, and longer abbreviations for less common sequences. There are varying degrees of cleverness in how algorithms score what is considered likely and unlikely when allocating abbreviations.
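
The "shorter abbreviations for common things, longer ones for rare things" idea can be sketched with Huffman coding, one of the simplest entropy coders. This is a minimal illustration, not how any particular archiver implements it: the function name `huffman_codes` is made up, and it works on single characters of a short string rather than on the output of a dictionary step.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict:
    """Build a prefix-free bit string for each character in `text`."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit to be written down.
        return {next(iter(freq)): "0"}

    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(weight, i, {sym: ""}) for i, (sym, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)

    while len(heap) > 1:
        # Merge the two least frequent groups; symbols in them get one bit longer.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1

    return heap[0][2]


message = "the cat sat on the mat"
codes = huffman_codes(message)
bits = "".join(codes[ch] for ch in message)

for sym, code in sorted(codes.items(), key=lambda item: len(item[1])):
    print(repr(sym), code)
print(f"{len(bits)} bits instead of {8 * len(message)} bits at 8 bits per character")
```

Frequent characters such as "t" and the space end up with short codes, while one-off characters such as "c" or "m" get longer ones. A real format also has to store the code table (or enough information to rebuild it), which this sketch ignores.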

The entropy coder is where most of the differences between Zip, Rar and 7z are: Zip (with DEFLATE) uses Huffman coding for this step; Rar can use an algorithm called prediction by partial matching with information inheritance (PPMd); 7z (with LZMA) uses a range coder whose probabilities come from a Markov-chain-style model of the preceding data.