[ELI5] What is a Compression Scheme?


In data compression, what is the compression scheme?

In: 0

This might be the first time I am ever first to an ELI5, so I might be completely wrong and missing the question.

What I learned 30 years ago in college about jpeg compression is this…

If you have a picture, each pixel has a coordinate, like battle ships game, and a color. So starting top left at (0,0) it might be blue. The next pixel (1,0) might be blue too. The next (2,0) might be cyan. Etc thru an entire picture.

If we save the data as:

(0,0) blue

(0,1) blue

(0,2) cyan

(0,3) cyan

(0,4) blue and on and on, there would be lots of information to store.

But if we look at the array (which is the list of values), we might be able to group.

We could say

(0,0-1) blue

(0,2-4) cyan

Each time we can slice the data with a run of the same values, we can compress the information and save a small amount of data.

If you imagine a full picture, if you can find two pixels side by side with the same color, we can potentially drop a bit, and save on file size.

The scheme is the set of methods used to make the file smaller.

For example, there’s “lossless” compression where the scheme looks for patterns or redundancies in the data such that it can represent them in a smaller form that, when uncompressed again, is bit-for-bit identical to the original file.

Examples of lossless compression are PNG images, ZIP archives and FLAC audio. What you put into them, you get exactly the same thing back out.


Then there are “lossy” compression schemes. Lossy schemes look for ways to throw away or generate an approximation of input data which will produce a different output that is hopefully just *not easily distinguishable* from the original.

For example, MP3 and AAC audio work by throwing away or smudging sounds that their “psychoacoustic model” thinks that you won’t be able to hear.

Video compression schemes like H.264 and VP9 work by looking for similarities and patterns within *and even* ***between*** frames of a video and fudging them in such a way that it isn’t very noticeable. They also tend to handle brightness/darkness (“luma”) and colors (“chroma”) in a biased way so that they can more heavily fudge certain differences in colors and shading that they think you can’t see.

A compression scheme is simply the method used to compress data. So DEFLATE, LZMA, RLE etc.

You will see the word “scheme” used all over IT, basically it’s analogous to “method”, “implementation”, “template”, “algorithm” etc.