How do PC files get compressed?

432 viewsOtherTechnology

How is it possible that a 10GB folder gets compressed into a 6GB exe or zip file?

In: Technology

8 Answers

Anonymous 0 Comments

The compression algorithm scans the whole data and looks for repeated sections. 

Then it makes a compressed file with a conversion table where larger repeated sections are represented by small numbers and translated with the table. 

Some data compresses much better than others. When things are very repetitious. 

But other data doesn’t compress well at all. Especially data that already went through a compression algorithm. JPEGs and mp3s already are lossy compression formats. 

Look at lempel ziv for one of the earliest and common compression styles. 

https://en.m.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch

Anonymous 0 Comments

First off: it’s not possible to compress everything into something smaller. Whatever algorithm you choose, some possible data will actually end up larger after compression than smaller.

Now that being said, most data we care about has patterns and repetition. The simplest algorithms for compression will look for patterns and repetition and instead of repeating the data or expressing the pattern directly, will instead describe a way to rebuild the repetition or recreate the pattern.

It’s much less data for me to say “1 to 1000 by 2s” than it is to write out “2 4 6 8 10 12…” etc.

Anonymous 0 Comments

If you take a book and replace every occurrence of “the” with “#” instead, you will make the book shorter without losing any information. You have compressed it. 

Of course, if you want to *read* the book, you’ll need to convert all of those “#”s back into”the”s. This is what happens when you expand a zip back into the files it contains. 

Anonymous 0 Comments

exe is not a compressed file, it is an executable file.

now for compression. imagine I had a file that was just 10 gigabytes of

“`







“`

its pretty easy to see that
“`
Every following line in this file starts with https://www.youtube.com/watch?v=
dQw4w9WgXcQ
y3ZhS8wg8z4
XEkOXvkuy0Q
Hrxf-mhwAAs
zaD6WPIOoN8

“`
has the same information, but is much mich smaller, more like 2gb

and thats how zipping works, it finds repeating patterns and replace then with smaller identifier and way to undo this replacement, just in a machine readable way instead of human readable

Anonymous 0 Comments

Lots of clever mathy tricks depending on what compression algorithm is used. One way might be if a file has several repeating bits, say there’s 20 0 bits in a row, to replace them with a shorter pattern that describes the number of 0 bits that it replaced.

It might do the same with long repeating bit patterns, substituting them with a shorter patterns with a look up table to show what to replace when decompressing.

And lots of other ways to achieve the same thing: replacing lots of bits with fewer bits in a way that can somehow be restored when you want to uncompress the file.

Anonymous 0 Comments

A picture of the beach is a blue sky. The middle is green ocean. The bottom is a tan beach. You only need three crayons to describe your picture made from your box of 1024 crayons.

Anonymous 0 Comments

So there are two forms of compression. Lossy and lossless.

Lossy compression is strictly for media files. Essentially they make small alterations to the media file so that it’s easier to compress it and make it smaller. It turns out you can keep making changes until it gets quite small. All video and audio formats that you’re familiar with use this form of compression. It’s not suitable for anything that’s not video or audio or an image because you modify the data in an irreversible way, and this would make the data useless for everything else.

Lossless compression typically revolves around predictable patterns in data that can be exploited. One of the first versions of this was something called RLE compression. Essentially in most text files and lots of databases, you may frequently see long strings of repeated values. These could be spaces in a text file or whatever. If you have 12 G’s in a row, you could just replace that with the number 12 and the letter G and reduce it to two bytes from 12.

One of the most efficient compression algorithms that was a big step forward when it was first developed is Huffman encoding. That’s a bit wise format where you basically take all the bytes in a file, and you count the number of instances of each value then you construct a binary tree with the most common on top and the least common below. This is especially good for text because certain characters are very frequently used like vowels. The letter e could be reduced to a single bit.

There’s also a whole host of other schemes that people have come up with and most modern compression algorithms will use some hybrid of different approaches to try to shrink things down.

Anonymous 0 Comments

ELI5 is an abbreviation for Explain Like I’m Five. So you have essentially compressed a 17 letter phrase in 4 letters.

Compressing a folder is essentially doing the same thing. Abbreviating repeated data into something smaller that takes up less disk space.