AnswerCult

Question

937 views · January 3, 2024

Question 92.10K June 6, 2021 0 Comments

What are compressed and uncompressed files, how does it all work and why compressed files take less storage?

In: Technology

27 Answers


Answer 1 · 2021-06-07T15:12:28+00:00

A compressed file is a file that has been processed to reduce its overall size. Compressed data takes less storage space, which matters most for files that are sent over computer networks like the internet or an intranet. It can also take less memory, as long as it is kept in compressed form.

An uncompressed file is a file that has been allowed to use all the space it needs. Files are usually compressed in order to save disk space. Lossy compression does this at the cost of some of the original file's quality, while lossless compression restores the original exactly.

Answer 2 · 2021-06-07T15:12:05+00:00

You know those waterproof ponchos you can get, the ones that come in a little compact pouch? That’s a compressed file. When it rains you open the thing up (i.e. only when needed) and put it on.

When in the pouch it takes up a tiny amount of space in your bag. If you don’t compress it (i.e. scrunch it back up and stuff it in the pouch) and just shove it back in your bag as a poncho, it takes up more room.

Answer 3 · 2021-06-07T12:40:50+00:00

Imagine you have a graph of 100 points. To know the position of each point in X and Y you’d need 2 bytes each, or 200 bytes. However, you could come up with a mathematical formula that, when run, draws a line that nearly matches all the points. It’ll be a little off, but now instead of 200 bytes you only need 20 to store the formula, which outputs the required Y value for each point at intervals along the X axis.
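To make the formula idea concrete, here is a minimal Python sketch. It assumes the points sit at X = 0, 1, 2, ... so only the Y values matter, and it fits a straight line by ordinary least squares. The names `fit_line` and `rebuild` and the sample data are made up for illustration; this is one possible "formula", not how any particular file format does it.

```python
import struct

def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept, with x = 0, 1, 2, ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rebuild(slope, intercept, count):
    """Regenerate approximate Y values from the stored formula."""
    return [slope * x + intercept for x in range(count)]

# Made-up data: 100 points on a line with a small wobble.
ys = [2 * x + 1 + (x % 3 - 1) for x in range(100)]

slope, intercept = fit_line(ys)
packed = struct.pack("<dd", slope, intercept)  # two 8-byte floats = 16 bytes
approx = rebuild(slope, intercept, len(ys))
worst_error = max(abs(a - b) for a, b in zip(approx, ys))
```

Here `packed` holds the whole "file", and `worst_error` shows how far off the rebuilt points are. The result is close but not exact, which is what makes this a lossy approach.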

Compressing images is even simpler. A raw image uses 8 bits (one byte) per colour channel per pixel to define the brightness and colour of the pixel as a whole. Each subpixel is set from 0 (off) to 255 (fully on), with the values in between giving increasing brightness. This is 8-bit colour (2x2x2x2x2x2x2x2 = 256 possible values). There are several options for compressing. For starters, you could reduce the number of bits per subpixel. Or reduce the resolution by merging nearby pixels together. You could also throw away every other pixel and then interpolate its colour during unpacking based on its neighbours.
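As a rough illustration of the "fewer bits per subpixel" option, the sketch below keeps only the top 4 bits of each 8-bit value and packs two values into one byte, halving the size. The function names are hypothetical and an even number of samples is assumed. Real image formats are considerably smarter about which detail to drop.

```python
def pack_4bit(samples):
    """Keep the top 4 bits of each 8-bit sample and store two per byte."""
    out = bytearray()
    for i in range(0, len(samples), 2):
        high = samples[i] >> 4        # 0..255 becomes 0..15
        low = samples[i + 1] >> 4
        out.append((high << 4) | low)
    return bytes(out)

def unpack_4bit(packed):
    """Stretch each 4-bit value back to the 0..255 range (15 * 17 == 255)."""
    out = bytearray()
    for byte in packed:
        out.append((byte >> 4) * 17)
        out.append((byte & 0x0F) * 17)
    return bytes(out)

samples = bytes([0, 255, 128, 37])
packed = pack_4bit(samples)
restored = unpack_4bit(packed)
```

`packed` is half the length of `samples`, and `restored` is close to the original values but not identical, since the low 4 bits were thrown away.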

Answer 4 · 2021-06-07T12:35:23+00:00

many great answers already, but i throw in how i got it explained when i was young. very simplistic, but here goes:

say you have the text string “aaaaaaaaaabbbbbbbbbbbbbbbbbbbb”

10 a’s, 20 b’s.

instead of saving it like that, when compressing, you can write it like “10a20b”

when those 10 a’s and 20 b’s appear multiple times in several locations, you can have the first instance shortened like above, and for the other appearances, you can make an “alias”, like “string1”. doesn’t make sense in this example because string1 is longer than 10a20b, but if that string was longer, it would save space to use an alias that’s shorter and to refer to the original string.
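The "10a20b" trick is usually called run-length encoding. A minimal Python sketch, assuming the text itself contains no digits (otherwise the counts and the characters would get mixed up), with made-up function names:

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of a repeated character with <count><character>."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))

def rle_decode(encoded):
    """Expand every <count><character> pair back into the full run."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(char * int(count) for count, char in pairs)

original = "a" * 10 + "b" * 20
encoded = rle_encode(original)
decoded = rle_decode(encoded)
```

`encoded` is 6 characters instead of 30, and `decoded` matches `original` exactly. Note that text with few repeats gets longer with this scheme ("abc" becomes "1a1b1c"), which is one reason real compressors combine several tricks.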

Answer 5 · 2021-06-07T11:23:10+00:00

Also to add: there is lossless and lossy compression. Lossy compression looks to remove data that is considered to have low informational content.

“For emaxlpe, it deson’t mttaer in waht oredr the ltteers in a wrod aepapr, the olny iprmoatnt tihng is taht the frist and lsat ltteer are in the rghit pcale. The rset can be a toatl mses and you can sitll raed it wouthit pobelrm.”
The above sentence is copied from https://www.livescience.com/18392-reading-jumbled-words.html

In a similar way, lossy compression can remove or replace content with minimal change to the structure of the data.

Answer 6 · 2021-06-07T10:11:23+00:00

There’s some good answers about how lossless compression works, and that’s really useful. But the answers for lossy compression are lacking a bit.

There’s also lossy compression, where some of the data is literally discarded during compression. Then when you reopen the file, the computer basically makes educated guesses about what used to be there. As an example, you could remove all of the u’s following q’s, the s’s from the end of plural words, the apostrophes from contractions, and all of the punctuation. It’s pretty likely that you could look at that text and, given the rules that the computer used when compressing the file, figure out what was supposed to go where based on the context. For example:

This is the original text, which I thought up rather quickly. It’s not the best example possible, but it should work well for our purposes.

Becomes:

This is the original text which I thought up rather qickly Its not the best example possible but it should work well for our purpose

Not really substantially shorter in this case, but we also didn’t have a very optimized algorithm for it. More rules make the file smaller and smaller.

It’s not really ideal for text, but it works pretty well for a lot of artistic data where it just needs to be close enough. Common examples of lossy-compressed files are JPEG pictures and MP3 audio files. It doesn’t matter if we get this specific pixel in our picture the exact right color, just so long as it’s about right given the surrounding pixels.
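For the curious, the text rules from this answer can be sketched in a few lines of Python. This is a toy: it applies only three of the rules (u after q, apostrophes, punctuation) and skips the plural rule, because deciding which words are plurals needs more than a simple pattern. The function name is made up.

```python
import re

def lossy_squeeze(text):
    """Throw away characters a reader can usually guess back."""
    text = re.sub(r"(?<=[qQ])u", "", text)  # drop the u that follows a q
    text = re.sub(r"['’]", "", text)        # drop apostrophes
    text = re.sub(r"[.,;:!?]", "", text)    # drop punctuation
    return text

original = ("This is the original text, which I thought up rather quickly. "
            "It's not the best example possible, but it should work well "
            "for our purposes.")
squeezed = lossy_squeeze(original)
```

There is no matching "unsqueeze" function here on purpose: once the characters are gone, getting them back is guesswork, which is the defining feature of lossy compression.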

Answer 7 · 2021-06-07T09:39:56+00:00

Yesterday I had to tell a customer service representative my account number over the phone: 000000932.

I could have said “zero-zero-zero-zero-zero-zero-nine-three-two” but I said “six zeroes nine-three-two.” It was quicker that way.

Sometimes describing a number can be quicker than saying the whole thing. That’s what file compression does, with more math; it finds ways to describe what is in a file that take less time and space than reading out every one and zero. In the same way we would say “the sky in this picture is blue,” software can describe part of a picture as “this pixel is color 000255000 and the next 732 pixels are too.”

Answer 8 · 2021-06-07T09:12:20+00:00

Imagine you want to save a message:

AAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA
BAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA

It takes 100 characters to save it.

You could save it as:

50*A,B,49*A

And have it saved in 11 characters. This is lossless compression, and the kind of thing (though obviously a very primitive version) that, say, 7-Zip or WinRAR do.

You could imagine a different algorithm that saves even more space:

100*A

And voila, you saved your message in 5 characters. Well, not exactly your message, you lost the B, but it’s very close to the message, and maybe the reader wouldn’t notice the B anyway. This is “lossy” compression, where you sacrifice some information the original had in order to save even more space. This is (a very primitive version of) what saving an image as JPG or music as MP3 does. Of course, these formats are popular because they are very good at only losing the information humans actually don’t notice, but the idea is the same.
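To see the difference with a real compressor, here is a small Python sketch using the standard `zlib` module (the DEFLATE method, also used by many .zip files) on the same 100-character message. The "lossy" version is simply the 100 A's from the example above, not a real lossy codec.

```python
import zlib

# The 100-character message: 50 A's, one B, 49 A's.
message = ("A" * 50 + "B" + "A" * 49).encode("ascii")

# Lossless: compress, then decompress, and get the exact bytes back.
packed = zlib.compress(message)
restored = zlib.decompress(packed)

# "Lossy" toy version: pretend the B was never there.
lossy = b"A" * 100
differences = sum(a != b for a, b in zip(message, lossy))
```

`restored` is byte-for-byte the original and `packed` is much shorter than 100 bytes, while `lossy` differs from the original in exactly one place that can never be recovered from it.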

Answer 9 · 2021-06-07T07:45:37+00:00

OK – let me invent a compression. And this isn’t a real example, and probably won’t save much space – I’m making this up as I go along. I’m going to make the thread title take up less space, as an example.

>ELI5: What are compressed and uncompressed files, how does it all work and why compressed files take less storage?

Hm. “compressed” is long, and appears 3 times. That’s wasteful – I can use that. I’m going to put a token everywhere that string appears. I’ll call my token T, and make it stand out with a couple of slashes: /T/.

>ELI5: What are /T/ and un/T/ files, how does it all work and why /T/ files take less storage?

Shorter. Only – someone else wouldn’t know what the token stands for. So I’ll stick something on the beginning to sort that out.

>T=compressed::ELI5: What are /T/ and un/T/ files, how does it all work and why /T/ files take less storage?

And there we go. The token T stands for the character string “compressed”; everywhere you see “T” with a slash each side, read “compressed” instead. “::” means “I’ve stopped telling you what my tokens stand for”. Save all that instead of the original title – it’s shorter.

Sure, it’s not MUCH shorter – I said it wasn’t likely to be – but it IS shorter, by 7 bytes. It has been compressed. And anyone who knows the rules I used can recover the whole string exactly as it was. That’s called “lossless compression”. My end result isn’t very readable as it stands, but we can easily program a computer to unpick what I did and display the original text in full. And if we had a lot more text, I suspect I’d be able to find lots more things that repeated multiple times, replace them with tokens as well, and save quite a bit more space. Real-world compression algorithms, of course, will do it better, in more “computer friendly” ways, use more tricks, and beat me hands-down. But the basic idea is the same.
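The scheme is easy to check in code. Below is a minimal Python sketch of it, assuming the token appears as T with a slash each side in the text and as a plain T in the header, and that the original text never contains "::" or the slashed token by accident. The `pack` and `unpack` names are made up.

```python
TITLE = ("ELI5: What are compressed and uncompressed files, "
         "how does it all work and why compressed files take less storage?")

def pack(text, word, token="T"):
    """Write a 'T=word::' header, then the text with word replaced by /T/."""
    return f"{token}={word}::" + text.replace(word, f"/{token}/")

def unpack(packed):
    """Read the header, then put the original word back everywhere."""
    header, _, body = packed.partition("::")
    token, _, word = header.partition("=")
    return body.replace(f"/{token}/", word)

packed = pack(TITLE, "compressed")
saved = len(TITLE) - len(packed)
```

Each of the three replacements saves 7 characters (21 in total) and the header costs 14, so `saved` comes out at 7, and `unpack(packed)` gives back the title exactly.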

If you did something similar with, say, a digital image with a lot of black in it, we could replace long stretches of black with a token meaning “black” and a number saying how many pixels of black, and save a LOT of space (one short token saying “2574 black pixels here”, say). And if we’re not TOO bothered about getting the EXACT picture back, simply something that looks very close to it, we could – purely as an example, say – treat pixels that are ALMOST black as if they were, and save even more. Sure, when the computer unpicks what we’ve done the picture won’t be precisely identical to what we started with – but likely the difference won’t be very obvious to the human eye, and for most purposes the difference won’t matter. And that’s called “lossy compression”. JPEG, for example, is a lossy compression format.
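Here is a small Python sketch of the "almost black counts as black" idea, using a made-up row of greyscale pixel values (0 is black) and a made-up threshold. It is only an illustration of the principle; JPEG itself works quite differently under the hood.

```python
from itertools import groupby

def run_lengths(pixels):
    """Describe a row as (value, how many in a row) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def snap_to_black(pixels, threshold=8):
    """Lossy step: treat anything at or below the threshold as pure black."""
    return [0 if p <= threshold else p for p in pixels]

row = [0, 0, 3, 0, 1, 0, 200, 200, 0, 2]

exact_runs = run_lengths(row)                 # lossless description
lossy_runs = run_lengths(snap_to_black(row))  # fewer, longer runs
```

The exact description needs 8 runs, while the lossy one needs only 3, at the cost of a few pixels being slightly darker than they were.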

Answer 10 · 2021-06-07T07:36:29+00:00

A nice example that I always use to explain compression is using images. Consider a completely WHITE image of size 3000×4000 (about your phone camera resolution).

In the simplest case (which is seldom the case in practice), each pixel of an uncompressed image is stored using 3 numbers to describe its color; for example, in 8-bit RGB color space we use the red, green and blue components of a color to describe it. A white pixel has all 3 components equal to 255, so a white pixel is represented by 3 numbers all equal to 255.

Without any compression, a 3000×4000 image is composed of 12M*3 numbers, which means that we need 36 000 000 numbers to store an uncompressed file. This is also the number of bytes that we need to store that uncompressed file (because we are using 8 bits, or 1 byte, for each number). This means that without compression an image taken by your phone would require about 36MB of storage 🙂

Now suppose you want to compress a white image. The simplest way that we can store the image is to literally say that the image is composed of all equal WHITE pixels. Thus in this extreme case, the only thing that you need to store is the color of ALL the pixels: white (255). In other words, instead of storing 36 000 000 bytes we need to store only 1 byte. Then, the device that we are using to visualize the image (phone in this example) needs to ‘recreate’ the original image by replicating that one value 36M times. So we compressed 36MB into 1B!
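The arithmetic can be checked with a short Python sketch. One assumption is added here that the example glosses over: a decoder also needs the width and height to rebuild the picture, so this made-up "solid colour" format stores those too and ends up at 11 bytes rather than 1. The function names and the byte layout are hypothetical.

```python
import struct

WIDTH, HEIGHT, CHANNELS = 3000, 4000, 3

# One byte per channel per pixel when uncompressed.
raw_size = WIDTH * HEIGHT * CHANNELS

def encode_solid(width, height, rgb):
    """Store only the dimensions (2 x 4 bytes) and one RGB colour (3 bytes)."""
    return struct.pack(">IIBBB", width, height, *rgb)

def decode_solid(blob):
    """Rebuild the full pixel data by repeating the one stored colour."""
    width, height, r, g, b = struct.unpack(">IIBBB", blob)
    return width, height, bytes([r, g, b]) * (width * height)

blob = encode_solid(WIDTH, HEIGHT, (255, 255, 255))

# Decode a tiny 4x2 white image to show the round trip without using 36MB.
small_w, small_h, small_pixels = decode_solid(encode_solid(4, 2, (255, 255, 255)))
```

`raw_size` is 36 000 000 bytes, `blob` is 11 bytes, and decoding repeats the stored colour once per pixel to get the full image back.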

In practice, there are many compression algorithms, specific for text (zip), for sound (mp3), for images and videos (jpeg and mpeg), and for whatever physical phenomena you can digitize. These algorithms can be more or less complex. However, the idea behind them is still the same as in my example: use the recurrent information in the data to be compressed. In our case the recurrent information is the fact that all pixels are white.
