Some great metaphors on here. I’ll try one, too.
Say we have two spies with codebooks. If a spy gets a message like “SPARROW,” they look up “SPARROW” in the codebook and it reads “Proceed with plan A immediately.” Or maybe “SWAN” means “Abandon position, return to base by sea.”
If a spy gets an encoded message, he doesn’t know what it means until he looks it up in the codebook. The information’s there, but until he opens the book and looks up the code, he doesn’t know what to do.
ZIP files work the same way. The information’s all there, but you have to go through the process of decoding the information in order to know what it says.
Two special notes, though:
1. You don’t need to unzip the whole file. A program that understands zip files can totally reach in and decode one particular part of the zip file. A number of video games and such take advantage of this, unzipping little pieces or monster image files or whatever as needed.
2. This wasn’t part of your question, but zip files aren’t guaranteed to store things in less space. There are some files which, when encoded, actually get bigger. That’s almost never the case in practice, but only because zip files are intentionally designed to be very effective on stuff like text documents. But if you filled a file with completely random bytes, zipping it would probably result in a slightly larger file.
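Both notes can be tried out with Python's standard `zipfile` module. This is only a rough sketch: the member names `notes.txt` and `noise.bin` are made up for illustration, and the archive is built in memory rather than on disk.

```python
import io
import random
import zipfile

# Pseudo-random bytes stand in for "a file filled with completely random bytes".
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(50_000))

# Build a small archive in memory with two members (hypothetical names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("notes.txt", "hello world\n" * 1000)
    zf.writestr("noise.bin", noise)

with zipfile.ZipFile(buf) as zf:
    # Note 1: decode just one member; noise.bin is never decompressed here.
    notes = zf.read("notes.txt")
    # Note 2: compare original size and stored size for every member.
    sizes = {i.filename: (i.file_size, i.compress_size) for i in zf.infolist()}

for name, (original, stored) in sizes.items():
    print(f"{name}: {original} bytes -> {stored} bytes in the archive")
```

The repetitive text member should shrink a lot, while the random member should come out a few bytes larger than it went in.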
There is no way to compress a 12-digit number into an 8-digit number, in general. What you _can_ do is find a method that e.g. compresses some 12-digit numbers into 8 digits, and expands some others into 14 digits. That’s what “all of the information but in less space” really means: you found, say, a reversible mapping from 12 digits to 6-20 digits.
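The reason is plain counting, which a few lines of Python can spell out. This sketch assumes "12-digit number" means any string of 12 digits, leading zeros allowed; the variable names are just for illustration.

```python
# How many different messages exist on each side?
inputs = 10 ** 12    # every possible 12-digit string
outputs = 10 ** 8    # every possible 8-digit string

# If every input had to map to some 8-digit output, then on average
# this many inputs would have to share the same output.
inputs_per_output = inputs // outputs

print(inputs_per_output)  # 10000
```

Since many inputs would land on the same output, the decoder could not tell them apart, so no such mapping can be reversible for all inputs.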
Actual compression algorithms are chosen such that usual data like English text or a photo gets compressed, while unusual data you don’t care about, like gibberish or pure noise, gets expanded. I say “unusual”, but that refers to the source where the data came from, such as a camera; for every “usual” file, there are many, many more “unusual” ones. For example, if you randomly reorder the pixels of a 16×16 image, you get about 256! ≈ 10^(507) images, and most of those look like uniform noise and are not going to come up naturally.
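The size of that number is easy to check with Python's exact integer arithmetic. This assumes all 256 pixels are distinct, so every reordering gives a different image.

```python
import math

# Number of ways to order 256 distinct pixels (a 16x16 image).
orderings = math.factorial(16 * 16)

# Count the decimal digits to see the order of magnitude.
digits = len(str(orderings))
print(digits)  # 507
```

So 256! is a 507-digit number, a bit under 10^507.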
Anyway, some compression algorithms exist that allow you to decompress only certain files, or parts of a file, but even in that case, a decompression process needs to happen first – the information itself is encoded in a complicated way, to a format that isn’t directly readable.
Computers need data to be structured and aligned in exact lengths (say 32, 64… bits) and must be able to access it in random order. Compression algorithms are only effective on long chunks of data, and they make the result indistinguishable from random gibberish. Current computer architectures can neither randomly access nor interpret data in compressed format, since the structure is completely lost and the layout depends on the data being compressed, so there is no general rule for doing that.
Because it is made as compact as possible. A lot of repeating bits can easily be compressed. Think of the zip file as something like a book.
But instead of wasting a whole page on a long run of identical bits, it will say: “Hey, this piece of information is actually a lot of zeros, so just repeat 4000 zeros, and then the next 48 bits are these for this page. The data after those 48 bits should be interpreted as a new page.”
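The "just repeat 4000 zeros" idea is known as run-length encoding. Here is a minimal sketch of it; note that real ZIP files normally use DEFLATE, which is more elaborate, and that the function names and sample bits below are made up for illustration.

```python
from itertools import groupby

def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into (symbol, run_length)."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand every (symbol, run_length) pair back into the original run."""
    return "".join(symbol * count for symbol, count in runs)

# 4000 zeros followed by a handful of mixed bits.
page = "0" * 4000 + "101100111000"
runs = rle_encode(page)

print(runs[0])                   # ('0', 4000)
print(rle_decode(runs) == page)  # True
```

The 4000 zeros shrink to a single pair, but a program still has to run `rle_decode` before it can read the bits normally.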
Well, why can’t programs access this? The long answer is they can, but they have to decompress it in order to do anything meaningful with it. Decompressing takes some of the computer’s CPU power. Also, there are many ways to compress a file; if you want to support every type of compression, you’re making your program bloated, which can lead to extra bugs.
Relying on just plain uncompressed data is faster and simpler for both your PC and the programmers behind the software.
The best analogy I have for zipping is stenography: writing in shorthand.
“cnyugtsttma5” is not English. But it can be transformed to English if you know the code used to write it.
“cnyugtsttma5” > “cn yu gt st tm a5” > “can you go to the store tomorrow at 5pm”
“cnyugtsttma5” is the zipped message, “can you go to the store tomorrow at 5pm” is unzipped
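As a toy sketch of that decoding step, assume every code is exactly two characters long and both sides share the same codebook. The codebook and function name here are made up to match the example above.

```python
# Hypothetical shared codebook: two-character code -> English phrase.
CODEBOOK = {
    "cn": "can",
    "yu": "you",
    "gt": "go to",
    "st": "the store",
    "tm": "tomorrow",
    "a5": "at 5pm",
}

def expand_shorthand(message: str) -> str:
    """Split the message into two-character codes and look each one up."""
    codes = [message[i:i + 2] for i in range(0, len(message), 2)]
    return " ".join(CODEBOOK[code] for code in codes)

print(expand_shorthand("cnyugtsttma5"))
# can you go to the store tomorrow at 5pm
```

Without the codebook and the lookup step, the short message is just twelve characters of gibberish.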