What are compressed and uncompressed files, how does it all work, and why do compressed files take less storage?

27 Answers

Anonymous 0 Comments

To a computer,

“word”

is

119 111 114 100
(http://www.asciitable.com/mobile/)

So, ask yourself whether you could write that shorter. Maybe you think: start with 119, subtract 8 for the next one, add 3, and then subtract 14.

119 -8 3 -14

That’s the same information as the first way to store “word”, but after the first number the values are smaller, so they can be stored in fewer bits.

It gets better if you have to store:

“wordwordwordword”

Because you could just say that you want to repeat the last 4 letters 3 more times. So you’d get

119 -8 3 -14 4 3

That’s compression. Come up with an idea that uses fewer bytes, but says the same thing.
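If you want to play with that idea, here is a rough Python sketch. It uses the ASCII codes that Python's `ord` reports ("w" is 119), and the function names `delta_encode`, `delta_decode` and `repeat_tail` are made up for this illustration; real compressors are far more elaborate.

```python
def delta_encode(text):
    """Keep the first character code, then only the change from one code to the next."""
    codes = [ord(ch) for ch in text]
    if not codes:
        return []
    return [codes[0]] + [b - a for a, b in zip(codes, codes[1:])]


def delta_decode(deltas):
    """Rebuild the text by adding the differences back up."""
    chars = []
    current = 0
    for d in deltas:
        current += d
        chars.append(chr(current))
    return "".join(chars)


def repeat_tail(text, length, extra_copies):
    """Append the last `length` characters `extra_copies` more times."""
    return text + text[-length:] * extra_copies


print(delta_encode("word"))                 # [119, -8, 3, -14]
print(delta_decode([119, -8, 3, -14]))      # word
print(repeat_tail("word", 4, 3))            # wordwordwordword
```

The small differences are what make this worthwhile: a number like -8 fits in fewer bits than a full character code.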

Anonymous 0 Comments

If you are interested in a more in-depth (and better) answer than I could give, I would strongly suggest the Great Courses lecture series The Science of Information: From Language to Black Holes by Benjamin Schumacher.

Compression is covered in chapter/lecture 5 but the whole course is excellent. It covers some very technical topics while remaining very approachable.

It is available as an audiobook as well as a video series.

Anonymous 0 Comments

I have actually implemented file compression technology, so I feel particularly authorized to answer this question.

Software needs to see files whose contents it understands. This is why software authors design file formats to be optimized to the particular needs of the particular problem their software is designed to solve, be the files written documents, audio/video recordings, spreadsheets, executable programs, scripts, etc.

These needs do not necessarily take file storage resources into consideration. So, when a user’s file storage space is filling up, it’s often in their interest to find ways to store **the exact same data** in a **smaller space**. That’s what a data compression file format does. It analyzes the content of a file, identifies *self-similar parts* of that file (that’s important), and recodes the file to reduce that redundancy, storing the content in its own compressed file format, which takes up less space. That is the whole point.

Disk storage is not the only place where data compression is useful. Network transfers benefit too: if the data is compressed at one end and decompressed at the other, it takes less bandwidth and/or less time to move from one place to another.

This, of course, renders the data into a file format that the software which originally understood the file’s contents no longer understands. This is why compressed files are given new filename extensions: even at the file system level, it becomes obvious that the contents of a file are compressed, so no one, human or software, makes the mistake of operating on that file’s contents as if they were still encoded in the original, inner file format.

Sometimes this can be handled at the file system level: the layer of software responsible for reading data from and writing data to the actual storage media compresses the file’s contents on write and decompresses them on read. The file is then stored in its compressed state, consuming less space, while the original software is free to consume the file’s contents, seeing only the file format that it expects.

Often, software will expect its files to be compressed by external programs, so it can be designed to detect compressed input and transparently pass the file through the appropriate decompressor before trying to use the file’s contents.
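As a minimal sketch of that kind of transparent detection, here is one way it could look in Python for gzip specifically. Gzip streams start with the two bytes `1f 8b`, which is what the check relies on; the helper name `read_transparently` is invented for this example, and a real program would handle more formats and more edge cases.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of a gzip stream


def read_transparently(raw):
    """Return the payload, decompressing first if it looks gzip-compressed."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


plain = b"name,score\nada,10\n"
packed = gzip.compress(plain)

# Either way, the caller sees only the format it expects.
print(read_transparently(plain) == plain)
print(read_transparently(packed) == plain)
```

The weak spot of this approach is an uncompressed file that happens to begin with the same two bytes, which is one reason filename extensions are still useful.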

Because one of the goals of compression is to reduce the redundancy of the encoded data, the compressed result has very little redundancy left, so compressing already compressed data generally does not make the file any smaller. In fact, trying to compress already compressed data will often result in a doubly compressed file that’s larger than the singly compressed file. This is due to the compression file format’s metadata overhead, as well as other factors. This is often true even when two different compression schemes are used in tandem, not just when the same compression scheme is reapplied multiple times.
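You can try this yourself. The sketch below uses Python's `zlib` module as a stand-in compressor and some made-up, repetitive text built from a small vocabulary (both are just assumptions for the demo). On input like this the second pass typically comes out a little larger than the first, but the exact sizes depend on the data and the zlib version, so no numbers are promised here.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so every run builds the same text
vocab = ["file", "data", "word", "space", "byte", "disk", "copy", "size"]
text = " ".join(rng.choice(vocab) for _ in range(5000)).encode("ascii")

once = zlib.compress(text)    # first pass removes most of the redundancy
twice = zlib.compress(once)   # second pass has little left to work with

print("original:        ", len(text))
print("compressed once: ", len(once))
print("compressed twice:", len(twice))
```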

Some file formats, for example audio/video recordings, are already encoded in some manner of compressed form. These are often “lossy” compression standards, such as JPEG or MP3, that explicitly throw away some data while keeping the image, video, or audio nearly indistinguishable from the original when consumed by a human, and that also render the data into a form that is more amenable to compression. It’s fine to recode a “lossless” audio file to a lossy one, if the human ear will not be able to tell the difference between the playback of the lossy and the lossless encodings. Other data types, for instance executable program code, are not amenable to lossy data compression, since actually changing the details of the instructions in the program would likely be fatal to the execution of the resultant compressed-decompressed program.

For such lossless data compression schemes, it is paramount that the round-trip conversion of <original data> (compression) <compressed data> (decompression) <uncompressed data> give the result that <original data> and <uncompressed data> be bit-for-bit identical.
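A quick way to see that round-trip requirement in action, again using Python's `zlib` purely as an example of a lossless scheme (the sample bytes are arbitrary):

```python
import zlib

original = bytes(range(256)) * 4 + b"wordwordwordword" * 64

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless means the restored bytes match the original exactly.
print(restored == original)
print(len(original), "bytes before,", len(compressed), "bytes compressed")
```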

There are many different compression schemes at work in the world. Each one does what it does in slightly different ways. It is impossible to create a single compression scheme that works equally well on all kinds of data. The compression scheme at work in MP3 files is so specialized that it was covered by patents held by the Fraunhofer Institute, among others (those patents have since expired). However, as adept as the compression scheme in MP3s is at compressing audio data, it would not work nearly as well for spreadsheets or written documents. Likewise, the kind of compression schemes that might work well on written documents would work very poorly for video streams. The diverse needs of different types of data and the continual research and development of computer algorithms ensure that there will always be a new file compression extension to learn sooner rather than later.

Anonymous 0 Comments

Imagine you have a graph of 100 points. To know the position of each point in x and y you’d need 2 bytes each (one per coordinate), or 200 bytes. However, you could come up with a mathematical formula that draws a line nearly matching all the points. It’ll be a little off, but now instead of 200 bytes you only need about 20 to store the formula, which outputs the required y value for each point at regular intervals along the x axis.
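Here is a small sketch of that in Python, assuming the points sit at x = 0, 1, 2, ... and lie close to a straight line. It uses an ordinary least-squares fit, and the names `fit_line` and `rebuild` are invented for the example. Instead of 100 values you keep just two numbers (slope and intercept) and accept a small error.

```python
def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; needs at least 2 points."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rebuild(slope, intercept, n):
    """Regenerate approximate y values from the stored formula."""
    return [round(slope * x + intercept) for x in range(n)]


# 100 points on the line y = 2x + 10, with a few nudged up by 1.
ys = [2 * x + 10 + (1 if x % 7 == 0 else 0) for x in range(100)]

slope, intercept = fit_line(ys)
approx = rebuild(slope, intercept, len(ys))
worst = max(abs(a - b) for a, b in zip(ys, approx))
print("stored:", slope, intercept)
print("largest error after rebuilding:", worst)
```

This is lossy: the rebuilt points are close to the originals, not identical to them.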

Compressing images is even simpler. A raw image uses 8 bits (one byte) per colour channel per pixel to define the brightness and colour of the pixel as a whole. Each subpixel is set from 0 (off) to 255 (fully on), with the values in between being increasing brightness. This is 8-bit colour (2x2x2x2x2x2x2x2 = 256 levels). There are several options for compressing. For starters, you could reduce the number of bits per subpixel. Or you could reduce the resolution by merging nearby pixels together. You could also throw away every other pixel and then interpolate that colour during unpacking based on its neighbours. All of these lose some detail.
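Two of those options, sketched in Python on a single row of 8-bit brightness values. The function names are made up for this illustration, and real image formats do something much more sophisticated.

```python
def reduce_bits(pixels, bits):
    """Keep only the top `bits` bits of each 8-bit value."""
    shift = 8 - bits
    return [p >> shift for p in pixels]


def restore_bits(values, bits):
    """Scale reduced values back to the 0..255 range (the low bits are gone)."""
    shift = 8 - bits
    return [v << shift for v in values]


def drop_every_other(pixels):
    """Keep pixels 0, 2, 4, ... and throw the rest away."""
    return pixels[::2]


def interpolate(kept, original_length):
    """Fill each gap with the average of its kept neighbours."""
    out = []
    for i, value in enumerate(kept):
        out.append(value)
        if len(out) < original_length:
            nxt = kept[i + 1] if i + 1 < len(kept) else value
            out.append((value + nxt) // 2)
    return out


row = [0, 10, 20, 30, 40, 50, 60, 70]
print(interpolate(drop_every_other(row), len(row)))       # [0, 10, 20, 30, 40, 50, 60, 60]
print(restore_bits(reduce_bits([0, 10, 20, 255], 4), 4))  # [0, 0, 16, 240]
```

Notice the last pixel comes back as 60 instead of 70, and 10 comes back as 0: that is the detail you give up for the smaller size.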

Anonymous 0 Comments

many great answers already, but i’ll throw in how it was explained to me when i was young. very simplistic, but here goes:

say you have the text string “aaaaaaaaaabbbbbbbbbbbbbbbbbbbb”

10 a’s, 20 b’s.

instead of saving it like that, when compressing, you can write it like “10a20b”
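this trick is usually called run-length encoding. a tiny python sketch, assuming the text itself contains no digits (otherwise the counts would get mixed up with the data); the function names are just for this example:

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated character with '<count><char>'."""
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))


def rle_decode(encoded):
    """Expand '<count><char>' pairs back into the original runs."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch          # still reading the number
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


original = "a" * 10 + "b" * 20
print(rle_encode(original))               # 10a20b
print(rle_decode("10a20b") == original)   # True
```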

when those 10 a’s and 20 b’s appear multiple times in several locations, you can have the first instance shortened like above, and for the other appearances, you can make an “alias”, like “string1”. doesn’t make sense in this example because string1 is longer than 10a20b, but if that string was longer, it would save space to use an alias that’s shorter and refers to the original string
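a rough sketch of the alias idea in python. here every occurrence is swapped for the alias and the phrase is stored once in a small table, which amounts to the same thing; `alias_compress`, `alias_expand` and the alias "#1" are all made up for the example, and it assumes the alias never appears in the text on its own:

```python
def alias_compress(text, phrase, alias):
    """Swap every occurrence of `phrase` for `alias` and remember the mapping."""
    if alias in text:
        raise ValueError("alias must not already occur in the text")
    return text.replace(phrase, alias), {alias: phrase}


def alias_expand(packed, table):
    """Put the original phrases back."""
    for alias, phrase in table.items():
        packed = packed.replace(alias, phrase)
    return packed


phrase = "abcabcabcabc"
text = phrase + "-" + phrase + "-" + phrase

packed, table = alias_compress(text, phrase, "#1")
print(packed)                                # #1-#1-#1
print(alias_expand(packed, table) == text)   # True
```

the table has to be stored too, so this only pays off when the phrase is long or shows up often.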

Anonymous 0 Comments

A compressed file is a file that has been processed to reduce the overall size of its data. Compressed data takes less storage space, which matters especially for files that are sent over computer networks like the internet or an intranet. Compressed data can also take up less memory, although it usually has to be decompressed before a program can work with it.

An uncompressed file is a file that has been allowed to use all the space it needs. Files are usually compressed in order to save disk space. Lossless compression gives back exactly the original data; only lossy compression (as used for many images, audio, and video) reduces the quality of the original file.

Anonymous 0 Comments

You know those waterproof ponchos you can get, the ones that come in a little compact pouch? That’s a compressed file. When it rains (i.e. only when needed), you open the thing up and put it on.

When it’s in the pouch, it takes up a tiny amount of space in your bag. If you don’t compress it (i.e. scrunch it back up and stuff it in the pouch) and just shove it back in your bag as a poncho, it takes up more room.