Why do audio and video take so much storage?


So for example, video games are super heavy on disk space. And apparently most of that space is just the sounds, textures, models, etc. But the code takes very little space.

Why can something as complex as a physics system weigh less than a bunch of images?

Code takes very little space, media takes more. But an image is just code that tells the computer how to draw something (I think)

So how come some code gets to be so small in size and some code doesn’t?


18 Answers

Anonymous 0 Comments

I have an image (a texture) in front of me. One pixel is green. What colour are the ones around it?

The bottom line is – you don’t, and can’t, know. Sure, I told you it’s an image, so there’s a good statistical probability they will be green as well – but you don’t know that. The only way to be sure is to look. And when it comes to storing it on a drive – sure, there are compression techniques you can use to cut it down in size somewhat – but in the end, you’re effectively storing something akin to random information. That takes up a LOT of space. And sound is basically just the same.
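You can actually watch this play out with a general-purpose compressor. Here's a minimal sketch in Python using the standard zlib module (the sizes in the comments are approximate):

```python
import os
import zlib

# a 200x200 "image", 3 bytes (RGB) per pixel = 120,000 bytes raw
all_green = bytes([0, 255, 0]) * (200 * 200)  # every pixel the same green
noise = os.urandom(200 * 200 * 3)             # every pixel effectively random

print(len(zlib.compress(all_green)))  # a few hundred bytes - very predictable
print(len(zlib.compress(noise)))      # ~120,000 bytes - random data barely shrinks
```

The predictable image collapses to almost nothing because the compressor can effectively say "green, 40,000 times"; the noise has no pattern to exploit.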

Anonymous 0 Comments

Each letter and number can be stored as one byte, and then the machine takes that byte and turns it into a character. In ASCII, for example:

A=65
B=66
C=67 etc.

So although the letter A has a shape, that shape is not stored; only the number 65 is stored. Your computer reads the 65 and knows how to make that shape on the screen.

A photo of that shape has to represent every pixel. The shape is divided into a 100×100 grid and then each of the pixels has a colour.
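To put rough numbers on that, here's a small Python sketch (a toy comparison, not a real image format):

```python
letter = "A".encode("ascii")
print(len(letter))  # 1 byte: just the number 65

# the same letter as a 100x100 picture: every pixel stores an RGB colour
picture_of_letter = bytes(100 * 100 * 3)  # placeholder pixel data
print(len(picture_of_letter))  # 30,000 bytes
```

One byte versus thirty thousand, for the same letter.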

Anonymous 0 Comments

Your whole question revolves around the idea of information and how compression applies to it. It's not "why is the code so small," but "why does the media take so much information?"

Code is small, and gets even smaller when you compile it down. The entire physics system for your game might only take a few megabytes of code, and that's because most of the information in the code is lost during a compile. Stuff like comments gets dropped, if statements turn into a few bytes of machine code, the optimizer trims things further, etc. We have taken the code with all of the information a human needs and turned it into something only a computer can read, with all of the human bits removed. We can do this because the human will never need to read the human bits from the code again. This is effectively "lossless" compression as far as the computer is concerned, because all of the information the computer needs is kept. All of the code is still there, just represented in a much smaller form that, with enough effort, can be turned back into something human-readable.

Now apply this to an image. If you remove information from the image to save space, there is no method you can use to regain that information, because it is gone. This is called lossy compression, and it can save a lot of space, but you can only take it so far: the more information you take out of the image, the less information you have left to represent it, and nobody wants an image with all the blue removed to save space. The more information (aka detail) you want to keep, the more space it takes to store that information, and images have ***a lot of detail***. If you have ever seen a highly compressed image, you can see all the little shortcuts it takes to save space: the color space gets reduced, resulting in banding in like-colored areas; sharp areas get "blurred" as individual detail is traded for grouped detail; etc.
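As a toy illustration of that "reduced color space" shortcut (this shows the underlying idea, not any real codec): snap every brightness value to the nearest of a few levels, and a smooth gradient turns into visible steps.

```python
# a smooth gradient: brightness values 0..255
gradient = list(range(256))

# keep only 8 levels instead of 256 - each value now needs 3 bits, not 8
step = 256 // 8
banded = [(v // step) * step for v in gradient]

print(gradient[30:38])  # [30, 31, 32, 33, 34, 35, 36, 37] - smooth
print(banded[30:38])    # [0, 0, 32, 32, 32, 32, 32, 32] - steps appear
```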

Now apply this to a movie file, which is just many pictures lined up one after another to form a moving image. A 10 second clip at 30fps takes 300 pictures' worth of space to store. With compression you can get the space down a fair bit (video codecs mostly store the differences between frames rather than every full frame), but you can only compress it so far before you start losing too much information.
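Here's a toy Python sketch of that differences-between-frames idea (real codecs are vastly more sophisticated):

```python
frame1 = [10, 10, 10, 10, 200, 10, 10, 10]  # 8 "pixels" of the first frame
frame2 = [10, 10, 10, 10, 10, 200, 10, 10]  # next frame: one bright pixel moved

# store frame2 as just the pixels that changed since frame1
diff = [(i, new) for i, (old, new) in enumerate(zip(frame1, frame2)) if old != new]
print(diff)  # [(4, 10), (5, 200)] - 2 entries instead of 8 pixels
```

When most of the scene holds still, each new frame compresses down to a handful of changes; that's why a fast-cutting action scene is harder on a codec than a talking head.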

Now let's watch the progression of information over time in our ever-greater push for more detail. In the 80s/90s, you had the advent of the personal computer and "writing" machines. Their biggest feature was the ability to type documents and go back and modify them without the use of items like whiteout. A decade or two later, as computers got more powerful, their main feature was no longer the ability to write documents, but to display images; writing documents took a fraction of the power it used to, thanks to advancements in technology. Another decade or two later, and the main feature is no longer writing or pictures, but movies, while writing and images each take only a fraction of the power they once did. As time progresses, manufacturers push for greater detail and power to gain an edge over their competition, but that greater detail also requires substantially more information to represent.

Anonymous 0 Comments

For the same reason that a recipe doesn't take up the same space as the 5-course meal it helps you prepare.

Anonymous 0 Comments

Code is like looking at a picture, and describing it to someone the way you normally would – you may use a fair number of words to describe it, but most of the work is done by the brain of the person, interpreting those words to build an image in their head.

An image is like looking at a picture and describing each individual pixel, in terms of color and position. It would take *a lot* more words just to do that. You might naturally describe a small handful of major elements in a picture, and then a few dozen smaller details to really flesh an image out. Let’s go nuts and say 100 details, large and small, and each one gets 2 dozen words of description. You just used 2400 words to really deeply describe a picture.

A 4K image, just one still image, is about 8.3 million pixels (3840×2160). To describe color and position, you'll need at least 3 words per pixel, realistically more like 6-10 if you don't have some sort of encoding scheme for what you're saying. So, somewhere between 25 and 75 million words to say what you just said with 2,400 words when describing the picture naturally.

The metaphor is a bit weird, but the basic idea is that code has a lot of efficiencies that allow it to do a lot with just a few "commands" – that's the power of loops. And no matter how much you compress audio and images, at the end of the day they need to be decompressed back into something extremely close to the original. It's gonna need to serve 8.3 million pixels to your eyeball.

So, let's talk about what kind of code you'd need to write to take that 4K image and display it on your screen. This is a reductive example; you'd never write a program like this, because many of these problems are already "solved" by the operating system, and the game developer gets to use the tools the OS provides "for free" to do a lot of the heavy lifting, like sending an image to your monitor. But we'll roll with it. That's another reason the code gets to be small: it doesn't have to literally run your whole computer, it just has to run itself.

So, ignoring all of that, you’ve got a 4k image on your hard drive and you’re writing a program to display it on the screen. The image is uncompressed, 8.3 million pixels that all have their own location and color, and a little bit extra to define what kind of image it is, when it was created, what’s the filename, etc.

(Basically) all you have to write to get that to the screen is:

* set position to 1, which corresponds to the first pixel

* read pixel data

* send to corresponding pixel on the monitor

* increment position by 1

* check to see if there’s any more data

* if more data, go back to the first step

* if there’s not, you’re done

And there you have it. Small, but it displays the whole image for you. It runs 8.3 million times, but you don't have to write each and every one of those iterations, and it doesn't live as 8.3 million instructions on your hard drive; it lives as a machine-code version of just what you see here.
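For the curious, those steps translate almost line-for-line into real code. A minimal Python sketch, using a plain list as a stand-in for the monitor's framebuffer (that stand-in is an invention for illustration, not a real display API):

```python
# uncompressed image data: one (r, g, b) tuple per pixel
image_data = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # a tiny 3-pixel "image"
framebuffer = [None] * len(image_data)  # stand-in for the monitor's pixels

position = 0                        # set position to the first pixel
while position < len(image_data):   # check to see if there's any more data
    pixel = image_data[position]    # read pixel data
    framebuffer[position] = pixel   # send to the corresponding pixel on the monitor
    position += 1                   # increment position by 1
# the loop exits when there's no more data - you're done

print(framebuffer)
```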

Anonymous 0 Comments

Describe to me the measurements of a table. Now the measurements of an end table. A coffee table.

Now describe the texture of the wood grain for each side of the table. For each leg. For the underside.

Now describe the texture of the end table and the coffee table.

If the table breaks in half, what color is the exposed edge? What color are the fragments?

The physics model is just the measurements of the world, but you need the descriptions of the textures to make it look good, and that's what takes space, even for a simple table.

Anonymous 0 Comments

Some good answers here, so I’ll keep mine simple and clear up a bit of confusion I noticed in your question.

Images are not “code that tells the computer how to draw something.” Code that tells the computer how to draw something is called “procedural drawing/generation,” and usually does take up very little storage space (one of the main benefits of procedural generation is that you don’t have to store the generated content, but the downside is that you have to take the time/processing to regenerate it every single time you load the game).

Images, music, models, map data, etc. are all just "data." An image would be a massive list of colors such as "red, red, blue, light blue, light blue, red, red…" for every pixel in the image; a song would be a massive list of numbers specifying different sound amplitudes; models are giant lists of 3D coordinates; and map data would be massive lists of object names, world positions, references to level scripting, etc. Computers have zero idea what to do with pure data, so we then write code as part of the game "engine" that knows how to read that data and make things happen. We can make the data smaller by compressing it, but we can only compress it so far without throwing away important information, i.e. the textures become blurry, the sound becomes crackly, and models and levels start to become wonky.
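A tiny Python sketch of that contrast (a toy gradient, not any real engine's data format): the procedural version is a few lines of code that can regenerate the pixels on demand, while the "data" version stores every pixel explicitly.

```python
WIDTH, HEIGHT = 256, 256

# procedural: the colour of any pixel is computed from its position
def gradient_pixel(x, y):
    return (x, y, 0)  # red follows x, green follows y

# data: the same image stored explicitly, one colour per pixel
stored = [gradient_pixel(x, y) for y in range(HEIGHT) for x in range(WIDTH)]
print(len(stored) * 3)  # 196,608 bytes of pixel data vs a few dozen bytes of code
```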

Anonymous 0 Comments

“But an image is just code that tells the computer how to draw something (I think)”

Here's the issue – the images and sounds that take a lot of space aren't that.

There are indeed images that are just "code" the computer uses to draw them, and sounds that are just "code" telling the computer what to play, and those are indeed pretty small. You can see this with vector images and MIDI files.

But “real” images, like photographs, and “real” sounds, like a recording of a voice, aren’t that simple – you can’t just have an instruction that tells the computer how to make it, because there’s no simple instruction, or set of instructions that can do that. They aren’t “regular”, for lack of a better word. It’s very easy to draw a line following instructions, you just need to say where point A and point B are, and the thickness. It’s nearly impossible to draw a realistic photograph using just instructions like that.

So these complex images and sounds have to be stored: in the case of images, pixel by pixel (with each pixel taking a certain number of bits to determine its color), and in the case of sound, sample by sample (same idea). Video is just a series of images, so take an image's size and multiply it by 30 per second, or whatever the framerate is.

Then you have compression algorithms that make it all smaller, but there's a limit to it, and there's a performance cost as well – the better the compression, usually the heavier it is on the CPU.

In the case of video, these compression algorithms are a requirement, since otherwise it would just be too big to work with. 99% of people never watch uncompressed video; unless you work in high-end production, you never even get to see it. I'm a video editor, I work on video all day long, and uncompressed is just not a thing in my workflow. To give you an idea, a simple 1080p (1920×1080 pixels), 24-bit, uncompressed 60 fps video is around 3 Gb (gigabits) PER SECOND: 24 bits for each pixel, times 2,073,600 pixels per frame, times 60 frames per second.

Working in a production codec, like ProRes 422, which I use a lot, you'll get around 150 Mb per second for the same file, with next to no visual quality loss. Compressing that with H.265, which is what would be used for final distribution (YouTube, etc.), will get you to about 9 Mb per second – at the cost of the processing power needed to decode it back on the fly, and a loss of visual accuracy (that is, usually, not very noticeable).

Taking that into account, I'd say video (and images and sound) take a surprisingly LOW amount of storage. The compression algorithms we use are incredible – you'd expect media to take a LOT more space than it actually does.
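Running those numbers (using the bitrates quoted above, all in bits per second) shows how hard the codecs are working:

```python
uncompressed = 24 * 1920 * 1080 * 60  # bits per pixel x pixels per frame x fps
prores_422 = 150_000_000              # ~150 Mb/s production codec
h265 = 9_000_000                      # ~9 Mb/s delivery codec

print(uncompressed)                      # 2,985,984,000 - about 3 Gb per second
print(round(uncompressed / prores_422))  # ~20x smaller
print(round(uncompressed / h265))        # ~332x smaller
```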

Anonymous 0 Comments

You don’t have to tell a computer how to draw every ray, or place every plane, or calculate every collision.

You just have to tell it one time. It can repeat the same process for every similar action.

But you do have to tell it the color of every pixel in an image, every vertex in a model, every sample in a recording. Those items are all individually unique and harder to generalize.

Computers and gaming consoles also have a lot of code already. The game code can piggyback off the existing functionality of the device code, in place of having all new code running entirely from scratch.

Like how a lot of online games and such will require you to download or update Java. There’s a lot of generic code not actually included in the game, but it is still used to make the game work.

Anonymous 0 Comments

Because they’re a LOT of data.

This is a simplified example.

Imagine for example that you have a single pixel. A dot.

For this single dot we need to know the color, so you have the Red, Green, and Blue (RGB) values, each one on a scale of 0-255. A single byte can hold a number up to 255, so for each single dot we are using three bytes of space.

Now let's look at a single picture. The standard video size is 1920×1080 (1080p).

1,920 × 1,080 × 3 = 6,220,800 bytes, or around 6 megabytes for a single frame of video.

Now consider that a video is generally 24 frames per second:
6,220,800 × 24 = 149,299,200 bytes, or roughly 150 MB per second for HD video.

We make these much smaller of course with compression, which would probably be a whole other ELI5.
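For anyone who wants to play with the numbers, here is the same arithmetic as a short Python sketch:

```python
def raw_bytes_per_second(width, height, fps, bytes_per_pixel=3):
    """Uncompressed size: every pixel of every frame stored in full."""
    return width * height * bytes_per_pixel * fps

print(raw_bytes_per_second(1920, 1080, fps=1))   # 6,220,800 - one frame
print(raw_bytes_per_second(1920, 1080, fps=24))  # 149,299,200 - ~150 MB/s
```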