ELI5: Video Compression. How Do You Get 138 Minutes Of 1080p HD Video Using Only 1.2 GB?

[Edit: Removed redundant commentary about headline.]

By only storing information when a pixel changes.

For example, a lyric video on YouTube might have a background that stays constant throughout the video, so it’s rather pointless to store it 24 times per second when you can store it once and then only store individual pixel changes as they happen from frame to frame. The text portion changes, while most of the pixels never do.

Similarly, movies have a lot of pixels that barely change, such as scenes where only the actors move. So you only record their movement and leave the rest alone until a scene change.
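To make the "only store what changes" idea concrete, here is a toy sketch. It assumes a frame is just a flat list of pixel values, and the helper names `encode_delta` and `apply_delta` are made up for illustration. Real codecs work on blocks of pixels and are far more elaborate.

```python
def encode_delta(prev_frame, next_frame):
    """Return (index, new_value) pairs for the pixels that changed."""
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(prev_frame, next_frame))
        if old != new
    ]


def apply_delta(prev_frame, delta):
    """Rebuild the next frame from the previous frame plus the stored changes."""
    frame = list(prev_frame)
    for i, value in delta:
        frame[i] = value
    return frame


# A 20-pixel "background" where a single pixel of "text" changes.
frame1 = [7] * 20
frame2 = list(frame1)
frame2[5] = 9

delta = encode_delta(frame1, frame2)
print(delta)                                 # [(5, 9)]
print(apply_delta(frame1, delta) == frame2)  # True
```

Instead of storing all 20 pixels again, the second frame is described by a single (position, value) pair.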

This is also why, when playback glitches, you can sometimes see moving faces in those noisy, multicoloured images: the player keeps applying the changes, but on top of the wrong starting picture.

There are a lot of different ways video can be compressed. The simplest is by “grouping” pixels of similar colors. If I’ve got a red barn made up of 1,000 individual pixels, I can probably shrink the file size by grouping 5 pixels into a single piece of data. This has some loss of quality, but if the grouping is only done in large sections of the same color it wouldn’t be super noticeable. This is loosely what JPEG files do to pictures.
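A rough sketch of the grouping idea, assuming a single row of grayscale pixel values whose length is a multiple of the group size. The function names are hypothetical, and this is only a toy: it is not how JPEG actually works internally, just the same "throw away fine detail" spirit.

```python
def group_pixels(row, group_size=5):
    """Replace every group_size pixels with one averaged value (lossy)."""
    grouped = []
    for start in range(0, len(row), group_size):
        chunk = row[start:start + group_size]
        grouped.append(sum(chunk) // len(chunk))
    return grouped


def expand_groups(grouped, group_size=5):
    """Stretch each stored value back out to group_size pixels."""
    return [value for value in grouped for _ in range(group_size)]


barn = [200] * 1000                   # 1,000 pixels of the same red
small = group_pixels(barn)
print(len(small))                     # 200
print(expand_groups(small) == barn)   # True, nothing visible was lost

detail = [10, 20, 30, 40, 50]         # a gradient, not a flat color
print(expand_groups(group_pixels(detail)))  # [30, 30, 30, 30, 30]
```

The flat barn survives untouched at a fifth of the size, while the gradient gets flattened, which is why grouping is only safe in large areas of one color.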

One could also decrease the frame rate. This is done by removing frames from a video; while this loses quality, if the video isn’t particularly fast it shouldn’t be particularly noticeable.

More modern codecs also use prediction. If I take a picture of your face frowning and a picture of your face smiling, a decent algorithm could probably predict how you got there. This is done with “motion vectors,” which describe where a chunk of the picture moved to between frames. When you’re changing expressions, most of your body doesn’t move very much. If I record *only the changes in the frame,* I can encode a lot less data because your body isn’t moving. This means that each frame after the first one contains a lot less information, because not the entire image changes.
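Here is a toy illustration of finding a motion vector. It assumes a single row of pixels and a brute-force search over small horizontal shifts; `best_shift` is a made-up name. Real encoders search in two dimensions, block by block, with much smarter strategies.

```python
def best_shift(prev_row, next_row, max_shift=3):
    """Find the horizontal shift of prev_row that best matches next_row."""

    def error(shift):
        total = 0
        for i, value in enumerate(next_row):
            j = i - shift  # where this pixel would have come from
            source = prev_row[j] if 0 <= j < len(prev_row) else 0
            total += abs(value - source)
        return total

    # Try every shift in range and keep the one with the smallest mismatch.
    return min(range(-max_shift, max_shift + 1), key=error)


prev_row = [0, 0, 9, 9, 0, 0, 0, 0]   # a bright object at positions 2-3
next_row = [0, 0, 0, 0, 9, 9, 0, 0]   # same object, now at positions 4-5

print(best_shift(prev_row, next_row))  # 2
```

Rather than re-sending the pixels of the object, the encoder can store "move it 2 to the right" plus whatever small differences remain.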

So, first a bit of math. 138 minutes of 1080p HD video at 1.2 gigabytes works out to around 1.2 megabits per second of data (note bits vs bytes: 1.2 GB is 9.6 gigabits, spread over 8,280 seconds). That’s quite low. I would not want to watch, say, a hockey or football game at that bit rate.
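For anyone who wants to check the arithmetic, here is a quick sketch. It assumes 1 GB means 10^9 bytes, and for the uncompressed comparison it assumes 24 frames per second and 3 bytes per pixel, neither of which is stated in the question.

```python
size_bytes = 1.2e9          # 1.2 GB, assuming 1 GB = 10**9 bytes
duration_s = 138 * 60       # 138 minutes in seconds

bitrate_mbps = size_bytes * 8 / duration_s / 1e6
print(round(bitrate_mbps, 2))   # 1.16

# Uncompressed 1080p for comparison (assumed: 24 fps, 3 bytes per pixel).
raw_mbps = 1920 * 1080 * 3 * 8 * 24 / 1e6
print(round(raw_mbps))          # 1194

ratio = raw_mbps / bitrate_mbps
print(round(ratio))             # about 1030
```

Under those assumptions the file is roughly a thousand times smaller than the raw pixels would be.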

Which brings us to the secret sauce: motion compensation. If you look at each individual frame of the video and imagine it as an image on a web site, yes the video would be absurdly large or have horrible image quality at these file sizes. However what you must realize is that a video is many many images in a row. In fact if you took 2 consecutive frames from a video, they will be very similar to each other. Most video playback is done by retaining the last few frames of the video, and the data received makes use of that information. Having a working image to start with and then modifying it for the next frame consumes *far* less file space than starting from a blank canvas and drawing a new picture from scratch.

Which is why I bring up sports games. More motion over more of the image means more data is needed for the video to describe the changes, and sports tends to involve a lot of stuff moving and big camera sweeps and it takes data to describe all those image updates. On the other hand a video of a person sitting there looking at a camera (eg: the news) has much less motion and so fewer bits are needed to describe the changes to the picture.

Of course, if you don’t mind a poor quality image, you can just blur the heck out of the image. This makes the image description much simpler since it’s just colour blobs rather than intricate details. When the bit rate is set too low for encoding this is what can happen. Extreme amounts of motion – eg confetti dropping from the sky where each individual piece tumbles a bit differently – can ruin the image quality as any limit on how big the file can be is a problem when there’s *so much stuff moving independently at once* that it can’t be processed reasonably.

Human vision is actually a lot worse than we think. Video compression gets rid of details in the video that we aren’t likely to notice, and is able to shrink down the file size enormously that way.

For instance, we are much better at seeing the colors red and green than we are at seeing blue. If you watch just the blue color data from a compressed movie, you will find it is very blurry and low resolution, while the green and red data is more detailed.

Another example is that humans are much better at seeing light vs. dark rather than different shades of color. Compressed video retains a lot of information on the light/dark in the picture, while it minimizes the amount of color information stored.
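A small sketch of how much this saves. It assumes the common "4:2:0" arrangement, where brightness is kept for every pixel but each of the two color channels is stored once per 2x2 block of pixels. The answer above doesn't name a specific scheme, so treat that choice and the `sample_counts` helper as illustrative.

```python
def sample_counts(width, height):
    """Count stored values with full color vs. color shared per 2x2 block."""
    brightness = width * height                    # one per pixel, always kept
    full_color = 2 * width * height                # two color values per pixel
    shared_color = 2 * (width // 2) * (height // 2)  # two per 2x2 block
    return brightness + full_color, brightness + shared_color


full, reduced = sample_counts(1920, 1080)
print(full)            # 6220800
print(reduced)         # 3110400
print(reduced / full)  # 0.5
```

Half the data is gone before any of the cleverer tricks kick in, and most viewers never notice because the brightness detail is all still there.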

Lastly, you can use tricks like only storing data of parts of the image that change. If the background remains the same, you only have to store that data once. This only works for scenes with small amounts of motion. If you watch heavily compressed videos, you will see that still scenes of people talking are often much more detailed than blurry action scenes where the whole screen is in motion.