I know this is ultra general, but I’ve always wondered how pictures or sound are written to a drive. I understand slightly that data is stored as sequences of little on or off switches, but if you were to write out a drive in English, what would that look like for pictures? Does each pixel have a specific code like the #F000F4 or however you see them online? Hope there can be some sort of minor explanation. Thanks.
They are saved as a list of pixels, with 1 byte per RGB colour channel. So basically the binary numbers for 255 255 255 would be a white pixel, while 0 0 0 is a black pixel.
You wouldn’t notice it’s a pic when looking at the raw data. The computer only knows to interpret it as a picture because of the file ending (.bmp in my example case), so to find out what the data is you’d have to look at that metadata instead.
Everything on a computer is stored in binary: 1s and 0s. So if you were to look at a completely raw file without the computer converting it into something readable for you, it would look something like this:
1110101010101101101110101010110111010101100101000101001011010110010101101101011010010101111010101010100010101011110110101100001011010101011011011011
etc.
It would not be in English. There is also no single agreed-upon way to write it out, as you could “view” the file contents in multiple ways, such as in binary or in hexadecimal.
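As a rough illustration of those two views, here is a small Python sketch that writes the same three bytes out in binary and in hexadecimal. The byte values are only an example (they happen to match the #F000F4 colour from the question), and the helper names `as_binary` and `as_hex` are made up for this sketch.

```python
def as_binary(data: bytes) -> str:
    """Write each byte as eight 0/1 characters."""
    return " ".join(f"{b:08b}" for b in data)


def as_hex(data: bytes) -> str:
    """Write each byte as two hexadecimal digits."""
    return " ".join(f"{b:02X}" for b in data)


raw = bytes([240, 0, 244])  # example bytes, not read from a real file
print(as_binary(raw))  # 11110000 00000000 11110100
print(as_hex(raw))     # F0 00 F4
```

Both lines describe exactly the same stored data; only the way it is written out for a human differs.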
But there really is no “file”. You are right that it is stored as a series of on and off switches. These are stored voltages. There are no actual 1s or 0s being stored inside the computer. 1 and 0 is just an abstraction for high voltage and low voltage.
When you view the file in binary or hexadecimal, you are looking at a monitor which has been fed different voltages via the hdmi or monitor cable from the video card, which got a series of voltages from the cpu or memory, which in turn got the stored voltages from the location (also stored voltage) of said file. The monitor interprets the voltages and turns the pixels different colors.
It’s very interesting to think of the computer in this way. It’s voltages all the way down.
The exact details of how it’s written depend on the file format (which is why there are different ways of saving images, like as a .png or .jpg), but essentially no matter what you’re saving (whether it’s a picture, or a sound, or plain text), you have to represent it as a series of numbers, then translate those numbers to binary and store those in the “on/off switches.” Because we use standardized file formats, someone else can then translate those numbers back into what they originally were.
>Does each pixel have a specific code like the #F000F4 or however you see them online?
You’re not far off here. The standard way of representing a color is with three numbers between 0 and 255 to represent how red, green, and blue the color is. When you see something like #F000F4, that’s actually all three of those numbers written in a shorter way using hexadecimal, which is a base-16 number system. #F000F4 translates to 240 (from the F0) red, 0 (from the 00) green, 244 (from the F4) blue.
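If it helps, that translation can be sketched in a few lines of Python. The function names `hex_to_rgb` and `rgb_to_hex` are invented for this example; the only built-in doing real work is `int(text, 16)`, which reads a string as a base-16 number.

```python
def hex_to_rgb(code: str) -> tuple:
    """Split a colour code like '#F000F4' into (red, green, blue) numbers."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits")
    # Take the digits two at a time and read each pair as a base-16 number.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Go the other way: three 0-255 numbers back into a colour code."""
    return f"#{red:02X}{green:02X}{blue:02X}"


print(hex_to_rgb("#F000F4"))     # (240, 0, 244)
print(rgb_to_hex(240, 0, 244))   # #F000F4
```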
There are a lot more details that go into file formats, like compression (because ideally you don’t want to store those three numbers for every single pixel if you can help it), but that’s the conceptual basis. You store the numbers that let the program reading it recreate the color of each pixel.
Yes, each pixel has a code. Let’s say the image is 1080p, so 1920 pixels wide and 1080 pixels high. That’s about 2 million pixels. Now, if this was a black and white image, you could save it as two million numbers. Say 0-100, with zero being black and 100 white. A string of two million numbers between 0 and 100 would give you a 1080p black and white image. You’d just need to agree on an order, like say right to left, top to bottom. Now you can share a long string of numbers with someone, and they can decode a picture. Could do it by hand even, though a computer is going to be a lot faster at it.
If you want a colour image, we use RGB. Red green blue. So now we need three numbers for each pixel. One for the brightness of red, one for green, and one for blue. So 05-78-100 would give you some sort of colour. High on the blue and green, low on the red. So a cyan or aqua or something. A string of two million of those triplets gives a 1080p colour image.
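Here is a minimal Python sketch of that "long string of numbers" idea, using the 0-100 scale from above. The tiny 2×2 image and the `flatten` helper are made up for illustration, and the rows are simply read in the order the list gives them.

```python
WIDTH, HEIGHT = 1920, 1080  # the 1080p size used in the answer


def flatten(rows):
    """Turn a grid of (red, green, blue) pixels into one flat list of numbers,
    taking the rows one after another in the order given."""
    return [value for row in rows for pixel in row for value in pixel]


# A made-up 2x2 image on the 0-100 brightness scale.
tiny_image = [
    [(100, 100, 100), (0, 0, 0)],
    [(5, 78, 100), (100, 0, 0)],
]
print(flatten(tiny_image))
# [100, 100, 100, 0, 0, 0, 5, 78, 100, 100, 0, 0]

pixel_count = WIDTH * HEIGHT
numbers_needed = pixel_count * 3  # three numbers per colour pixel
print(pixel_count)     # 2073600
print(numbers_needed)  # 6220800
```

Anyone who knows the width, the height, and the agreed order can rebuild the picture from that flat list.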
But computers work in binary. 0-100 is a bad scale. 0 to 1 would work, but that’s going to give us a very poor range. Only two levels. If we use two binary digits, that’s four levels. 00, 01, 10, 11. Aka, 0, 1, 2, 3. If we use three binary digits, that’s eight levels. If we use 8 binary digits, that’s 256 levels. That’s a decent scale. So every pixel is now 01010101-11110000-11101011 or something like that, and there’s still two million of them.
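The "more digits, more levels" pattern is just powers of two, which a short Python sketch can confirm. The helper names `levels` and `pixel_to_bits` are invented here, and the pixel values are the ones that correspond to the example bit pattern above.

```python
def levels(bits: int) -> int:
    """Number of distinct values you can write with this many binary digits."""
    return 2 ** bits


for bits in (1, 2, 3, 8):
    print(bits, "bits ->", levels(bits), "levels")
# 1 bits -> 2 levels
# 2 bits -> 4 levels
# 3 bits -> 8 levels
# 8 bits -> 256 levels


def pixel_to_bits(red: int, green: int, blue: int) -> str:
    """Write one pixel as three groups of eight binary digits."""
    return "-".join(f"{channel:08b}" for channel in (red, green, blue))


print(pixel_to_bits(85, 240, 235))  # 01010101-11110000-11101011
```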
Except that’s hard as hell to write out. Enter hexadecimal. As four binary digits have 16 options, a number system with 16 digits would sure make that a lot shorter to write out. We have decimal, but 10 is not a power of two, so it doesn’t play nice. 0001 is 1 in binary, so in hexadecimal that would be 1. 0100 is 4, so that would be 4 in hexadecimal. 1001 is 9, so that would be 9. But 1010 would be 10, and that’s a problem. 10 doesn’t have its own single digit in decimal (what we use), so let’s call that A. B is 11. F is 15.
Two hexadecimal digits next to each other give 16×16, so 256 options. So two hexadecimal digits can replace 8 binary digits; it’s just a shorter way to write it. FF is 255. 00 is 0. So your example of F000F4 is the colour 15×16 + 0, 0×16 + 0, 15×16 + 4. So 240, 0, 244. In other words, 240/255 red brightness, 0/255 green brightness, and 244/255 blue brightness. In other words, that’s a bright purple pixel. String two million of those together, and you’ve got a 1080p image in binary.
As for audio, what is sound? Well, it’s air pressure changing fast over time as a wave. We can measure air pressure, so we can measure the wave at regular points in time and pull off some numbers. Let’s use those same 8 binary digits again that give us 0 to 255. 255 is high air pressure, 0 is low. Now let’s make a string of those numbers, say 44,000 long. And say every 44 of these, it goes from 0 to 255 and back to 0.
Now, let’s say we hooked this up to a speaker, and made the speaker’s position run through this list at 44,000 numbers per second. Every 44 numbers the speaker goes from max in to max out and back, and we’re going through 44,000 numbers per second. What’s that mean? Well, the speaker position goes in and out 1000 times per second. What’s that mean? We just made a 1000 Hz tone.
Why 44,000? That’s roughly what a CD uses (44,100 to be exact), and it was chosen because it is slightly over double the 20,000 Hz upper limit of human hearing, meaning it can capture sounds up to that limit. You need samples at a rate of at least twice the sound’s frequency to store it. So that’s how sound is stored in binary: just a lot of numbers recording the sound wave’s height thousands of times per second.
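The 1000 Hz arithmetic can be checked with a short Python sketch. It uses the round 44,000 figure from the answer rather than a real CD's rate, and it assumes a simple straight-line ramp up and down for each cycle; the constant and function names are made up for this example.

```python
SAMPLE_RATE = 44_000      # numbers played per second (round figure from the answer)
SAMPLES_PER_CYCLE = 44    # one full in-and-out movement of the speaker


def triangle_cycle(n: int = SAMPLES_PER_CYCLE, peak: int = 255) -> list:
    """One cycle that ramps from 0 up to `peak` and back down towards 0."""
    half = n // 2
    up = [round(peak * i / half) for i in range(half)]
    down = [round(peak * (half - i) / half) for i in range(half)]
    return up + down


cycle = triangle_cycle()
cycles_per_second = SAMPLE_RATE // SAMPLES_PER_CYCLE

# One second of sound: the same 44-number cycle repeated 1000 times,
# stored as one byte (0-255) per sample.
one_second = bytes(cycle * cycles_per_second)

print(len(cycle))         # 44
print(len(one_second))    # 44000
print(cycles_per_second)  # 1000, i.e. a 1000 Hz tone
```

Real audio files store smoother waves and usually more than 8 bits per sample, but the principle of a long list of wave heights is the same.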
This all sounds like a lot of numbers, especially for something like a movie. Yes, it is. It would be absolutely massive. An uncompressed HD movie would fill up your entire drive. But it doesn’t, thanks to compression. You can take everything I’ve said, and then make it smaller through tricks. A small mp3 file might cut out high frequencies you don’t care about. A video file doesn’t need a new complete image every frame; it just needs to know what pixels to change from the last frame. An image that is all white doesn’t really need 2 million pixels; it just needs one and the instructions to repeat it for all of them. That’s roughly what a JPEG does: it groups regions of similar colour into one blob to save space, and you can see this quite easily on a heavily compressed JPEG image.
> Does each pixel have a specific code like the #F000F4 or however you see them online?
Yes, though inside the computer that “F000F4” is stored in binary, which would be 111100000000000011110100. That’s one pixel. You would then follow it with the next pixel and so on until the whole picture is represented.
In reality, there’s extra information. You’ll typically have some sort of header for the file, which contains information that tells the computer how to interpret all the data that makes up an image. This could include things like file type, size, any compression information, etc. This is all also done in binary.
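To give a feel for what a header does, here is a Python sketch of a toy image format. This is not any real file format: the `TOY1` marker, the field layout, and the `encode`/`decode` names are all invented for illustration. It uses the standard `struct` module to turn the width and height into bytes.

```python
import struct

MAGIC = b"TOY1"  # made-up 4-byte marker saying "this is a toy image file"


def encode(width: int, height: int, pixels: bytes) -> bytes:
    """Header (marker + width + height) followed by raw RGB bytes."""
    if len(pixels) != width * height * 3:
        raise ValueError("expected 3 bytes per pixel")
    # "<II" means two unsigned 32-bit integers, little-endian.
    header = MAGIC + struct.pack("<II", width, height)
    return header + pixels


def decode(blob: bytes):
    """Read the header first, then hand back the pixel bytes that follow it."""
    if blob[:4] != MAGIC:
        raise ValueError("not a toy image file")
    width, height = struct.unpack("<II", blob[4:12])
    return width, height, blob[12:]


two_pixels = bytes([240, 0, 244, 255, 255, 255])  # one purple, one white
blob = encode(2, 1, two_pixels)
print(len(blob))     # 18: 12 header bytes + 6 pixel bytes
print(decode(blob))  # (2, 1, b'\xf0\x00\xf4\xff\xff\xff')
```

Without the width and height in the header, a reader would not know where one row of pixels ends and the next begins.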
The other consideration is that directly representing each pixel of the image as above only occurs in uncompressed formats. The most commonly used image formats nowadays use some sort of compression, which changes how the data is represented. Exactly how it’s done depends on the compression used.
A very, very rough example might be that any time there’s a bunch of the same colored pixel in a row, the file would have the number of pixels in the row and the color value of the pixels. It would be the binary equivalent of “there are 12 red pixels in a row here”.
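That "count plus colour" idea is usually called run-length encoding, and a minimal Python sketch of it looks like this. It is only an illustration of the rough example above, not how any particular image format actually does it, and the function names are made up.

```python
def rle_encode(pixels):
    """Collapse runs of identical pixels into (count, colour) pairs."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][1] == pixel:
            runs[-1][0] += 1          # same colour as before: extend the run
        else:
            runs.append([1, pixel])   # new colour: start a new run
    return [(count, colour) for count, colour in runs]


def rle_decode(runs):
    """Expand (count, colour) pairs back into the full list of pixels."""
    pixels = []
    for count, colour in runs:
        pixels.extend([colour] * count)
    return pixels


RED = (255, 0, 0)
WHITE = (255, 255, 255)
row = [RED] * 12 + [WHITE] * 3

encoded = rle_encode(row)
print(encoded)                     # [(12, (255, 0, 0)), (3, (255, 255, 255))]
print(rle_decode(encoded) == row)  # True
```

Fifteen pixels shrink to two entries here, but a row where every pixel differs would not shrink at all, which is why real formats use more elaborate schemes.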
If you’re not already familiar with how it works, that hex code you used (#F000F4) is easy to “translate into English”. It’s three 2-digit numbers just stuck together: F0-00-F4. In binary that’s 11110000-00000000-11110100. The first chunk is how much red there is in the pixel, the second is how much green, and the third is how much blue. So you can look at it and realize it’s a bunch of red and blue, and not a lot of green, so it should be some sort of purple-ish color.