If digital data is stored in 0s & 1s, how does the reader know how many of the digits to take into consideration?

Must be a very basic and dumb question, but ‘1001’ can be 9, and also 2 & 1 if ‘10’ & ‘01’ are taken separately. I’m confused.

Anonymous

You need to know the type of data you are dealing with. For example, if you want to open a .wav file, you find the specification (https://ccrma.stanford.edu/courses/422-winter-2014/projects/WaveFormat/) and then you write your program to the specification.

It says the first 4 bytes are the ID (“RIFF”), the next 4 bytes are the size, the next 4 are the format (“WAVE”), and so on.
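As a rough illustration of “writing your program to the specification”, here is a minimal Python sketch that reads the start of a WAV header with the standard `struct` module. It assumes the simple canonical layout from the linked page (a 12-byte RIFF header immediately followed by a 24-byte “fmt ” chunk). Real files can contain extra chunks, so treat this as a sketch, not a complete parser. The function name `parse_wav_header` is made up for this example.

```python
import struct

def parse_wav_header(data: bytes) -> dict:
    """Read the first fields of a canonical WAV header (hypothetical helper).

    Assumes a 12-byte RIFF header followed directly by a 24-byte "fmt " chunk.
    WAV stores multi-byte integers little-endian, hence "<" in the formats.
    """
    # Bytes 0-11: ID (4 chars), size (uint32), format (4 chars)
    chunk_id, chunk_size, fmt = struct.unpack_from("<4sI4s", data, 0)
    if chunk_id != b"RIFF" or fmt != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    # Bytes 12-35: the "fmt " sub-chunk that says how to read the samples
    (sub_id, sub_size, audio_format, channels,
     sample_rate, byte_rate, block_align, bits_per_sample) = struct.unpack_from(
        "<4sIHHIIHH", data, 12)
    if sub_id != b"fmt ":
        raise ValueError("expected a 'fmt ' chunk at offset 12")
    return {
        "chunk_size": chunk_size,
        "audio_format": audio_format,  # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "bits_per_sample": bits_per_sample,
    }

# Build a fake 36-byte header (stereo, 44100 Hz, 16-bit) and read it back.
header = struct.pack("<4sI4s4sIHHIIHH", b"RIFF", 36, b"WAVE", b"fmt ", 16,
                     1, 2, 44100, 44100 * 2 * 2, 4, 16)
print(parse_wav_header(header))
```

The program never “guesses” how many bytes to take: the spec fixes each field’s offset and width, and the code simply hard-codes them.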

If somebody just hands you a blob of data and tells you to interpret it, then you are correct to be confused. You’d have no idea what the bytes mean.

Also, if you open a file in the wrong program, it interprets the bytes in the wrong way and you just get nonsense. Open a .exe file in Notepad and it’s just crazy characters all over the screen.

Anonymous

You would write in known lengths such as “each number will be 8 bits,” as well as extra numbers here and there that might say things like “the first number X is how long the list is, the next X numbers are the list, the number Y after that is how many letters there are, followed by Y number of letters.”
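Here is a small Python sketch of exactly that kind of made-up layout: one byte for the count X, then X one-byte numbers, then one byte for the count Y, then Y ASCII letters. The layout and the `encode`/`decode` names are hypothetical, chosen only to show how a length prefix tells the reader where one piece of data ends and the next begins.

```python
def encode(numbers: list[int], text: str) -> bytes:
    # Hypothetical layout: [X][X one-byte numbers][Y][Y ASCII letters]
    letters = text.encode("ascii")
    if len(numbers) > 255 or len(letters) > 255:
        raise ValueError("counts must fit in one byte")
    # bytes() also rejects any number outside 0..255
    return bytes([len(numbers), *numbers, len(letters)]) + letters

def decode(data: bytes) -> tuple[list[int], str]:
    pos = 0
    x = data[pos]  # how many numbers follow
    pos += 1
    numbers = list(data[pos:pos + x])
    pos += x
    y = data[pos]  # how many letters follow
    pos += 1
    text = data[pos:pos + y].decode("ascii")
    return numbers, text

blob = encode([10, 1], "hi")
print(list(blob))    # [2, 10, 1, 2, 104, 105]
print(decode(blob))  # ([10, 1], 'hi')
```

Without the two count bytes, the reader would have no way of knowing whether `10, 1` were two numbers or whether the `1` belonged to the next field.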

The programmer gets to determine all of these things and make up the rules. It’s what makes things like reverse engineering file formats difficult, since the file could be laid out in any format.

If you want to see this being done in real time, check out the [Metroid Prime Modding Discord](https://discord.gg/AMBVFuf). They’ve been reverse engineering the original GameCube game for years, and the remaster recently came out, so they’re currently tearing that apart and figuring out how its data is laid out so they can read it.

Anonymous

The most accurate short answer is “it depends.”

At the processor level, everything has standard lengths and all the interpretation is physically wired into the chip. For example, many ARM processors (used mostly in phones and such) use 32-bit instructions. A specific part of those 32 bits contains what’s called an opcode, which tells the processor how to interpret the rest of the bits.
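ARM’s encodings are fairly intricate, so this sketch swaps in a different but simpler real instruction set: base RISC-V, which also uses 32-bit instructions and keeps the opcode in the lowest 7 bits. The Python function below (the name `decode_i_type` is made up) slices an “I-type” instruction into its fixed-width fields, mimicking what the chip’s decoder does in hardware. Only the I-type layout is handled here; the opcode is what would tell real hardware which layout applies.

```python
def decode_i_type(instr: int) -> dict:
    """Split a 32-bit RISC-V I-type instruction into its fields (sketch)."""
    opcode = instr & 0x7F           # bits 0-6: kind of instruction (0x13 = OP-IMM, e.g. addi)
    rd = (instr >> 7) & 0x1F        # bits 7-11: destination register
    funct3 = (instr >> 12) & 0x7    # bits 12-14: which operation within this opcode
    rs1 = (instr >> 15) & 0x1F      # bits 15-19: source register
    imm = (instr >> 20) & 0xFFF     # bits 20-31: 12-bit immediate value
    if imm & 0x800:                 # top bit set means negative: sign-extend
        imm -= 0x1000
    return {"opcode": opcode, "rd": rd, "funct3": funct3, "rs1": rs1, "imm": imm}

# 0x00500093 encodes "addi x1, x0, 5" (put 0 + 5 into register x1)
print(decode_i_type(0x00500093))
```

The processor never has to “figure out” where one field ends: the widths are fixed by the instruction format, so decoding is just shifting and masking.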

At the programming level, you need some way to keep track of what format each piece of data is in. If you’re programming in assembly (the lowest-level language), it’s up to you and you alone to make sure everything is read properly. In something like Java, the language makes you choose what type of data a variable is and then keeps track of it for you. In something like Python, the interpreter assigns and tracks types automatically without you having to do anything.
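To see why that bookkeeping matters, here is a short Python sketch using the standard `struct` module to read the very same four bytes as three different types, roughly the way assembly code could. Nothing in the bytes says which reading is “right”; only the format string the programmer chose does.

```python
import struct

raw = bytes([0x00, 0x00, 0x80, 0x3F])  # four bytes with no type attached

as_uint32 = struct.unpack("<I", raw)[0]    # one little-endian unsigned 32-bit integer
as_float32 = struct.unpack("<f", raw)[0]   # one little-endian IEEE 754 single-precision float
as_two_uint16 = struct.unpack("<HH", raw)  # two little-endian unsigned 16-bit integers

print(as_uint32)      # 1065353216
print(as_float32)     # 1.0
print(as_two_uint16)  # (0, 16256)
```

A language like Java or Python effectively remembers the format string for you, so you can’t accidentally read a float as an integer.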

At the file level, the program you feed the data to will usually decide how to read the file based on its extension. Many file formats also have a “header”, a special part at the start of the file that tells you how to read the rest of it. Plain text files usually have no header at all, but some begin with a byte order mark (BOM) that identifies the encoding (for example UTF-8 or UTF-16), which lets the program know things like how many bits there are per letter and which patterns mean which letters.
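A program can also peek at those leading bytes (often called “magic numbers” or signatures) instead of trusting the extension. The Python sketch below checks a small, illustrative subset of real signatures: the 8-byte PNG signature, “MZ” for DOS/Windows executables, “RIFF” for containers such as WAV, and the UTF-8/UTF-16 byte order marks from the standard `codecs` module. The `sniff` function and its labels are made up for this example; real tools check far more formats.

```python
import codecs

# Leading bytes for a few formats (illustrative subset, checked in order).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"MZ", "DOS/Windows executable"),
    (b"RIFF", "RIFF container (e.g. WAV)"),
    (codecs.BOM_UTF8, "UTF-8 text with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian text"),
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian text"),
]

def sniff(data: bytes) -> str:
    """Guess a file's type from its first bytes (hypothetical helper)."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return "unknown (no recognized header)"

print(sniff("héllo".encode("utf-8-sig")))  # UTF-8 text with BOM
print(sniff(b"just plain ASCII"))          # unknown (no recognized header)
```

Note that plain text without a BOM comes back as “unknown”: for most text files the program really does have to rely on the extension or a default encoding.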

Anonymous

Bytes are usually organized into words, which can be several bytes long and are the basic unit a computer handles (usually the width of the CPU’s registers). The computer itself just performs the requested operation on the word, whether that is arithmetic, a logical operation, a store, a rotation, or a shift. The computer does NOT care what the data represents; it just does what it’s told.

Interpretation of the data is left up to the software. I (or my compiler) will frequently pack multiple items into a single word. I do a lot of microcontroller work, where the amount of program and data memory available is very limited. My code knows that a particular piece of data is stored in bits 4 through 7 of the word, because I wrote the code and designed it that way. To access this data I need extra operations: shift the word 4 bits to the right, then mask (set to zero) bit 4 and every bit above it. What’s left is the 4-bit value that was stored in bits 4 through 7.
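Here is a minimal sketch of that shift-and-mask trick, written in Python for consistency with the other examples (on a microcontroller it would typically be C, but the `>>`, `<<`, `&`, `|` and `~` expressions carry over). It assumes a 4-bit field starting at bit 4, which is what “shift right by 4, then zero bit 4 and up” extracts. The constant and function names are hypothetical.

```python
FIELD_SHIFT = 4   # the field starts at bit 4
FIELD_MASK = 0xF  # and is 4 bits wide (bits 4 through 7)

def get_field(word: int) -> int:
    # Shift the field down to bit 0, then zero every bit from bit 4 upward.
    return (word >> FIELD_SHIFT) & FIELD_MASK

def set_field(word: int, value: int) -> int:
    # Clear the field's old bits, then OR the new value into place.
    cleared = word & ~(FIELD_MASK << FIELD_SHIFT)
    return cleared | ((value & FIELD_MASK) << FIELD_SHIFT)

word = 0xA63                   # binary 1010 0110 0011; the field holds 0110
print(get_field(word))         # 6
word = set_field(word, 0xC)
print(hex(word))               # 0xac3 (other bits untouched)
```

These extra shift and mask operations on every access are the speed cost of packing, as the next paragraph explains.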

In the example above I’ve reduced the required data memory by packing the data into just the bits it needs; however, I’ve slowed down my code, since it requires extra operations to access the data. On modern computers memory is so plentiful that you’d rarely bother to pack data like this. Speed matters more, so you’d just put your 4 bits of data in their own word and waste the unused bits. (I’m talking about simple program data and variables; if you’re dealing with movies or something, you will likely compress the hell out of it.)

Anonymous

That problem is not specific to computers. 123 456 can be either one single number or 123 and 456 taken separately. Heck, “negro” can mean dark if you read it in Portuguese or black if you read it in Spanish. You work out which one is right from context.

In the digital world it’s up to the software to decide what 1001 means, based on context. That’s why if you open a PNG file with MS Paint you see a picture, but if you open it with Notepad you see gibberish.
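To make the question’s own example concrete, here is a tiny Python sketch (the `split_bits` name is made up) that reads the same bit string “1001” with different agreed-upon widths. The bits never change; only the width the reader assumes, which is the “context”, does.

```python
def split_bits(bits: str, width: int) -> list[int]:
    """Read a bit string as consecutive unsigned numbers of `width` bits each."""
    if len(bits) % width:
        raise ValueError("bit string length must be a multiple of width")
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]

print(split_bits("1001", 4))  # [9]
print(split_bits("1001", 2))  # [2, 1]
print(split_bits("1001", 1))  # [1, 0, 0, 1]
```

Real formats work the same way, just with the width (and everything else) written down in a spec that both the writer and the reader follow.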