The vast majority of programming does almost nothing with binary. The programming language lets you use defined syntax to do all of the operations you need. Then the compiler converts that normal, mostly easy-to-read syntax into machine code, and even there the data is mostly shown in a hexadecimal (base 16) representation. Hexadecimal is used because 4 binary digits fit exactly into one hexadecimal digit, so a byte, which is 8 binary bits, can be written as two hexadecimal digits. For example, the binary 10110100 can be broken into 1011 and 0100, which is B4 in hexadecimal.
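Just to illustrate that grouping, here is a tiny C sketch (nothing official, just a few lines I'm adding to show the idea) that splits a byte into its two hex digits:

```c
#include <stdio.h>

int main(void) {
    unsigned char byte = 0xB4;               /* the same value as binary 10110100 */

    unsigned char high = (byte >> 4) & 0xF;  /* top 4 bits:    1011 -> B */
    unsigned char low  = byte & 0xF;         /* bottom 4 bits: 0100 -> 4 */

    printf("byte = %02X, nibbles = %X and %X\n", byte, high, low);
    return 0;
}
```

It prints byte = B4, nibbles = B and 4.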
As for how to get strings (i.e. letters and symbols) out of pure 1s and 0s, the standard for that is either ASCII or, for richer character possibilities, UTF-8.
ASCII uses a single byte (8 bits, i.e. 8 1s or 0s) and has a symbol table that says, for example, 109 in base 10, or 6D in hexadecimal, or 01101101 in binary, means the letter “m”. With that, as long as your code knows it is looking for a string, the machine code can grab however many bytes it needs to complete the string.
For UTF-8, instead of each character being one byte, each character can be 1 to 4 bytes. This makes UTF-8 a little more involved, as the first byte needs to tell the machine code how many bytes there are in this character, as well as containing the actual character if it is just a single byte. With this, though, UTF-8 can represent over a million different code points, which means characters in every other written script can still be encoded as a single UTF-8 character. For example, the Unicode code point 5A23 (in hex) is the Chinese character 娣, which UTF-8 encodes as the three bytes E5 A8 A3.
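If you're curious how that works in code, here is a minimal C sketch of a toy UTF-8 encoder (my own simplified version, only handling code points that need up to 3 bytes) that reproduces the numbers above:

```c
#include <stdio.h>

/* Toy sketch: encode one Unicode code point (up to U+FFFF here) as UTF-8 bytes. */
static int utf8_encode(unsigned int cp, unsigned char out[4]) {
    if (cp < 0x80) {            /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    } else if (cp < 0x800) {    /* 2 bytes */
        out[0] = 0xC0 | (cp >> 6);
        out[1] = 0x80 | (cp & 0x3F);
        return 2;
    } else {                    /* 3 bytes (enough for U+5A23) */
        out[0] = 0xE0 | (cp >> 12);
        out[1] = 0x80 | ((cp >> 6) & 0x3F);
        out[2] = 0x80 | (cp & 0x3F);
        return 3;
    }
}

int main(void) {
    unsigned char buf[4];
    int n, i;

    printf("'m' as a byte: %d (0x%X)\n", 'm', 'm');   /* 109, 0x6D */

    n = utf8_encode(0x5A23, buf);                     /* code point U+5A23 */
    printf("U+5A23 as UTF-8:");
    for (i = 0; i < n; i++)
        printf(" %02X", buf[i]);                      /* prints E5 A8 A3 */
    printf("\n");
    return 0;
}
```

Running it prints 109 (0x6D) for 'm' and the three bytes E5 A8 A3 for U+5A23.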
The main takeaway, though, is that unless you are writing a compiler, or writing embedded C code, you will basically never need to know that there are 1s and 0s representing everything under the covers, as the compiler takes care of that for you.
Just to add on to what others have mentioned (and lol students are the only ones forced to code in binary). The idea is that we can look at a set of bytes, interpret them in some standard way and perform actions.
To understand this a bit better, we can look at a very rudimentary computing model in which we have something called a bus (think of it as a series of parallel wires) over which we fetch a set of bits from memory and pass them to another microchip. Don't worry about how the bits get pushed onto the bus.
Now, let's pretend this microchip has a magic way of doing different things based on the first 3 bits that were passed in (interested in this magic? You'll want a multiplexer in a circuit). 000 might be encoded to do nothing. 001 might be encoded to take the remaining bits and store them somewhere, such as another memory location, let's say r1. 010 might do the exact same, but store it in r2. 011 might add the values of r1 and r2 and store the result in r1! 100 might print the value of r1 to the screen.
So this is a toy version of how we get things done. Registers on a computer would be your r1 and r2. Interested in what these magic values are called? Look up op codes. They’re instructions that tell a computer how to fundamentally run and do things with data. (You can search for the x86_64 instruction set if you want specific op codes).
The fundamental op codes tell computers how to manipulate bits. We can now store, add, divide, and read from and write to memory, all based on bits. Turns out you can get a lot done with that. (And many more operations!)
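In C, that toy machine above is only a handful of lines. This is just a sketch of the made-up scheme from this answer (not a real instruction set), with each instruction packed into one byte: 3 opcode bits followed by 5 data bits.

```c
#include <stdio.h>

int main(void) {
    /* Each "instruction" is one byte: top 3 bits pick the operation,
       the remaining 5 bits are the data. Encoding is made up for this example. */
    unsigned char program[] = {
        0x22,  /* 001 00010 -> put 2 in r1  */
        0x43,  /* 010 00011 -> put 3 in r2  */
        0x60,  /* 011 ----- -> r1 = r1 + r2 */
        0x80   /* 100 ----- -> print r1     */
    };
    int r1 = 0, r2 = 0;

    for (size_t i = 0; i < sizeof program; i++) {
        unsigned char op   = program[i] >> 5;     /* first 3 bits */
        unsigned char data = program[i] & 0x1F;   /* remaining 5 bits */

        switch (op) {
            case 0: /* 000: do nothing */                   break;
            case 1: /* 001: store in r1 */ r1 = data;       break;
            case 2: /* 010: store in r2 */ r2 = data;       break;
            case 3: /* 011: add into r1 */ r1 = r1 + r2;    break;
            case 4: /* 100: print r1    */ printf("%d\n", r1); break;
        }
    }
    return 0;
}
```

Feeding it the little program in `program[]` loads 2 and 3, adds them, and prints 5.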
So, all good: we can perform basic math, and using our power of writing and reading to/from specific memory locations, we can actually start to interpret our bits as other forms.
At this point we have our compiler/linker take our text file and generate a bunch of bits that map to what we're trying to do. So if I have something in my program called age: i32, I logically know it represents an age and can start treating it as an age. The compiler makes sure it gets treated as a 32-bit integer and handles generating the appropriate machine code.
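To make that a bit more concrete, here's a tiny sketch in C (C rather than the Rust-style `age: i32` above, but the idea is the same) showing that the "age" really is just 32 bits sitting in memory:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int32_t age = 42;                               /* "an age", stored in 32 bits */
    unsigned char *bytes = (unsigned char *)&age;   /* peek at the raw bytes */

    printf("sizeof(age) = %zu bytes\n", sizeof age);
    for (size_t i = 0; i < sizeof age; i++)
        printf("%02X ", bytes[i]);
    printf("\n");
    return 0;
}
```

On a typical little-endian machine it prints the 4 bytes 2A 00 00 00: the same 42, just viewed as raw bits.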
So how about a basic string? Same thing!! It's just a bunch of bits under the hood. Sure, I could use an industry standard such as ASCII or UTF-8 to interpret and interact with these bytes, but there's nothing stopping me from saying I'm going to write a program where 0000 0001 represents the letter 'a'. At the language level we can logically group and hide lower-level operations, which makes writing programs easier.
If the computer is programmed correctly, all you *need* is numbers. For example:
72 69 76 76 79 32 87 79 82 76 68 33
Each of the above numbers represents a letter or character on your keyboard; in this instance I just said,
HELLO WORLD!
using numbers (ASCII decimal values, in this case) instead of letters. The computer at its most basic level (again, if we’re talking straight ASCII), however, would read them like this:
010010000100010101001100010011000100111100100000010101110100111101010010010011000100010000100001
All three of these are the exact same piece of information, just represented in different formats. At its most basic level, the computer understands the last one, and through assembly language and everything built on top of it we teach it to show the same information to us in a way *we* can understand.
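If you want to see all three forms at once, a short C program (just an illustration I'm adding, nothing special) can print the same string as text, as ASCII decimal values, and as bits:

```c
#include <stdio.h>

/* Print the same text three ways: as characters, as ASCII decimal
   values, and as raw bits -- the same information in different formats. */
int main(void) {
    const char *msg = "HELLO WORLD!";

    printf("%s\n", msg);

    for (const char *p = msg; *p; p++)
        printf("%d ", *p);                 /* 72 69 76 76 79 32 87 ... */
    printf("\n");

    for (const char *p = msg; *p; p++)
        for (int bit = 7; bit >= 0; bit--)
            printf("%d", (*p >> bit) & 1); /* 01001000 01000101 ...    */
    printf("\n");

    return 0;
}
```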
People do not, in general, code in binary; it is extremely hard for humans to read.
The CPU is built so it interprets binary data as instructions in a specific way, and it will start doing that at a specific memory address when powered on. There is a table where, for example, if the first 8 bits are 0000 0001, it should move data from locations specified in the following bits. It is not that different from reading “turn on the oven”: it is not just text but an instruction to do a specific task, because you know the meaning of those letters.
You could write an instruction as 01001000 10001001 11100101 if you like. This is machine code, which is what the CPU understands. Every instruction to the CPU is like this; it is built to interpret what it should do from it.
An alternative is to write it in hexadecimal, where each group of 4 bits becomes one digit from 0 to 15, and 10 to 15 use A to F as digits.
In hexadecimal the same thing is 48 89 e5
That is still hard for humans to read, so there is a human-readable form called assembly, and in it the instruction is
mov rbp, rsp
rbp and rsp are registers, and the instruction moves (copies) the content of rsp into rbp. There is a 1-to-1 mapping between assembly and machine code, whether the machine code is written in hex or binary.
You could look it all up in a table, but it is simpler to let a computer program do that and output it in a more human-readable form.
Assembly code, which makes programs easier for humans to read, first appeared in 1947, and the first assemblers that convert it automatically date from the early 1950s. So even when the number of computers in the world could likely be counted on your fingers, people found a way to get away from programming in binary.
Assembly is sometimes used, but most of the time higher-level languages are used and converted to machine code. This is done by a compiler.
An example is a simple hello world in C that looks like
#include <stdio.h>
int main() {
    // printf() displays the string inside quotation marks
    printf("Hello, World!");
    return 0;
}
It can be compiled to assembly code like the following (the first three lines are a small stub the toolchain generates for calling library functions such as printf; the program itself starts at main:)
push QWORD PTR [rip+0x2fca] # 403ff0 <_GLOBAL_OFFSET_TABLE_+0x8>
jmp QWORD PTR [rip+0x2fcc] # 403ff8 <_GLOBAL_OFFSET_TABLE_+0x10>
nop DWORD PTR [rax+0x0]
main:
push rbp
mov rbp,rsp
mov edi,0x402004
mov eax,0x0
call 401030 <printf@plt>
mov eax,0x0
pop rbp
ret
If you compile the C code yourself at [https://godbolt.org/](https://godbolt.org/) you can see the machine code in Hexadecimal too.
First off, people don’t code in binary directly. The lowest level any remotely sane person works in is machine code, which is still a step above raw binary.
As for how we use binary to code in the broader sense, the full story is a bit complicated, so let’s work our way up.
Firstly, we break up a string of binary into blocks. A standard size for a block is 8 digits, known as a byte, but there are both larger and smaller blocks that are used. For our examples, we’ll stick to a byte.
You’re correct that binary can only store numbers directly. In the case of a byte, we can store anything from 00000000 to 11111111, or 0 to 255 in decimal. But we can still use that to store text. Imagine you’re sending a message to a friend, but you can only do it via numbers. What you can do is agree beforehand that a particular number corresponds to a particular Latin character. Say, 1 through 26 is A through Z. If you receive the numbers 3, 1, and 20, you can refer to the code you agreed on and realise they’re sending CAT.
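In code, that agreed-upon table is nothing more than a bit of arithmetic. Here is a tiny C sketch (just for illustration, using the made-up 1-to-26 code above):

```c
#include <stdio.h>

/* The "agree on a code beforehand" idea: 1..26 stand for A..Z,
   so the numbers 3 1 20 decode to CAT. */
int main(void) {
    int message[] = {3, 1, 20};

    for (int i = 0; i < 3; i++)
        putchar('A' + message[i] - 1);
    putchar('\n');
    return 0;
}
```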
You can encode whatever you’d like onto those numbers. The alphabet, numbers, whole words, colours, whatever you can think of. We call a particular code a format, and there are hundreds of them for all sorts of different things. We can even encode the information about different file formats so the computer knows how to interpret a particular file.
And to circle back around to the original question, one of the things we can encode is what we call instructions. Instructions basically tell a computer what to do on the hardware level. I’m not going into how *that* works because there’s not really any ELI5 way to do it. Suffice it to say, the CPU can receive a number and then, based on how it’s been built/programmed, it can execute an instruction. A set of these instructions is called, surprisingly, an Instruction Set. Right now, there are two main Instruction Sets in widespread use, x86 and ARM.
You could code on a particular instruction set, but that’s not common. Low level programming is usually done with Assembly, which is effectively hardware instructions, but optimised slightly for human readability.
Most software development is done on high level programming languages, however. These are much more human readable than Assembly, but don’t correspond directly to instruction sets like Assembly does. They thus need to be transformed into a format that the hardware can understand directly, which is the job of a compiler.
This is a pretty advanced question that I’m not sure I’ll be able to ELI5 but I’ll try my best.
First, people *don’t* write code in binary, at least not directly. I really don’t like the colloquialism that computers only read 0s and 1s. While that’s true, it really misses a lot of what’s actually going on. For one, it led you to think that programmers write code in 0s and 1s. They don’t, and if they did, the tech industry would be decades behind.
What they actually do is write code in high level languages that tries to be as human readable as possible. You can create a Person object that has an age and name attribute. You can compare Person objects by their age or name, or both, or anything you want, really.
This way, if you had a bunch of information about people, you could just create a Person object for each data entry and have whatever data associated with them. It makes it really easy to understand what’s going on.
Even though high-level languages are supposedly the easiest to read, if you don’t have experience with coding I’d assume a lot of the words I just said seem like nonsense. That’s because even though such a language tries to be as readable as possible, it’s still a programming language, and you won’t get away from programming concepts and paradigms.
Abstracting any further to a literal human language would lose all the meaning of programming. At the heart of it you have to tell the computer *exactly* what to do and that just isn’t possible with human language, it has to be derived from computing concepts and be molded to try to be as human as possible.
Anyway, these high level languages get “translated” into lower level languages. There’s a program called the *compiler* that takes code from language A and turns it into code for language B. These are very complicated programs, which people spend their lives researching and working on. Compilers, along with operating systems, are some of the most complex and rigorous programs humans have created.
Again, the compiler is the program that turns your Python/Java code into code the computer understands, i.e. the 0s and 1s. However, there are a couple of steps in this process; we don’t go from Java directly to 0s and 1s.
Java is special because Java has an extra step most languages don’t. I won’t get into it because it’ll just needlessly complicate things.
Let’s say we’re just compiling some C code. The C compiler takes your C code and turns it into assembly language. Then the assembly language gets translated (by a separate program called an assembler) into machine code, which the computer can then execute.
There are a few other steps in between, but they needlessly complicate things and aren’t necessary to explain how this process works.
It’s basically just a series of steps of turning something from language A into language B. The exact process is very formal and complex, and this process has been researched extensively. The compiler is one of the most important programs as it directly interacts with any other program written in that language.
Not just that, but the C compiler has to be backwards compatible. There are a *lot* of standards the C compiler has to conform to, and if it doesn’t, it will break a lot of things. Just take it that the compiler is one of the most delicate systems humans have made, and most people have no idea what it even is or how important it is. It directly affects every program written in the respective language, and the compiler needs to be consistent; you can’t (preferably) release a new compiler that would break previously working code.
> Seems like all it could store is numbers
How does the Chinese takeaway know what you want, if all you ask for is “a number 26 and a number 15”? Like that.
How come you can ask them for number 15, they can say it costs 15 and you say send it to house 15, and that doesn’t get mixed up? Like that.
How do you know which number is which part – the food, the price, the address? Count the parts out and number them too. Do that for everything.
The processor in your computer has a bunch of “instructions”, things it knows how to do, like “move what is in one memory cell to a register”, or “add what is in these two registers and put the result back into the first register”.
So for instance, to add two numbers a program would do something like:
Move memory cell number 3 to register 0.
Move memory cell number 4 to register 1.
Add register 1 to register 0.
Move register 0 to memory cell number 5.
Programmers who write code at the processor level use a language called assembly, which looks a lot like that example, but it would be something closer to:
```
MOV 03 R0
MOV 04 R1
ADD R1 R0
MOV R0 05
```
This is an imaginary version of assembly for an imaginary CPU, but real assembly looks a bit like that. The name of an instruction, followed by the “arguments”, information needed for the instruction to do its job.
This is very close to what is actually sent to the CPU, except the CPU doesn’t understand text, it understands binary.
Let’s say that on our imaginary CPU, the MOV instruction is 0100, and ADD is 1101. Let’s also imagine that memory cells are binary numbers that start with 0, so 0000 to 0111, and registers and other special memory start with 1, so 1000 to 1111.
This would be a 4-bit CPU.
Let’s go ahead and translate our program to binary, ready to send to the CPU:
```
0100 0011 1000
0100 0100 1001
1101 1001 1000
0100 1000 0101
```
I typed it out as 4-bit chunks, with one instruction on each line, but the CPU would receive something more like `010000111000010001001001110110011000010010000101`
The CPU would decode that by chopping the input into 12-bit chunks (since each instruction is made of three 4-bit codes), then reading the 4-bit instruction name and the two 4-bit arguments.
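To show there's no magic in that decoding step, here is a short C sketch (for this imaginary CPU only, using the made-up opcodes above) that chops the bit stream into 12-bit chunks and prints the instructions back out:

```c
#include <stdio.h>
#include <string.h>

/* Turn n characters of '0'/'1' into a number. */
static int bits_to_int(const char *bits, int n) {
    int value = 0;
    for (int i = 0; i < n; i++)
        value = (value << 1) | (bits[i] - '0');
    return value;
}

/* In our made-up scheme, arguments starting with a 1 bit (8..15)
   are registers, the rest are memory cell numbers. */
static void print_arg(int v) {
    if (v >= 8)
        printf("R%d", v - 8);
    else
        printf("%02d", v);
}

int main(void) {
    const char *input =
        "010000111000010001001001110110011000010010000101";
    size_t len = strlen(input);

    /* Chop the stream into 12-bit instructions: 4-bit opcode + two 4-bit arguments. */
    for (size_t i = 0; i + 12 <= len; i += 12) {
        int op = bits_to_int(input + i, 4);
        int a  = bits_to_int(input + i + 4, 4);
        int b  = bits_to_int(input + i + 8, 4);

        printf("%s ", op == 0x4 ? "MOV" : op == 0xD ? "ADD" : "???");
        print_arg(a);
        printf(" ");
        print_arg(b);
        printf("\n");
    }
    return 0;
}
```

Running it prints the same four instructions we started with, MOV 03 R0 through MOV R0 05.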
Note that real processors use much bigger chunks. Modern CPUs are described as 64-bit and old consoles as 16-bit for a similar reason, although strictly speaking that refers to the size of the registers and memory addresses they work with rather than the size of each instruction.
This is also an extremely simplified explanation, just to show how a computer “understands” binary code.
Every 1 or 0 represents part of a circuit being opened or closed.
So maybe a light switch being turned on could be 1, and off could be 0. If you have eight lights, you can use each combination of on and off to represent a different character, and you have 256 combinations. You can start making fairly complex messages from that. The most widespread standardized use of eight bits per character is called ASCII, and it includes combinations for letters, numbers, and other characters. Of course, limiting it to eight bits, there is a maximum of 256 possible characters that can be encoded (standard ASCII actually defines only 128 of them), and you may have to rely on something more complicated for anything beyond the basic Latin alphabet, Arabic numerals, and common non-alphanumeric characters. So there are other standards that are more complicated, like Unicode. ASCII is my example because it’s widespread, and not particularly complex, but complex enough to give you an idea of how things work.