Binary files are directly executable by the processor itself – they have a short header that tells the operating system where to load them into memory and where to start but are otherwise ready to run.
Here’s an example pulled from Microsoft Edge’s .text section (encoded as hexadecimal to aid readability):
41 57 41 56 41 55 41 54 56 57 55 53 48 81 EC 08
01 00 00 48 8B 05 0E 10 2F 00 48 31 E0 48 89 84
24 00 01 00 00 48 8B 02 48 85 C0
And here’s how it breaks down into instructions:
00007FF600271000 | 41:57 | push r15
00007FF600271002 | 41:56 | push r14
00007FF600271004 | 41:55 | push r13
00007FF600271006 | 41:54 | push r12
00007FF600271008 | 56 | push rsi
00007FF600271009 | 57 | push rdi
00007FF60027100A | 55 | push rbp
00007FF60027100B | 53 | push rbx
00007FF60027100C | 48:81EC 08010000 | sub rsp,108
00007FF600271013 | 48:8B05 0E102F00 | mov rax,qword …
00007FF60027101A | 48:31E0 | xor rax,rsp
00007FF60027101D | 48:898424 00010000 | mov qword ptr…
00007FF600271025 | 48:8B02 | mov rax,qword…
00007FF600271028 | 48:85C0 | test rax,rax
The first column is the address in memory, typically used to identify where the program is during execution. The second column is the byte sequence: each one tells the processor which operation to perform and which inputs to perform it on. The last column is the human-readable name of the instruction and its arguments.
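To make the three columns concrete, here is a rough Python sketch that rebuilds the first eight rows of the listing from the raw bytes. It is a toy, not a real disassembler: it only understands the one-byte "push register" opcodes (0x50 to 0x57) and the 0x41 prefix that selects r8 to r15, and it stops at the first byte it does not recognise. The function name decode_pushes is made up for this example.

```python
# Toy decoder: handles only the single-byte "push r64" opcodes (0x50-0x57)
# and the optional 0x41 prefix that selects r8-r15.
# Real x86-64 decoding has many more cases than this.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_pushes(code, base):
    """Return (address, hex bytes, text) rows for a leading run of pushes."""
    rows = []
    i = 0
    while i < len(code):
        start = i
        extended = code[i] == 0x41      # prefix byte: use r8-r15 instead
        if extended:
            i += 1
        if i >= len(code) or not 0x50 <= code[i] <= 0x57:
            break                       # not a push this toy understands
        n = code[i] - 0x50              # low bits pick the register
        name = f"r{n + 8}" if extended else REGS[n]
        i += 1
        raw = code[start:i].hex(":").upper()   # bytes.hex(sep) needs Python 3.8+
        rows.append((base + start, raw, f"push {name}"))
    return rows

# First line of the hex dump above; the load address is taken from the listing.
prologue = bytes.fromhex("41 57 41 56 41 55 41 54 56 57 55 53 48 81 EC 08")
rows = decode_pushes(prologue, 0x00007FF600271000)
for addr, raw, text in rows:
    print(f"{addr:016X} | {raw} | {text}")
```

The decoder stops when it reaches the 48 byte that begins the sub instruction, since that encoding is outside what this sketch covers.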
In very early computers, you would start by writing down a set of instructions, then look up each byte sequence and carefully “fill in the bubbles”, similar to a Scantron sheet used at school (or, in the very earliest computers, connect wires on a plugboard).
Once hard drives became commonplace, the byte codes could instead be stored directly on disk, usually with the help of an already working computer to get started.
Next, someone wrote a program that looks up the byte codes automatically. This is called an assembler. (Early assemblers date back to the 1950s, when programs were still loaded from punch cards and paper tape.)
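As a rough illustration of that lookup step, here is a toy assembler in Python. The table is hand-copied from the listing above and only covers instructions whose text is shown in full there. A real assembler computes the encoding from the instruction and its operands rather than matching whole lines, so treat this as a sketch of the idea only. The names OPCODES and assemble are invented for the example.

```python
# Hypothetical lookup table: instruction text -> byte sequence,
# copied from the disassembly listing above.
OPCODES = {
    "push r15": "41 57",
    "push r14": "41 56",
    "push r13": "41 55",
    "push r12": "41 54",
    "push rsi": "56",
    "push rdi": "57",
    "push rbp": "55",
    "push rbx": "53",
    "sub rsp,108": "48 81 EC 08 01 00 00",
    "xor rax,rsp": "48 31 E0",
    "test rax,rax": "48 85 C0",
}

def assemble(lines):
    """Translate each line to its bytes and concatenate the results."""
    out = bytearray()
    for line in lines:
        out += bytes.fromhex(OPCODES[line.strip()])
    return bytes(out)

program = [
    "push r15", "push r14", "push r13", "push r12",
    "push rsi", "push rdi", "push rbp", "push rbx",
    "sub rsp,108",
]
machine_code = assemble(program)
print(machine_code.hex(" ").upper())
# prints: 41 57 41 56 41 55 41 54 56 57 55 53 48 81 EC 08 01 00 00
```

The output matches the first 19 bytes of the hex dump at the top, which is the one-to-one translation an assembler performs.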
Gradually, assemblers started to include more features to reduce errors and increase productivity, such as shorthand for common sequences of operations. This is where programming languages start to become distinct from the set of operations provided directly by the hardware: there is no longer a 1:1 translation from commands to raw bytes. Similarly, the program is no longer called an assembler but goes by the more generic term, compiler.
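A small sketch of what such a shorthand might look like: one made-up command, save_registers, stands for the eight pushes from the listing above. The macro name and the expand function are hypothetical and only meant to show how one line of source can turn into several instructions.

```python
# Hypothetical macro table: one shorthand name -> several real instructions.
MACROS = {
    "save_registers": [
        "push r15", "push r14", "push r13", "push r12",
        "push rsi", "push rdi", "push rbp", "push rbx",
    ],
}

def expand(lines):
    """Replace each macro name with its instructions; pass other lines through."""
    expanded = []
    for line in lines:
        expanded.extend(MACROS.get(line, [line]))
    return expanded

source = ["save_registers", "sub rsp,108"]
expanded = expand(source)
print(f"{len(source)} source lines -> {len(expanded)} instructions")
for instruction in expanded:
    print(instruction)
```

Two lines of source become nine instructions, so the written program and the machine's instruction list no longer line up one to one.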
(There are other parts to a compiler, notably linkers, preprocessors, etc., but this is enough to give a good idea of where things started.)