The executable is compiled – which turns it into instructions for the computer. During compilation the *intent* of every piece of code is lost. This means that for even very simple pieces of code it becomes very difficult to understand what the code wanted to achieve. Machine code can do things like jump anywhere in the instructions and works in terms of memory locations (which may change!) instead of variables, so any sort of high level abstractions that are useful for people like functions, classes, comments and so on are completely lost.
Program intent:
Drive to the shops
Buy milk
High level language:
Drive forward 200m
Turn left
Drive forward 100m
Stop
etc.
With a few comments and sensibly named functions and variables, it would be super easy to work out what this program was trying to achieve.
Compiled code:
Read memory location 3472
Copy to register 1
Read memory location 3619
Copy to register 2
Add register 1 and 2
Copy result to memory location 4372
*a few thousand more steps…*
It’s a mammoth effort to untangle these basically meaningless steps. Even if you do, the mapping isn’t one-to-one: a given piece of machine code can decompile to many possible pieces of source code, which may be functionally the same but are very unlikely to be equally easy to understand.
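To make the "names are lost" point concrete, here is a minimal C sketch. Everything in it is made up for illustration: the function names, the parameter names and the 5000-slot memory array are assumptions, and only the three location numbers come from the steps above. Both functions do the same addition, but only the first one tells you why.

```c
#include <assert.h>
#include <stddef.h>

/* What the programmer wrote: the names carry the intent.
   (milk_price and fuel_price are hypothetical names.) */
int total_cost(int milk_price, int fuel_price)
{
    return milk_price + fuel_price;
}

/* Pretend memory, big enough to hold the numbered locations used below. */
enum { MEMORY_SIZE = 5000 };

/* Roughly what is left after compilation: numbered locations and registers,
   with no hint of what the numbers stand for. */
void machine_view(int memory[MEMORY_SIZE])
{
    assert(memory != NULL);
    int register1 = memory[3472];          /* read location 3472 into register 1 */
    int register2 = memory[3619];          /* read location 3619 into register 2 */
    memory[4372] = register1 + register2;  /* add, store result at location 4372 */
}
```

If you put 3 in location 3472 and 4 in location 3619, `machine_view` leaves 7 in location 4372, exactly what `total_cost(3, 4)` returns. A decompiler can get you back to something like the second function; nothing in the binary tells it that the first one was about milk and fuel.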
You can, if you can read several million binary instructions that have absolutely no names, useful text or description of what’s happening.
Source code is written in a kind of pseudo-English. It is then compiled, which “translates” it to machine code. Machine code is basically binary, with all the context and human-readable elements stripped out. You can read it as binary, or as hexadecimal (e.g. FF 06 E2 B4 ….).
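If you want to see what "reading it as hexadecimal" amounts to, a hex viewer does little more than the sketch below. The helper name `format_hex` is made up for this example, and the four sample bytes are just the ones quoted above, not a real program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes `count` bytes into `out` as space-separated uppercase hex pairs.
   `out` needs room for 3 characters per byte (2 digits plus a separator
   or the terminating NUL). */
void format_hex(const unsigned char *bytes, size_t count,
                char *out, size_t out_size)
{
    assert(out_size >= (count == 0 ? 1 : count * 3));
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        const char *separator = (i + 1 < count) ? " " : "";
        snprintf(out + i * 3, out_size - i * 3, "%02X%s",
                 (unsigned)bytes[i], separator);
    }
}
```

Feeding it the bytes 0xFF, 0x06, 0xE2, 0xB4 gives back the text `FF 06 E2 B4`. That is the whole trick: the bytes are perfectly "readable", they just don't say anything about what they are for.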
Even if you convert those numbers back to assembly (which is *technically* readable, but good luck working out what the hell is going on without any context), it looks like this:
`push rbp`
`mov rbp, rsp`
`sub rsp, 16`
`mov DWORD [rbp-4], 64`
`mov eax, DWORD [rbp-4]`
`bsr eax, eax`
`xor eax, 31`
`mov DWORD [rbp-8], eax`
`mov eax, 32`
`sub eax, DWORD [rbp-8]`
`mov DWORD [rbp-12], eax`
`mov eax, DWORD [rbp-12]`
`mov edx, eax`
`shr edx, 31`
`add eax, edx`
`sar eax, 1`
`mov DWORD [rbp-12], eax`
`mov eax, DWORD [rbp-12]`
`mov edx, DWORD [rbp-4]`
`mov ecx, eax`
`sar edx, cl`
`mov eax, edx`
`mov DWORD [rbp-16], eax`
`mov eax, DWORD [rbp-16]`
`mov esi, eax`
`lea rdi, [rel formatter]`
`mov eax, 0`
`call _printf`
`mov eax, 0`
`leave`
`ret`
Good luck understanding that, even as a programmer.
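For the curious, here is one guess at C that could produce a listing like that. Treat it as a sketch and not the original source: the function and variable names are invented, it assumes a 32-bit `int`, and it assumes the `bsr` / `xor 31` pair came from the GCC/Clang builtin `__builtin_clz` (count leading zeros). In the listing the value 64 is hard-coded and the result is handed to `printf` with a format string called `formatter`, whose contents aren't shown.

```c
#include <assert.h>

/* Hypothetical reconstruction: halve the bit length of `value`, then
   shift `value` right by that amount. For powers of two this lands on
   or near the square root. */
int rough_sqrt(int value)
{
    assert(value > 0);  /* __builtin_clz(0) is undefined */
    int leading_zeros = __builtin_clz((unsigned)value); /* bsr + xor 31 */
    int shift = (32 - leading_zeros) / 2;               /* sub, then the shr/add/sar divide */
    return value >> shift;                              /* sar edx, cl */
}
```

With 64 as input: 25 leading zeros, so the shift is (32 - 25) / 2 = 3, and 64 >> 3 is 8. Four short lines of C turned into about 30 instructions, and getting back to those four lines meant tracing every one of them by hand and then guessing at the names.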
The compiler will often pick what looks like the most roundabout route to do what’s necessary, in order to save tiny fractions of a second. So even if you WROTE the code that results in the above, and you have coded in C or assembler, good luck interpreting what it’s actually doing without a lot of work.
Hell, I can barely tell you what human-readable code that I wrote 5 years ago does any more, let alone once it’s been washed through a compiler and all the helpful hints (variable names, etc.) removed.
BTW: That’s only about 30 instructions. A typical Windows program will be in the MILLIONS of instructions and call functions in libraries which are also in the MILLIONS of instructions.
Decompilation projects that try to do this for programs with lost source code usually take absolute genius programmers years, if not decades. It would probably be quicker to “just write Windows again” than to decompile Windows, for example.
Source code is the set of instructions, written in a human-readable programming language, that was actually typed to make the file.
Binaries are the files with instructions that are readable by the computer; to a person they would just look like a bunch of zeros and ones.
When you download a file, you get the binaries, so you can’t see what’s inside the file. Only the computer can read it.
Programmers make binaries because the computer can run them more efficiently: it doesn’t have to sit there wasting time reading the human language and trying to figure out what to do. It’s already translated into the computer’s language, ready to run directly.
“Open source” means the programmer gives you the human-readable version of the file, so you can read for yourself what’s inside, modify it, or just verify that it’s the right file by making your own version of the binary to run.
“Closed source” means the human readable version is not available to you, so you just have to trust the programmer, or the company giving you the file, that the file is the correct version, and it does what the provider says it does.
>Can’t I just read the files to see the code?
* No.
* The file you download is not the source code.
* Source code is like a recipe and the file you download is like the cake.
* Sure, you can get a general idea of how the cake is made by looking at it and tasting it… but you won’t find the exact list of instructions for making it somewhere inside.