How do people reverse-engineer compiled applications to get the source code?

I know the long answer to this question would probably be the equivalent of a college course, but can you summarise how tech people do this?

If you open game.exe with a text editor you’re just going to get what looks like a scrambled mess of characters, so how would one convert this into readable source code?

12 Answers

Anonymous

That mess of characters has meaning, as evidenced by the computer knowing how to run it. Converting it back is just a matter of translating the coded data into a human-readable form. This is no different from how you can convert sounds to letters and understand the message either way. The jumbled mess of characters is like this block of text, which is a jumbled mess to somebody who cannot read English, or like Japanese text to somebody who cannot read Japanese.

The computer prefers binary instructions like “0x90” (a single byte when stored in the program), while a human will have an easier time reading it as “NOP”, which means “no operation” or “do nothing”, typically implemented as a meaningless operation (on x86, swapping a register with itself). All of the other operations a computer can perform are similarly encoded: in 16-bit code, “0x05 0x00 0x10” is “ADD AX, 0x1000”, or “x = x + 4096”. The operand bytes are stored least-significant byte first, so adding 16 would be “0x05 0x10 0x00” instead. For the vast majority of processors, you can look up the encodings online, as they are published by the chip maker; the examples I gave are for x86/x64 (desktop Intel/AMD processors).
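
Here is a minimal sketch of that lookup, assuming 16-bit x86 code. The `disassemble` helper is hypothetical and knows only the two encodings mentioned above; a real disassembler covers the whole published instruction table. x86 stores multi-byte operands least-significant byte first, so the bytes 05 00 10 come out as an addition of 0x1000 (4096).

```python
# Toy disassembler for two 16-bit x86 encodings (illustrative only).

def disassemble(code: bytes) -> list[str]:
    """Translate raw machine-code bytes into assembly mnemonics."""
    lines = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if opcode == 0x90:
            # NOP is a one-byte instruction with no operands.
            lines.append("NOP")
            i += 1
        elif opcode == 0x05 and i + 2 < len(code):
            # ADD AX, imm16: the two bytes after the opcode hold the
            # immediate value, least-significant byte first.
            imm = int.from_bytes(code[i + 1:i + 3], "little")
            lines.append(f"ADD AX, 0x{imm:04X}")
            i += 3
        else:
            # Anything else is outside this toy table; emit it as raw data.
            lines.append(f"DB 0x{opcode:02X}")
            i += 1
    return lines


listing = disassemble(bytes([0x90, 0x05, 0x00, 0x10]))
print("\n".join(listing))
```

Every byte the sketch does not recognise is printed as raw data, which is roughly what real tools do when they hit something they cannot decode.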

That most basic conversion is known as disassembly, and it produces code that is readable but very verbose.

From there, you can scan the instructions for common patterns to produce even more readable code, a step usually called decompilation. The process is not perfect, however. The computer has no need to know the names of variables or functions, so those are almost never saved in the binary. To get them back, you either have to work them out from what the code does and where it is called from, or you need separate debugging data. Optimizations also make a mess of the code by moving and changing it, sometimes very significantly, which makes the whole process much harder.
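
As a rough illustration of that pattern step, the sketch below lifts a short instruction list into C-like statements. The tuple format, the `lift` function and the `var_N` naming scheme are all made up for this example. Real decompilers are far more involved, but the invented names show why recovered code rarely looks like the original.

```python
# Toy "lifting" pass: turn disassembled instructions into C-like statements.
# The (mnemonic, register, value) tuples are a hypothetical input format.

def lift(instructions):
    names = {}  # register -> invented variable name (originals are gone)
    statements = []
    for mnemonic, register, value in instructions:
        if register not in names:
            names[register] = f"var_{len(names) + 1}"
        var = names[register]
        if mnemonic == "MOV":
            statements.append(f"{var} = {value};")
        elif mnemonic == "ADD":
            statements.append(f"{var} += {value};")
        else:
            # Patterns this toy does not know are left as a comment.
            statements.append(f"/* unhandled: {mnemonic} */")
    return statements


program = [("MOV", "AX", 5), ("ADD", "AX", 16), ("MOV", "BX", 1)]
pseudo_c = lift(program)
print("\n".join(pseudo_c))
```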

As a final note, some code is never fully compiled. Languages like JavaScript (heavily used on web pages) are shipped as plain text and only compiled right before they run. Python (commonly used for desktop and scientific scripting) can be compiled ahead of time, but it keeps a lot of debugging information, such as names, alongside the code; often it is distributed as plain text as well.
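
The Python case is easy to check with the standard library. This small sketch uses a made-up function, `add_sixteen`, to show that the compiled code object still carries the function and variable names, and that the built-in `dis` module can print its bytecode.

```python
import dis


def add_sixteen(x):
    total = x + 16
    return total


code = add_sixteen.__code__   # the compiled form of the function
print(code.co_name)           # function name kept in the compiled code
print(code.co_varnames)       # argument and local variable names kept too
dis.dis(add_sixteen)          # human-readable listing of the bytecode
```

The exact bytecode listing varies between Python versions, but the names survive in all of them, which is why Python is much easier to reverse than a stripped native binary.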
