How do people reverse-engineer compiled applications to get the source code?

I know the long answer to this question would probably be the equivalent of a college course, but can you summarise how tech people do this?

If you open game.exe with a text editor you’re just going to get what looks like a scrambled mess of characters, so how would one convert this into readable source code?

12 Answers

Anonymous 0 Comments

There are lots of answers here already, but I want to extend them to cover what you may have seen in some videos online.

Not every “.exe” is machine code.

Many development platforms (e.g. Unity, Flash, Java, Android) use a special kind of in-between layer, usually called bytecode or an intermediate language, that does actually make some level of “unscrambling” possible.
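As a small illustration of why that in-between layer is easier to unscramble, here is a sketch in Python. Python is not one of the platforms named above, but it also compiles to bytecode for an interpreter rather than to machine code. The function `spawn` and its variable names are made up for this example.

```python
import dis

def spawn(base_health):
    # Hypothetical game logic; the names are invented for this sketch.
    player_health = base_health * 2
    return player_health

code = spawn.__code__

# The compiled bytecode itself is just raw bytes, unreadable in a text editor...
print(code.co_code)

# ...but the names of the argument and the local variable travel with it.
print(code.co_varnames)

# A disassembler can therefore print the instructions with the real names attached.
dis.dis(spawn)
```

The original names are still sitting inside the compiled object, which is roughly why tools for these platforms can often give back code that looks close to what the author wrote.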

Everything people have said above/below is generally right (TheLuminary had a particularly good explanation [here](https://www.reddit.com/r/explainlikeimfive/comments/vbhi3e/eli5_how_do_people_reverseengineer_compiled/ic8a77a/)), with two additional details.

First, sometimes it’s not the computer’s processor reading these instructions, but another program. Second, sometimes it’s not just the .exe file that holds this information, but also some related files (the “symbols”), which might have an extension like .pdb.
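To make the “another program reads the instructions” idea concrete, here is a minimal sketch of such a program, using the same style of made-up instructions that appears further down in this answer. The meaning given to `IMM`, `LOAD` and `STORE` here is my own assumption, not a real instruction set.

```python
def run(program, memory_size=256):
    """Interpret a list of (opcode, operand, operand) tuples."""
    registers = {"R0": 0, "R1": 0}
    memory = [0] * memory_size
    for op, a, b in program:
        if op == "IMM":      # put a constant into register a
            registers[a] = b
        elif op == "LOAD":   # read memory at the address held in register b
            registers[a] = memory[registers[b]]
        elif op == "STORE":  # write register b to the address held in register a
            memory[registers[a]] = registers[b]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return registers, memory

# Assumed toy program: store the value 1 at address 0x80.
program = [
    ("IMM", "R0", 0x80),
    ("IMM", "R1", 0x1),
    ("STORE", "R0", "R1"),
]

registers, memory = run(program)
print(memory[0x80])  # prints 1
```

Because the instructions are designed for a program like this rather than for a processor, they tend to stay more structured, and that structure is what a decompiler leans on.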

When you see people “reverse engineer” a program and get perfectly readable code, chances are that either they’re reversing a program written on one of these platforms, which doesn’t compile things all the way down to native machine code, or they have the “symbols”, which are basically a big list of variable names and the addresses those names refer to.

To extend on /u/TheLuminary’s answer, we can pretty much always get back to something like:

    IMM R0, 0x80
    LOAD R0, R0
    IMM R1, 0x1
    STORE R0, R1

Those symbol files, however, give us information that says:

> `R0` on line 1 is named `playerId`

Without that, when we come back the other way, we end up with:

    int int_0 = 1;

but *with* the symbol data, we can get:

    int playerId = 1;

There are also software packages out there, e.g. IDA Pro, that get you to the `int_0` stage and then let you go through and rename things to make the code readable again if you don’t have the symbols. Generally they do this by effectively keeping their own symbols database that you get to populate by hand 🤣
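The difference between the `int_0` and `playerId` versions can be sketched in a few lines. This is only a rough illustration, assuming the symbol data is a plain mapping from address to name; it is not how .pdb files or IDA Pro actually store things, and `name_for` and `decompile_store` are hypothetical helpers.

```python
def name_for(address, symbols, placeholders):
    """Return the known name for an address, or a stable placeholder."""
    if address in symbols:
        return symbols[address]
    if address not in placeholders:
        # No symbol available: invent int_0, int_1, ... in order of appearance.
        placeholders[address] = f"int_{len(placeholders)}"
    return placeholders[address]

def decompile_store(address, value, symbols, placeholders):
    """Render 'store value at address' as a C-like declaration."""
    return f"int {name_for(address, symbols, placeholders)} = {value};"

symbols = {}        # empty table: no symbol data available
placeholders = {}

without_symbols = decompile_store(0x80, 1, symbols, placeholders)
print(without_symbols)   # int int_0 = 1;

# The entry could come from a symbol file, or be typed in by hand later.
symbols[0x80] = "playerId"

with_symbols = decompile_store(0x80, 1, symbols, placeholders)
print(with_symbols)      # int playerId = 1;
```

Either way the table ends up holding the same kind of entry; the only question is whether the original developer's tools wrote it or you did.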
