How do people reverse-engineer compiled applications to get the source code?

I know the long answer to this question would probably be the equivalent of a college course, but can you summarise how tech people do this?

If you open game.exe with a text editor you’re just going to get what looks like a scrambled mess of characters, so how would one convert this into readable source code?

12 Answers

Anonymous 0 Comments

Lots of answers here already, but I'd like to extend them with something you may have seen in videos online.

Not every “.exe” is machine code.

Many development platforms compile to an in-between layer (bytecode) rather than straight to machine code, and that makes some level of “unscrambling” possible (e.g. Unity, Flash, Java, Android).

Everything people have said above/below is generally right (TheLuminary had a particularly good explanation [here](https://www.reddit.com/r/explainlikeimfive/comments/vbhi3e/eli5_how_do_people_reverseengineer_compiled/ic8a77a/)), with two additional details.

Sometimes it’s not the computer’s processor reading these instructions but another program (a runtime or virtual machine), and sometimes the information isn’t only in the .exe file but also in related files (symbol files), which might have an extension like .pdb.

When you see people “reverse engineer” some program and get perfectly readable code, chances are that either they’re reversing a program written on one of these platforms, which don’t compile things all the way down to 1s and 0s, or they have the “symbols”, which are basically a big list of variable names and the addresses at which those names are used.

To extend on /u/TheLuminary’s answer, we can pretty much always get back to something like:

    IMM R0, 0x80
    LOAD R0, R0
    IMM R1, 0x1
    STORE R0, R1
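
To make that first step a bit more concrete, here is a minimal sketch of a disassembler in Python. The 3-byte instruction layout and the opcode numbers are made up for illustration (real processors use far more complicated encodings); the point is only that turning bytes back into instruction text is a mechanical table lookup.

```python
# Hypothetical instruction format, invented for this example:
#   byte 0 = opcode, byte 1 = first operand, byte 2 = second operand.
OPCODES = {0x01: "IMM", 0x02: "LOAD", 0x03: "STORE"}


def disassemble(code: bytes) -> list[str]:
    """Turn raw bytes back into one line of assembly text per instruction."""
    lines = []
    for offset in range(0, len(code), 3):
        opcode, a, b = code[offset:offset + 3]
        mnemonic = OPCODES[opcode]
        if mnemonic == "IMM":
            # IMM loads a constant, so the second operand is printed as a number.
            lines.append(f"IMM R{a}, {hex(b)}")
        else:
            # LOAD and STORE take two register numbers.
            lines.append(f"{mnemonic} R{a}, R{b}")
    return lines


# The "scrambled mess" as it would sit in the file, under the made-up encoding.
machine_code = bytes([0x01, 0, 0x80, 0x02, 0, 0, 0x01, 1, 0x01, 0x03, 0, 1])
for line in disassemble(machine_code):
    print(line)
```

Note that nothing in those twelve bytes says what the registers or the address 0x80 mean, which is why names have to come from somewhere else.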

Those symbol files, however, give us information that says something like:

> `R0` on line 1 is named `playerId`

Without that, when we come back the other way, we end up with:

    int int_0 = 1;

but *with* the symbol data, we can get:

    int playerId = 1;

There are also software packages out there, e.g. IDA Pro, that get you to the `int_0` stage and then let you go through and rename things to make the code readable again if you don’t have the symbols. Generally they do this by effectively keeping their own symbols database that you get to populate by hand 🤣
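
As a rough sketch of what that renaming step amounts to (this is not how IDA Pro or any real tool stores its data; the function name `apply_symbols` and the dictionary layout are my own assumptions), you can think of the symbols as a lookup table that gets applied to the decompiled text:

```python
import re


def apply_symbols(decompiled: str, symbols: dict[str, str]) -> str:
    """Replace placeholder names with the names from a symbol table."""

    def rename(match):
        name = match.group(0)
        # Names without an entry in the table are left untouched.
        return symbols.get(name, name)

    # Matching whole words keeps int_1 from also matching inside int_10.
    return re.sub(r"\b\w+\b", rename, decompiled)


decompiled = "int int_0 = 1;"

# Either loaded from a symbol file, or typed in by hand one entry at a time.
symbols = {"int_0": "playerId"}

print(apply_symbols(decompiled, symbols))
```

Real symbol files map names to addresses rather than to placeholder names, so this is a simplification, but the effect on the output is the same.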

Anonymous 0 Comments

[This is a really really long Twitter thread](https://mobile.twitter.com/Foone/status/1536053690368348160) – that’s just how foone likes to write.

This is a step-by-step live walkthrough of decompiling and reverse-engineering the old game SkiFree. They start by using a few programs to take the exe and turn it back into code – completely unreadable code, but still code. The decompiler can take the machine code from the exe and generate C code that matches up with it, but none of the functions or variables are going to have useful names. The decompiler then gives you tools to start organizing and renaming the decompiled code until it’s clear.
[Here](https://twitter.com/Foone/status/1536061110662533125?s=20&t=l99fGoot0_XjoXn0Xn9f9A) they find a function that the decompiler just called “FUN_00404950”. They look at it and see that it takes a piece of text as an input, does some checks on it, then tells Windows to display a message box with that text. So they change the name of this function to “DisplayMessageBox”.

Now, they start looking around for parts of the program that call DisplayMessageBox – can we figure out what _those_ are doing? Frequently, you look at the way things are formatted or the exact bits of text used – [here](https://twitter.com/Foone/status/1536067991518912512?s=20&t=l99fGoot0_XjoXn0Xn9f9A) is some code that makes a bit of text that looks like “number:number:number.number”. If you look at the game, you notice that the player’s time is written that way: “hours:minutes:seconds.fraction”. So this function that generates that text is probably displaying the player time.
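
Hunting for recognisable bits of text like this is something you can try yourself. Below is a minimal sketch, similar in spirit to the Unix `strings` utility, that pulls runs of readable characters out of raw bytes. The sample bytes and the format string in them are invented for the example, not taken from SkiFree, and the function name `printable_strings` is my own.

```python
import re


def printable_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_length bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [m.decode("ascii") for m in re.findall(pattern, data)]


# Made-up stand-in for the contents of an .exe: mostly junk, a little text.
fake_exe = (
    b"\x00\x13\x7fMZ\x90\x00"
    b"%d:%02d:%02d.%02d\x00"
    b"\xff\xfeSki\x00"
    b"Elapsed time\x00\x01"
)

found = printable_strings(fake_exe)
print(found)

# A crude filter for text shaped like "number:number:number.number".
time_like = [s for s in found if s.count(":") == 2 and "." in s]
print(time_like)
```

Once you have a string like that, the next step is finding which function uses it, and that function is a good candidate for the one that displays the timer.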

Two functions down, a few hundred to go…