How did Konami lose the source code for the original Silent Hill game? Why couldn’t they just datamine the source code from the retail copies of the game?

I’ve heard many times that the reason the Silent Hill remaster collection didn’t turn out so well was that Konami lost the original source code and had to re-create it. But I don’t understand how that is possible. If they were selling copies of Silent Hill, why couldn’t they just take a single disc and datamine the source code off of it? How could they possess the game without possessing the game’s source code?

27 Answers

Anonymous 0 Comments

Once source code goes through a compiler, the result doesn’t look anything like what you originally wrote, and you can’t reverse engineer your way back to the exact original code.

Anonymous 0 Comments

Source code is different from compiled code.

Think of it like a recipe for a cake, and a baked cake.

Having a cake, or even eating it and being able to tell “flour, eggs, butter, vanilla, sugar”, doesn’t necessarily tell you how to make the cake. You can get some information this way, but generally not the full recipe and all of its steps.

Source code is the exact, human-readable text written in a programming language, e.g. “if hp = 0, then player = dead” (not a real language, but an easy example).

It is then fed through a compiler that translates it into “machine speak”, like mixing the batter and baking it in an oven so it can be eaten.

Compiled/object code is what the computer can make sense of.
You can reverse engineer some things from compiled code if you’re really dedicated and skilled, but even then the result isn’t an exact match and things can get messed up. You lose variable names, comments, etc., which then have to be reconstructed by hand.
In the previous example, it would end up like “if [unknown variable] = 0 then…”
But games have a lot of variables: player locations, loot values, damage, HP, ranges, light values, player speed, etc.
So unless all of that is worked out again, the recovered code is very hard to work with and often won’t even rebuild correctly. That *can* be fixed, but it’s often easier to recreate the game.
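To make the “if [unknown variable] = 0” point concrete, here is a minimal C sketch. The first function is how a programmer might write the health check; the second is roughly the shape a decompiler such as Ghidra tends to produce for the same logic once names and comments are gone (Ghidra labels unnamed functions `FUN_` plus an address, and parameters `param_1`, `param_2`, and so on). The `Player` struct, its fields, and the address in the function name are all made up for illustration, and real decompiler output varies by tool and compiler.

```c
/* What the programmer wrote. All names here are hypothetical. */
typedef struct {
    int hp;       /* current health */
    int is_dead;  /* set to 1 once health runs out */
} Player;

void update_player(Player *player) {
    /* A player with no health left is dead. */
    if (player->hp == 0) {
        player->is_dead = 1;
    }
}

/* Roughly what a decompiler might show for the same machine code:
   the struct, the field names and the comment are gone, leaving only
   memory offsets. Illustrative only; real output depends on the tool,
   the compiler and the optimization level. */
void FUN_80012a4c(int *param_1) {
    if (*param_1 == 0) {
        param_1[1] = 1;
    }
}
```

Both functions do the same job, but only the first tells a reader *why*.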

As a side note, this is why source code leaks are a big deal. For a game, for example, leaked source code would reveal how the game works, including things like anti-cheat.

Anonymous 0 Comments

Because the source code is not in the retail copy. The retail copies contain machine code that a compiler produced from the source code. There are decompilers that can turn it back into source code, but the result will not be identical, and useful information that makes code easier for a human to read, like variable and function names, is likely gone. Understanding decompiled code is a lot harder.

It is like the source code is a food recipe, and the machine code is the prepared dish you sell to a customer. Someone can figure out how to make that dish from the food alone, but it will be a lot harder and will likely require experimentation. The skill needed for that is not the same as the skill needed to cook from a recipe.

Anonymous 0 Comments

That’s not how source code works.

Computers (including video game consoles) only understand instructions in the form of machine code. Machine code is really hard for humans to write and understand, so we invented higher-level programming languages. These languages are easy for humans to read and write, and are then translated by a program called a “compiler” into machine code which the computer can run, called an “executable” or “binary.”

The source code is the original, human-readable code, which might be written in a language like C, Python, or Java. But it’s the executable that gets copied onto game discs and sold. It’s usually virtually impossible to decompile an executable back into the original source code.

Anonymous 0 Comments

The source code of a program is the code written in a programming language, which is something that humans can read and write easily. That’s why it’s important to have the source code; if you need to make changes or replicate existing functionality, it’s relatively simple for a programmer to look at the source code and see what it does (assuming it’s well-written) and then do whatever they need to with it.

However, computer processors do not “speak” programming languages. They are controlled by very specific inputs that are hard for humans to read or write. There are various ways to get from code in a programming language to something the processor can follow, but the simplest is a program called a compiler. A compiler reads code and, based on the rules of the programming language and the kind of processor it’s writing instructions for, spits out a sequence of instructions for the processor that do what the code says should happen. These instructions (along with visual and audio resources, etc.) are what actually get put on a game disc for a console to read and execute.

There’s no reliable way to go from a compiled program back to the source code. The compiler will generally use all sorts of tricks to generate efficient instructions rather than blindly translating things line by line, so there’s no real way to figure out exactly what the programmer originally wrote based on what the compiler turned it into. Furthermore, well-written source code includes things like comments, formatting, and naming conventions that make the code much easier for humans to read, but which are ignored entirely by the compiler. There are so-called “decompilers” that can look at a compiled program and generate code in a programming language that does the same thing, but they can’t recreate the intentions of the original programmer, and the code they generate will usually still be difficult to read or modify.
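One of those compiler “tricks” can be sketched in C. Optimizing compilers such as Clang are known to recognize a simple summing loop and, at higher optimization levels, replace it with the closed-form formula, so the machine code may contain no loop at all. Both functions below are hypothetical examples written for this illustration; the second simply spells out what the optimized output effectively computes.

```c
/* What the programmer might write: add up 1 + 2 + ... + n with a loop. */
unsigned int sum_to_n_loop(unsigned int n) {
    unsigned int total = 0;
    for (unsigned int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* What an optimizing compiler may effectively emit instead: no loop,
   just the closed-form formula n * (n + 1) / 2 (shown here for values of
   n small enough that nothing overflows). Someone reading the machine
   code sees only this arithmetic and has to guess that the original was
   a loop. */
unsigned int sum_to_n_closed_form(unsigned int n) {
    return n * (n + 1) / 2;
}
```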

For instance, the source code might have a function called “shoot” with parameters called “source”, “direction” and “damage”. The function might have a comment next to it saying “This function traces a ray from the source in the target direction until it collides with the environment or an entity. If an entity is hit, it will take the specified amount of damage.” If this is compiled and then passed through a decompiler, it might turn into a function called something like “func_01283” with parameters “arg_1”, “arg_2”, and “arg_3”, with no more explanation than that. The actual code of the function will probably be strangely structured by human standards as well. You can see why having the source code would make things a lot easier for a team trying to recreate that function in a game remake.
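The `shoot` example can be sketched in C. Everything here (the grid-based `World`, the `Vec2` type, the cell kinds, the return value) is a hypothetical fleshing-out of the answer’s description, not code from any real game; the comment at the bottom shows roughly what survives after compiling and decompiling.

```c
/* Hypothetical toy world: an 8x8 grid where each cell is empty,
   a wall, or holds an entity with some health. */
#define WORLD_W 8
#define WORLD_H 8

enum CellKind { EMPTY = 0, WALL = 1, ENTITY = 2 };

typedef struct {
    int x;
    int y;
} Vec2;

typedef struct {
    int kind[WORLD_H][WORLD_W];    /* an enum CellKind value per cell */
    int health[WORLD_H][WORLD_W];  /* health of the entity in that cell */
} World;

/* Traces a ray from the source in the target direction, one cell at a
   time, until it collides with the environment (a wall or the edge of
   the world) or an entity. If an entity is hit, it takes the specified
   amount of damage. Returns 1 if an entity was hit, 0 otherwise. */
int shoot(World *world, Vec2 source, Vec2 direction, int damage) {
    if (direction.x == 0 && direction.y == 0) {
        return 0;  /* a zero direction would never leave the source cell */
    }
    int x = source.x + direction.x;
    int y = source.y + direction.y;
    while (x >= 0 && x < WORLD_W && y >= 0 && y < WORLD_H) {
        if (world->kind[y][x] == WALL) {
            return 0;
        }
        if (world->kind[y][x] == ENTITY) {
            world->health[y][x] -= damage;
            return 1;
        }
        x += direction.x;
        y += direction.y;
    }
    return 0;  /* left the world without hitting anything */
}

/* After compiling and then decompiling, the same function might show up
   as something like:
       int func_01283(int *arg_1, int arg_2, int arg_3, int arg_4)
   (the exact parameter list depends on how the compiler passes the Vec2
   values), with the comment gone, the struct fields flattened into raw
   offsets, and the loop possibly restructured by the optimizer. */
```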

Anonymous 0 Comments

The source code never ends up on the disc; the compiled machine code does.

The source code is the recipe for the cake. The code on the disc is the cake that the recipe produces.

You can imagine how difficult it might be to guess the exact recipe for a cake just by looking at the finished product. A cake recipe is maybe a page long, two if its author is feeling pretentious. The source code for a game, though, can run to tens of thousands of pages.

It is chock-full of comments, meaningful structure, and variable names that clearly show the purpose of every single calculation, and yet the processor has exactly zero use for any of this information, so it is all discarded during compilation.

Moreover, the workings of the game are a trade secret anyway, so all of it is actively kept from leaking out of the company’s offices. That makes it almost inevitable that it eventually gets lost, because not every company has the foresight to preserve a copy for a re-release 20 years later, if they even care about the product’s legacy at all.

Without the original recipe, you cannot hope to bake exactly the same cake. You can only attempt, by trial and error, to bake 100 different cakes, slowly approximating the result of the original recipe by comparing each new cake to the one you already have.
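The “bake a cake and compare it to the original” loop has a software counterpart, usually called differential testing: run your reimplementation and the original on the same inputs and count where they disagree. The sketch below assumes a made-up damage formula and uses a plain C function as a stand-in for the original game’s behavior, which in practice you could only observe by running the shipped binary.

```c
/* Stand-in for the original game's behavior. In reality this would be a
   black box you can only observe by running the shipped game; the
   formula here is invented for the example. */
int original_damage(int strength, int armor) {
    int raw = strength * 3 - armor;
    return raw < 1 ? 1 : raw;  /* every hit does at least 1 damage */
}

/* First attempt at reimplementing it from observed behavior:
   misses the "at least 1 damage" rule. */
int first_guess_damage(int strength, int armor) {
    return strength * 3 - armor;
}

/* Refined attempt: written differently from the original, but it
   behaves the same. */
int refined_guess_damage(int strength, int armor) {
    int raw = strength * 3 - armor;
    return raw > 0 ? raw : 1;
}

/* Runs a candidate and the original on the same grid of inputs and
   counts the inputs where they disagree. Zero mismatches only shows the
   two agree on the inputs tested; it says nothing about whether the
   candidate matches the original source. */
int count_mismatches(int (*candidate)(int strength, int armor)) {
    int mismatches = 0;
    for (int strength = 0; strength <= 50; strength++) {
        for (int armor = 0; armor <= 100; armor++) {
            if (candidate(strength, armor) != original_damage(strength, armor)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

Here `count_mismatches(first_guess_damage)` reports 1751 disagreements (every tested case where armor is at least three times strength), while `count_mismatches(refined_guess_damage)` reports 0, even though its code is not identical to the original’s.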

Anonymous 0 Comments

None of the answers address “how did Konami lose the source code?” I still want to know the answer to that specific question.

Anonymous 0 Comments

If you’re interested in seeing what a good decompiler can do, install Ghidra, the free reverse-engineering tool released by the NSA, and run it on a simple program. In some cases the output is actually pretty readable, but it gets messier the bigger the program gets.
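If you want to try this, a tiny function like the hypothetical one below makes a reasonable first test subject. One way to do it, assuming GCC or Clang on Linux: save it as `demo.c`, compile it to an object file with `gcc -O2 -c demo.c -o demo.o`, import `demo.o` into Ghidra, and compare the decompiler’s view of `apply_damage` with the source. Because the object file keeps the exported function name, that name survives, but the comments and the local names `effective_damage` and `remaining_hp` do not; retail game binaries often lack even function names.

```c
/* demo.c: a tiny, hypothetical function to feed to a decompiler.
   This comment and the local variable names exist only in the source;
   without debug information the decompiler has to invent its own. */
int apply_damage(int hp, int damage, int armor) {
    int effective_damage = damage - armor;
    if (effective_damage < 0) {
        effective_damage = 0;  /* armor can't heal you */
    }
    int remaining_hp = hp - effective_damage;
    return remaining_hp < 0 ? 0 : remaining_hp;  /* health bottoms out at 0 */
}
```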

Anonymous 0 Comments

Source code:

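```c
#include <stdio.h>

/* Hypothetical stubs so this compiles and runs on its own;
   in a real game each of these would be thousands of lines. */
void create_world(void)  { puts("world created"); }
void create_player(void) { puts("player created"); }
void create_enemy(void)  { puts("enemy created"); }
void game_loop(void)     { /* the game itself would run here */ }

int main(void) {
    create_world();
    create_player();
    create_enemy();
    game_loop();
    return 0;
}
```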

Compiled code (what you’d find on the discs on store shelves):

1Dj4KD&*^3la”‘s;2(A)*@#&AD*)LDK

Disassembled (datamined) code (I can’t write actual assembly, so I’ll just write roughly what it does):

Move object to register one

Jump to line 23

Copy register

And so on.

Now, there’s a lot you can learn from reading disassembled code, and you can find tons of useful information about how a program works, particularly the simpler game logic (oh, hey, at this point the game calls this API) (oh, hey, the game compares these two values here) (oh, hey, you could probably exploit this because the integer will overflow…). But it’s of limited value in re-creating the original source code, and you’re basically stuck coding the entire thing from scratch. There isn’t much relation between the disassembled code and the original source code, because the source code is often heavily abstracted from what the machine actually does. It’s like a king who doesn’t know much about which tunnels a miner goes through to find coal: through his chain of command he can order that coal be mined, and he understands the purpose of the mine and its importance to the political situation of the kingdom.

If you got rid of that chain of command, but somehow had a list of every step a miner took (I walked two steps forward, turned thirty degrees, took five steps forward, stepped to the left, and so on), the king wouldn’t necessarily know how to manage individual miners.

Code is much the same way.

Anonymous 0 Comments

Because there’s nothing to “datamine”, as the source code is not present in the released game.

This happens because the source code was transformed by a process called “compiling”. Despite what its name suggests, this is a process where the source code is transformed into what’s called *a* binary, made up of machine code (bytes that your computer’s CPU can execute directly) (*).

To get the source code back, this transformation would need to be reversible, which can only happen if it maps things one-to-one, and compiling doesn’t. It’s like addition: 2+4 is 6, but so is 3+3, so knowing the result is 6 doesn’t tell you which numbers were added. So no source code for Konami (or anyone else), at least not fully automatically.
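The “2+4 and 3+3 both make 6” point has a direct code analogue. The three hypothetical C functions below are written differently but compute exactly the same result for every input (unsigned arithmetic wraps the same way in all three), and with optimization enabled, compilers like GCC and Clang will often reduce all of them to the same machine code. Given only that machine code, nothing tells you which version the programmer wrote.

```c
/* Three different ways a programmer might write "multiply by 8".
   All three return the same value for every unsigned input. */
unsigned int times_eight_multiply(unsigned int x) {
    return x * 8;
}

unsigned int times_eight_shift(unsigned int x) {
    return x << 3;  /* shifting left by 3 bits multiplies by 2^3 */
}

unsigned int times_eight_add(unsigned int x) {
    unsigned int doubled = x + x;                 /* 2x */
    unsigned int quadrupled = doubled + doubled;  /* 4x */
    return quadrupled + quadrupled;               /* 8x */
}
```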

You can extract an approximation of the source code with software called decompilers, but they have to make a lot of guesses, so the result is usually still quite far from the actual source code, and the two might never match completely, only functionally. People need to do what’s called reverse engineering to slowly beat it back into shape, but even that only yields functionally equivalent source code, not necessarily a perfect copy of the original.

And as you may know, time is money. Reverse engineering is also a rare skill, which means more money. Money that Konami apparently didn’t want to spend.

(*) Some would argue that translating source code into intermediate code is also compiling, but that isn’t very relevant here.