ELI5: How do game developers lose the source code? Can’t they get the code from the discs like modders and speedrunners do?

I just found out that Konami somehow lost the source code to the PS2 version of Silent Hill so they had to recreate it. How do they lose it and why can’t they get it from a random game disc? Isn’t that how modders, emulators, and speedrunners see the code?

Anonymous

A couple of parts to unpack here:

> source code

First let’s clarify what source code is. Source code is instructions written in a way that *humans* can understand. For example, this is a bit of C++ source code (a very common language):

    #include <iostream>
    using namespace std;

    // The main function: reads a number in, and prints it back out
    int main() {
        int number;

        // Get the number from the user
        cout << "Enter an integer: ";
        cin >> number;

        // Print the number
        cout << "You entered " << number;
        return 0;
    }

There are a couple of things to note about this code:

The lines with “//” are *comments* that are ignored by the computer. They’re just there to help the person reading the source code understand what it’s doing.

The “int number” is what’s called a variable. It stores some value (in this case, an integer number) with a human-readable name so that it can be used later in the code, and most importantly, so that a human reading the code can know what it’s for.

Now, a computer can’t run this source code directly. It’s designed for humans, not for the computer itself. To actually run the code, it needs to be *compiled* into what is called *machine code* or binary code. This is, in the simplest sense, the raw 1s and 0s that the computer can understand.

However, this is generally a one-way process. Once you have machine code, you can’t get the source code back. Some things are ignored (like the comments), some things are eliminated (variable names), and some things can be optimized away or altered in some way that makes them harder for a human to understand but faster for the computer.

There are such things as decompilers, which take machine code and give you back something that resembles the original source code in a general sense. But without the original names and comments, the result is very hard to understand, which makes it close to worthless at the scope of a whole project unless someone spends a great deal of time cleaning it up.
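To make that concrete, here is a rough sketch of the difference. The first function is the kind of thing a programmer might write. The second is a hand-written imitation of the style a decompiler tends to produce; it is not real decompiler output, and the names `apply_damage`, `sub_401000`, `a1`, `a2`, and `v3` are all made up for illustration. Both functions behave the same way.

```cpp
#include <algorithm>

// What the original programmer might have written: the names and the
// comment explain the intent.
int apply_damage(int health, int damage) {
    // Health can never drop below zero.
    return std::max(health - damage, 0);
}

// Hand-written imitation of decompiler-style output for the same logic.
// The comment and the meaningful names are gone; only the behavior survives.
int sub_401000(int a1, int a2) {
    int v3 = a1 - a2;
    if (v3 < 0)
        v3 = 0;
    return v3;
}
```

Calling either one with `(10, 3)` gives 7, and with `(2, 5)` gives 0. Now imagine tens of thousands of functions that all look like the second one, and you can see why cleaning up a decompiled game is such a slog.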

Thus, it’s often easier to just get a team of new programmers, give them the details of how the game should work (e.g. let them play through it a few dozen times), and then have them write brand new code that “does the same thing”.

> How do they lose it

Simply put, it got deleted, somehow, by someone, for some reason, and there are no backups. The reasons could be many: it only existed on a handful of computers that got thrown out, the one golden backup copy is on a tape archive no one can find, etc. Just like a random spreadsheet could be lost, so too can source code. There are ways to mitigate this, like source code version control (software like Git, SVN, etc.) with copies kept in more than one place, but distributed tools like Git only appeared in 2005, and in the era of the PS2 it’s likely that many developers were working in what we today would consider a very suboptimal way. Not to mention that this is all proprietary, so it’s restricted to just the few people in the company who need to see it all, leaving even fewer copies.

> why can’t they get it from a random game disc

This is because what’s on the game disc you buy in the store is the *machine code* mentioned earlier. There is no source code on there, just the compiled machine code. After all, if the company shipped the source code on the game disc, anyone could take it, modify it, compile their own version without copy protection or whatever other features, and then release it themselves.

>Isn’t that how modders, emulators, and speedrunners see the code?

This is the more interesting thing about it all. For the most part, *none* of these communities do what they do by looking at the original source code. Sometimes they might use a decompiler to check bits here or there, but for the most part these alterations are done either by exploiting bugs and chaining those exploits together, or by trial and error. For instance, you might notice that if you wiggle a cable a bit, a number in the HUD jumps from 3 to 4. You can then use trial and error to find that this happens when wires 3 and 4 in your controller are shorting slightly, and then perhaps exploit that to add more health or XP to your character. This is a very simplified, trivial example, but the idea is the same: people spend a lot of time trying to find little bugs and exploits, then share them, and that slowly leads to more developments.
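One common software version of that trial and error is memory scanning, which is a different technique from the cable example above but has the same spirit: watch a number change on screen, then find where it lives in the game's memory. Below is a minimal sketch, assuming you already have two snapshots of memory as byte arrays. The function names `find_value` and `narrow` are made up for this example and do not come from any real tool.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Return every offset in a memory snapshot whose byte equals `value`.
std::vector<std::size_t> find_value(const std::vector<std::uint8_t>& memory,
                                    std::uint8_t value) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < memory.size(); ++i) {
        if (memory[i] == value) hits.push_back(i);
    }
    return hits;
}

// Keep only the earlier candidates that hold `new_value` in a later snapshot.
std::vector<std::size_t> narrow(const std::vector<std::size_t>& candidates,
                                const std::vector<std::uint8_t>& later,
                                std::uint8_t new_value) {
    std::vector<std::size_t> kept;
    for (std::size_t offset : candidates) {
        if (offset < later.size() && later[offset] == new_value) {
            kept.push_back(offset);
        }
    }
    return kept;
}
```

You would call `find_value(before, 3)` while the HUD shows 3, make the number change to 4 in the game, then call `narrow(candidates, after, 4)`. Repeat a few times and you are usually left with a handful of locations to experiment with, all without reading a line of source code.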

Now of course, this depends on the game, the platform, etc. Some games might have robust, official modding support. For instance Bethesda releases something called the Creation Kit with their Elder Scrolls games, which is basically a tool that they use internally to make the maps and levels in the game. This lets anyone out there make their own mods and share them, all without ever touching a piece of the original game source code.

Emulators are also a different can of worms. For the most part, what an emulator is doing is taking the machine code of the game and “faking” the hardware, acting as a translation layer that gives you something that can be shown on your computer or another device. So for example, the game might make a call to the sound subsystem of the PS2 to make a “bleep” noise. On the PS2 itself, there’s a chip whose purpose is to take that command from the CPU when it executes, make a bleep waveform, and send that out the audio cable to your speakers. On an emulator, instead of there being a real chip to take that command, the emulator program presents a fake chip, which translates that command to make a “bleep” into a command for your actual computer to make a “bleep”. (Working out what the fake chip should do is, again, often done by trial and error or by closely studying the original physical device, looking at schematics, etc.) So basically, an emulator is a translation layer between the game machine code and a different computer system. But to do this it doesn’t really need to care too much about what the game source code is doing: it just needs to know what the original hardware did when it got command X or Y and replicate it.
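Here is a toy sketch of that “fake chip” idea. Everything in it is invented for illustration: the command numbers `CMD_BLEEP` and `CMD_SILENCE`, and the `HostAudio` and `FakeSoundChip` types, are assumptions and have nothing to do with the real PS2 sound hardware, which is far more complicated.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical command numbers, made up for this example.
constexpr std::uint8_t CMD_BLEEP = 0x01;
constexpr std::uint8_t CMD_SILENCE = 0x02;

// Stand-in for the host computer's audio system. A real emulator would
// produce sound here; this one just records what it was asked to play.
struct HostAudio {
    std::vector<std::string> log;
    void play(const std::string& sound) { log.push_back(sound); }
};

// The "fake chip": it receives the command the game would have sent to
// the real chip and translates it into a request to the host.
struct FakeSoundChip {
    HostAudio& host;

    // Returns true if the command was recognized, false otherwise.
    bool write_command(std::uint8_t command) {
        switch (command) {
            case CMD_BLEEP:
                host.play("bleep");
                return true;
            case CMD_SILENCE:
                host.play("silence");
                return true;
            default:
                return false;  // unknown command: ignored in this sketch
        }
    }
};
```

To use it, create a `HostAudio host;`, wrap it with `FakeSoundChip chip{host};`, and call `chip.write_command(CMD_BLEEP)`. Notice that the fake chip never looks at how the game decided to bleep. It only needs to know what the real hardware did when it received that command.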

But of course, none of this really helps with the “we don’t have the source code” problem, so it’s back to either decompiling and cleaning up the machine code (very hard and tedious), or writing new code to do the same thing (easier), which is what the publisher in your example did.
