The issue is that a random disc could have been edited, so there is no way to verify the integrity of its files.
If there is only one backup and access to it is lost, the original copy you would check any random disc against is gone too.
For the software I support, our dev team keeps a master sheet of all the changes, and they keep backup copies of the source code in multiple locations.
>Isn’t that how modders, emulators, and speedrunners see the code?
Not unless a game has really good mod support. That is to say, the devs put systems in place so the modders can see the source code. All that’s on the disc is compiled code, which is just instructions for the computer. Things like Move data from 0x000014 to 0x0000216. Not really readable. If you are clever and have a ton of time, you might be able to work out what’s happening at any given point by looking at memory addresses, but that’s far from source code and takes a long time.
Losing source code is actually pretty easy because once the game ships, it’s not really needed. You compile it into a form that the computer can understand and ship that. If you need more discs, you just copy the compiled code.
The original source code was written in a language like C, which has English-like words like “if” and “while”. Variables have names, functions have names, and the source code contains comments written by the developers for their own benefit.
The original game disks contain machine code. CPUs run instructions that are a sequence of really fundamental operations like “load from RAM”, “add value”, “go to” and so on. Variables have numerical addresses, functions have numerical addresses, the comments are gone, and the conversion process (run by a program called a compiler) has free license to tweak the code how it sees fit for the benefit of the quirks and styles of the PS2’s CPU.
While decompilers do exist, all the nuance and substance of the original game source code is long gone, leaving you to figure out what this variable means and what this loop actually does, multiplied by a few thousand of them, all over the game code. It’s not a fun time.
The game is like a house, and the source code is like the bricks that make up the house. Once the house is constructed, that’s it, you can’t undo it. You can’t remove the bricks one by one and stack them up.
After compilation, the code on the disc is just a bunch of 1s and 0s. Modders can do what they do by figuring out which of the 1s and 0s do which specific task, like how you can add a window or a wall socket to an already built house.
Couple parts to unpack with this:
> source code
First let’s clarify what source code is. Source code is instructions written in a way that *humans* can understand. For example, this is a bit of C++ source code (a very common language):
```cpp
#include <iostream>
using namespace std;

// The main function: takes the number in, and prints it back out
int main() {
    int number;
    // Get the number from the user
    cout << "Enter an integer: ";
    cin >> number;
    // Print the number
    cout << "You entered " << number;
    return 0;
}
```
There’s a couple things to note about this code:
The lines with “//” are *comments* that are ignored by the computer; they’re just there to help the person reading the source code understand what it’s doing.
The “int number” is what’s called a variable. It stores some value (in this case, an integer number) with a human-readable name so that it can be used later in the code, and most importantly, so that a human reading the code can know what it’s for.
Now, a computer can’t run this source code directly. It’s designed for humans, not for the computer itself. To actually run the code, it needs to be *compiled* into what is called *machine code* or binary code. This is, in the simplest sense, the raw 1s and 0s that the computer can understand.
However, this is generally a one-way process. Once you have machine code, you can’t get the source code back. Some things are ignored (like the comments), some things are eliminated (variable names), and some things can be optimized away or altered in some way that makes them harder for a human to understand but faster for the computer.
There are such things as decompilers that can take machine code and give you back something that looks like the original source code in a general sense. But the result is going to be nearly impossible to understand, making it functionally worthless on the scope of a whole project.
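To give a rough feel for the difference, here is a small sketch of the same logic twice: once the way a developer might write it, and once in the flavor a decompiler might hand back. Everything here is made up for illustration (the function names, the placeholder names like `sub_401A20`, `a1`, and `v1`, and the damage rule itself), and real decompiler output is usually far messier.

```cpp
#include <cassert>

// What a developer might have written. Names and comments carry the intent.
// (Hypothetical example, not taken from any real game.)
int applyDamage(int health, int damage, bool hasShield) {
    // A shield halves incoming damage.
    if (hasShield) {
        damage = damage / 2;
    }
    int remaining = health - damage;
    // Health never drops below zero.
    return remaining < 0 ? 0 : remaining;
}

// Roughly the flavor of what a decompiler might hand back for the same logic.
// The placeholder names (sub_401A20, a1, v1, ...) are made up for this sketch.
int sub_401A20(int a1, int a2, char a3) {
    int v1 = a2;
    if (a3) {
        v1 = a2 / 2;
    }
    int v2 = a1 - v1;
    return v2 < 0 ? 0 : v2;
}

// Checks that both versions agree on a range of small inputs.
void checkSameBehavior() {
    for (int health = 0; health <= 100; health += 10) {
        for (int damage = 0; damage <= 150; damage += 5) {
            assert(applyDamage(health, damage, true) == sub_401A20(health, damage, 1));
            assert(applyDamage(health, damage, false) == sub_401A20(health, damage, 0));
        }
    }
}
```

Both functions compute the same thing, and calling `checkSameBehavior()` from a `main` would confirm that for the inputs it tries. Only the first tells you what it is *for*. Now imagine thousands of functions like the second one, all calling each other.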
Thus, it’s often easier to just get a team of new programmers, give them the details of how the game should work (e.g. let them play through it a few dozen times), and then have them write brand new code that “does the same thing”.
> How do they lose it
Simply put, it got deleted, somehow, by someone, for some reason, and there are no backups. The reasons could be many: it only existed on a handful of computers that got thrown out, the one golden backup copy is on a tape archive no one can find, etc. Just like a random spreadsheet could be lost, so too can source code. There are ways to mitigate this, like source code revision/version control (software like Git, SVN, etc.) with copies kept in more than one place, but the tools and habits common today are more recent (Git, for example, first came out in 2005), and in the era of the PS2 it’s likely that developers were working in what we today would consider a very suboptimal way. Not to mention that this is all proprietary, so it’s restricted to just the few people in the company who need to see it all, leaving even fewer copies.
> why can’t they get it from a random game disc
This is because what’s on the game disk you buy in the store is the *machine code* mentioned earlier. There is no source code on there, just the compiled machine code. After all, if the company shipped the source code on the game disk, anyone could take it, modify it, compile their own version without copy protection or whatever other features, and then release it themselves.
>Isn’t that how modders, emulators, and speedrunners see the code?
This is the more interesting thing about it all. For the most part, *none* of these communities do what they do by looking at the original source code. Sometimes, they might use a decompiler to check bits here or there, but for the most part all of these alterations are done by either exploiting bugs, and chaining those exploits together, or by trial and error. For instance, you might notice that if you wiggle a cable a bit, a number in the HUD jumps from 3 to 4. You can then do trial-and-error to find that this happens when wires 3 and 4 in your controller are shorting slightly, then you could perhaps exploit that to add more health or XP to your character. This is a very simplified, trivial example, but the idea is the same – people spending a lot of time trying to find little bugs and exploits, and then sharing them, slowly leads to more developments.
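One software flavor of that trial and error, sketched very roughly: save a snapshot of the game’s memory while the HUD shows 3, save another when it shows 4, and look for the bytes that changed from 3 to 4. The function and parameter names below are invented for this sketch, and it assumes the value sits in a single byte, which a real game may not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the offsets whose byte was `oldValue` in the first snapshot
// and `newValue` in the second. Both snapshots must be the same size.
std::vector<std::size_t> findChangedBytes(const std::vector<std::uint8_t>& before,
                                          const std::vector<std::uint8_t>& after,
                                          std::uint8_t oldValue,
                                          std::uint8_t newValue) {
    assert(before.size() == after.size());
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < before.size(); ++i) {
        if (before[i] == oldValue && after[i] == newValue) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

Repeating the scan a few times usually narrows the list of candidate offsets, and none of it requires reading a single line of source code.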
Now of course, this depends on the game, the platform, etc. Some games might have robust, official modding support. For instance Bethesda releases something called the Creation Kit with their Elder Scrolls games, which is basically a tool that they use internally to make the maps and levels in the game. This lets anyone out there make their own mods and share them, all without ever touching a piece of the original game source code.
Emulators are also a different can of worms. For the most part, what an emulator does is take the machine code of the game and “fake” the hardware, acting as a translation layer that produces something your computer or another device can show. So for example, the game might make a call to the sound subsystem of the PS2 to make a “bleep” noise. On the PS2 itself, there’s a chip whose purpose is to take that command from the CPU when it executes, make a bleep waveform, and send that out the audio cable to your speakers. On an emulator, instead of there being a real chip to take that command, the emulator program presents a fake chip, which then translates that command to make a “bleep” into a command for your actual computer to make a “bleep”. (Working out how the real chip behaves is, again, often done by trial and error or by closely studying the original physical device, looking at schematics, etc.) So basically, an emulator is a translation layer between the game machine code and a different computer system. But to do this it doesn’t really need to care too much about what the game source code is doing: it just needs to know what the original hardware did when it got command X or Y and replicate it.
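A toy version of that idea, as a sketch only: the “console” below is a made-up machine with four instructions (nothing like a real PS2), and the “host sound system” is just a list of text messages. The point is that the loop only needs to know what each instruction does to the hardware, not why the game issued it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Instruction set of a hypothetical machine, invented for illustration only.
enum Opcode : std::uint8_t { LOAD = 0x01, ADD = 0x02, BEEP = 0x03, HALT = 0xFF };

struct FakeConsole {
    std::uint8_t reg = 0;              // the machine's single register
    std::vector<std::string> hostLog;  // stands in for the host computer's audio API
};

// Reads the program byte by byte and does what each instruction says.
void run(FakeConsole& console, const std::vector<std::uint8_t>& program) {
    std::size_t pc = 0;  // program counter: index of the next byte to read
    while (pc < program.size()) {
        std::uint8_t op = program[pc++];
        switch (op) {
        case LOAD:  // LOAD n: put n in the register
            assert(pc < program.size());
            console.reg = program[pc++];
            break;
        case ADD:   // ADD n: add n to the register
            assert(pc < program.size());
            console.reg = static_cast<std::uint8_t>(console.reg + program[pc++]);
            break;
        case BEEP:  // the "fake chip": translate the command for the host
            console.hostLog.push_back("beep at pitch " + std::to_string(console.reg));
            break;
        case HALT:
            return;
        default:    // unknown instruction: stop rather than guess
            return;
        }
    }
}
```

Feeding it the bytes `0x01, 40, 0x02, 2, 0x03, 0xFF` (LOAD 40, ADD 2, BEEP, HALT) leaves 42 in the register and records one beep request for the host. A real emulator does the same kind of thing for a vastly larger instruction set and many more chips.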
But of course, none of this really helps with the “we don’t have the source code” problem, so it’s back to either decompiling and cleaning up the machine code (very hard and tedious), or writing new code to do the same thing (easier), which is what the publisher in your example did.
The source code is human-readable code that contains a lot of useful information explaining how the game is supposed to work. If you have it, it is much easier to change the game somehow, including porting it to a different system. This is what is lost.
The game on the disk is code written for the machine to read. It loses a lot of that useful information and contains just enough for the machine to run. The problem is further exacerbated if obfuscation is used: methods that keep the code understandable to the machine but make it very hard for a human to follow. Many game companies use such methods to combat pirates and cheaters.
For example, programmers can write something like “x is a number that stores the x-coordinate for the player character” and this will be completely missing in the machine code. The machine doesn’t need to know what x means, it only needs to know what to do to x. It won’t even know the name “x”.
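Here is a small sketch of those two views of the same operation. The `Player` struct and both function names are hypothetical, and the second function only imitates what the machine effectively sees: some bytes at some offset, with no name attached.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Source-code view: the names and comments say what the data means.
struct Player {
    std::int32_t x;  // x-coordinate of the player character
    std::int32_t y;  // y-coordinate of the player character
};

void moveRight(Player& player, std::int32_t steps) {
    player.x += steps;
}

// Machine-level view, imitated in C++: "add a value to the 4 bytes at this offset".
// Nothing here says the bytes are a coordinate, or that they belong to a player.
void addAtOffset(unsigned char* memory, std::size_t offset, std::int32_t value) {
    assert(memory != nullptr);
    std::int32_t current;
    std::memcpy(&current, memory + offset, sizeof current);
    current += value;
    std::memcpy(memory + offset, &current, sizeof current);
}
```

Calling `moveRight(p, 5)` and calling `addAtOffset` on the raw bytes of a `Player` with offset 0 and value 5 both change `x` in the same way. Someone who only has the second form has to work out for themselves that offset 0 happens to be the player’s x-coordinate.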
Emulators don’t need the source code. They effectively allow one machine to pretend to be another machine. All they have to do is read the machine code and do what the machine code says to do.
Modders and speedrunners often don’t need to see the code either. Speedrunners might only need a very small portion of the code, and modders, depending on the game, might not need to know anything at all. It’s a lot easier to look at a small part of the machine code and figure out what that part does than to examine the entire code at once.