eli5 How exactly is the source code of an app/software being “locked”? What is protecting it from being seen?

The source code just isn’t distributed to anyone who doesn’t need to see it, and anyone trusted with it usually agrees not to distribute it.

Source code is “compiled”, which turns it from a verbose human readable format to a compact machine readable format.

Reverse engineering is taking the machine readable format and turning it back into human readable code, but it is a non-trivial task from what I understand.

https://en.m.wikipedia.org/wiki/Compiler

Source code is contrasted with compiled code.

Computers don’t have minds and do not “understand” commands or programs. They are calculating machines that perform various basic logical operations on binary values, and it is up to the programmers to use that in performing higher level logic.

For example, the basic logical operations a CPU can perform are AND, OR, NOT, XOR, and NAND. These kinds of things are physically created “gates” within the structure of the CPU.

An AND gate takes two inputs and outputs a signal only if both inputs have a signal. An OR gate outputs a signal if either or both of the inputs have a signal. A NOT gate outputs the opposite of its input signal. An XOR gate outputs a signal if exactly one of the inputs has a signal, but not if both do. Finally, a NAND gate is the opposite of AND: it outputs a signal if neither input or only one input has a signal, but not if both inputs have a signal.
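As a rough sketch, the five gates described above can be written as tiny Python functions, with 1 standing for “signal” and 0 for “no signal”. The function names are just for illustration; real gates are physical circuits, not code.

```python
# Each gate takes inputs of 0 (no signal) or 1 (signal) and returns 0 or 1.
def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def NOT(a):
    return 1 - a

def XOR(a, b):
    return a ^ b

def NAND(a, b):
    # NAND is simply AND followed by NOT.
    return NOT(AND(a, b))

def truth_table(gate):
    """Return (input_a, input_b, output) for every combination of inputs."""
    return [(a, b, gate(a, b)) for a in (0, 1) for b in (0, 1)]

for gate in (AND, OR, XOR, NAND):
    print(gate.__name__, truth_table(gate))
```

Each printed row lists the two inputs followed by the output, so the tables can be checked against the descriptions above.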

As you can see these kinds of basic functions are difficult to work with in creating a complex program without many layers of abstraction. Programmers tend to work using programming languages that summarize and convert combinations of these basic binary operations into a format more easily understood and written by a human. For a computer to do anything with this it needs to be “compiled”, a process that takes this readable set of instructions and converts it into a form the CPU can process.
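To get a feel for what compiled output looks like, Python’s standard `dis` module can list the low-level instructions a function is translated into. This is only an analogy: Python compiles to bytecode for its own virtual machine rather than to instructions for a physical CPU, and the `add_tax` function is made up for illustration. The exact instruction names vary between Python versions.

```python
import dis

def add_tax(price):
    # Readable source: one line of arithmetic with meaningful names.
    return price * 1.2

# List the low-level instructions the function was compiled into.
instructions = [ins.opname for ins in dis.get_instructions(add_tax)]
print(instructions)

# The raw compiled form is just a sequence of bytes.
print(add_tax.__code__.co_code)
```

The second print is closer to what actually gets run: a string of bytes with no comments or explanation attached.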

The readable set of instructions is the source code while the compiled code is what is actually distributed to end users. Programs exist to try to de-compile programs but context, comments (notes to explain what is being done and why), variable names, formatting, and other things helpful for understanding what a program is doing are not available. For a program that might have 80 million lines of code, not having the source code makes figuring out how things are working behind the scenes quite difficult.

When you get a piece of software, you’re getting a cake but no recipe.

There is software where the metaphorical recipe is available for anyone who’s curious, which is called “open source” software. Most anything you can walk into a store and buy off the shelf, though, isn’t.

The way it works in general is that the source code is written in a text file with a language-specific extension (.c, .cpp, etc.). Then a program called a compiler takes that file, translates what is written into 0s and 1s, and creates a new file (on Windows, one with a .exe extension). That is the executable file, which your computer runs and which is distributed to users. So the original file with the source code is not distributed, only the executable, since that is the file your computer can actually run.

When the source code in the original file is changed, it needs to be compiled again into a .exe file and distributed again to users. That is what updates/patches are.

The original English-like text is condensed and simplified to simple instructions a machine can read.

Human language structures, which support the easy readability of the logic inside the program, get lost in this process.

Therefore, the process can’t be reversed losslessly and you need the original code to have the same information available as the programmer does.
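A small way to see the loss, sketched in Python: parsing source into the structure the language actually works with, and then turning that back into text, drops the comments and the original layout. This uses the standard `ast` module (`ast.unparse` needs Python 3.9 or later) as a stand-in for a real compile/decompile cycle, which loses considerably more. The `final_price` function is invented for the example.

```python
import ast

original = '''
# Loyalty discount agreed with marketing: 20% off.
def final_price(price):
    return price * 0.8   # 0.8 = 1 - discount
'''

# Parse to a syntax tree, then regenerate source text from the tree.
recovered = ast.unparse(ast.parse(original))
print(recovered)
```

The recovered text still computes the same thing, but the comments explaining why the number is 0.8 are gone for good.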

To answer your question, OP: nothing. It is generally possible to take an application you have on your computer and decompile it to get some version of the source code.

The big catch (that I feel like some folks in this thread have touched upon) is that modern software is EXTREMELY complex. It is built using a programming language that was built upon another language that was built upon another one…10 more levels down.

As a result, in software development, an enormous amount of time and effort is spent on making code readable (by another human). There are cases where developers will even settle for what they know is slightly less effective code, because it’s more readable and easier for other developers to understand.

You lose all of this readability when you compile the code, because the compiler doesn’t care about any of that readability. So while you can technically decompile it, the code you end up with will usually not be of much use to anyone.

It depends on exactly what you’re talking about. For a web app, the source code, at least for the front end, is exposed to some degree to users if they know where to look. It would be nearly impossible for them to read, however, because of the obfuscation techniques that get used. The two primary means of obfuscation we use at my place of work are minification and uglification. Minification is pretty straightforward to understand: a build tool removes all unnecessary white space, line breaks, comments, brackets, and semicolons, and can even replace variable and function names with one or two letters. This has the advantage of making the files significantly smaller as well, but the result is virtually impossible for a human to read. After that, the code can go through a process called uglification, which does what you think it does: it makes the code ugly and even harder to read. By the time this is done, it would be exceedingly difficult, albeit not impossible, to reverse engineer the code.

You also have to consider that even pretty and verbose code isn’t always easy to understand for someone who sits down for the first time and tries to read someone else’s code. Generally speaking, they’ll either need to spend a lot of time digesting what everything is doing, or have people familiar with the code base explain things. Totally jumbling up the code with extreme shorthand, no punctuation, and performance optimizations will waste even the best developer’s time.

Source code is like the blueprints of a building. An app is like a finished building. Sure, you can look at a building and even tear down some walls to look at the plumbing, structure, etc. and make some guesses as to what the blueprints look like but you won’t know exactly what they are or more importantly why they are the way they are. Same applies to an app, you can look at its internals to see what it’s doing but it’s really hard to guess what the source code looks like from that.

TL;DR: software doesn’t usually contain its source code. Source code is just instructions for a computer to build an app.

Note: for some apps, especially those written in JavaScript, you CAN see the source code.

Others on this thread have covered the main point of compilation, which is converting your human-readable source code to a binary executable or byte code, so I won’t repeat.

However, there is also a process called ‘obfuscation’ which can be used too.

In most cases, it is absolutely possible to decompile compiled code and return it to source code. However, you’ll almost never get back the *exact* original source code, as most compilers disregard comments and apply a number of optimisations that reorder and reorganise the code in its compiled form to make it more efficient.

However, good decompilers will still produce pretty readable code.

So, this is where obfuscation comes in. Obfuscation tools will take your original source code and essentially make it unreadable (by humans) – or at least *incredibly difficult to read* – by changing labels, variable names, constants, function names, class names, etc to seemingly random strings, rearranging the organisation of the code, removing unnecessary white space and empty lines and various other techniques.

However, importantly, obfuscation does not affect the execution of the code. As far as the computer is concerned, the code works *exactly* as originally intended. It just makes decompiling to an (easily) readable form much more difficult.
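Here is a toy sketch of the renaming part of that idea, written in Python with the standard `ast` module (`ast.unparse` needs Python 3.9 or later). The `Renamer` class and the `order_total` function are hypothetical names made up for this example, and real obfuscation tools do far more than this; the sketch only renames arguments and local variables, and keeps the function name so callers can still find it.

```python
import ast

class Renamer(ast.NodeTransformer):
    """Replace argument and local variable names with meaningless ones."""

    def __init__(self):
        self.mapping = {}

    def _short(self, name):
        # Hand out v0, v1, ... in order of first appearance.
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_arg(self, node):
        node.arg = self._short(node.arg)
        return node

    def visit_Name(self, node):
        # Rename names that are assigned to or already known;
        # anything else (such as builtins) is left untouched.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._short(node.id)
        return node

readable = '''
def order_total(unit_price, quantity, tax_rate):
    subtotal = unit_price * quantity
    tax = subtotal * tax_rate
    return subtotal + tax
'''

obfuscated = ast.unparse(Renamer().visit(ast.parse(readable)))
print(obfuscated)

def run(source, *args):
    """Execute the source text and call its order_total function."""
    namespace = {}
    exec(source, namespace)
    return namespace["order_total"](*args)

# Both versions compute the same result for the same inputs.
print(run(readable, 10, 3, 0.5), run(obfuscated, 10, 3, 0.5))
```

The renamed version is much harder to follow because `v3` says nothing about being a subtotal, yet the computer gets the same answer from both.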

This is more of an example of ‘security through obscurity’ than a case of ‘locking’ it, though.

It’s sort of like the difference between a labelled blueprint/instructions on how to build a circuit board, and just looking at a circuit board itself.

Looking at the blueprint you’d see a lot of words and phrases clearly describing how things work and what does what, but once that’s turned into the final board it’s just a bunch of unlabeled chips and abbreviations.

Source code is like the blueprint. It’s labelled with human readable words and designed to be easy to read, but in order for the computer to actually run it, it needs to be converted to 1s and 0s that the computer can read. All words are replaced with numbers, the entire structure gets re-arranged, and it becomes really difficult to work backwards and figure out what everything is supposed to do, sort of like that one Pixar animation with the alien learning to fly the ship: https://youtu.be/LVLoc6FrLi0

Not to say it doesn’t happen. Dedicated fans reverse engineer games from the compiled binaries pretty frequently. One I’m currently following is called “Metaforce,” which is a reverse engineering of Metroid Prime and it’s super interesting seeing how it’s done on the live streams.

Let me try a comparison.

The runnable application is like a PDF that you can print. The source code would be the Excel sheet from which the PDF is produced.

If you want to make serious changes to the PDF, you need the initial spreadsheet, and you cannot realistically infer it from the PDF.