Why is it so difficult to copy source code that is not “open source”?

It’s been on my mind: if we are using a tech company’s software or even hardware, we can play around with it, install and uninstall it, and more. So why is it so difficult for someone to “unhide” the source code that the device uses? Technically the code is there, hidden somewhere in the device, yet it’s almost impossible to obtain the source code. How do they achieve this so no one copies their code?

42 Answers

Anonymous 0 Comments

When you code a program, let’s say in C++, you need to compile it after you are done to actually turn it into a program.

Compiling turns human-readable code into machine code: the raw numeric instructions that tell the CPU what to do, such as which values to load, add, or store in memory. It’s very hard for humans to read, and it takes *a lot* of work to figure out the C++ code again without having the source code (this is called reverse engineering).

But even when you figure out the C++ code from the machine code, it won’t be exactly the same as the source code. All comments will be lost (they are removed when compiling, since they would only take up space), and the result will basically just be very hard for anyone to understand.
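
A small illustration of the lost comments, using Python rather than C++ because Python can inspect its own parsing step. This is only a sketch: the `apply_damage` function is a made-up example, and `ast.unparse` needs Python 3.9 or newer. Note that Python keeps the names here, which a compiled C++ program usually would not.

```python
import ast

# Hypothetical source code with comments explaining the intent.
source = '''
# Game rule: health is clamped so it never goes negative.
def apply_damage(health, damage):
    # Health must never drop below zero.
    return max(0, health - damage)
'''

tree = ast.parse(source)       # parsing discards every comment
recovered = ast.unparse(tree)  # rebuild source text from the parsed tree
print(recovered)               # same logic, but the comments are gone
```

The recovered text still works the same way, but the explanation of *why* the code does what it does is no longer there to read.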

When you compile a program from source code to machine code, it’s not meant to be reversed or read by a human. It is only meant to be a file, such as an .exe, that your computer can run.

“Open Source” means that a person simply decided to openly share the source code of their program, so others can check how it works and potentially use parts of it in their own code (if the owner allows it).

Also note that reverse-engineering “Closed Source” software might be illegal, and if a developer chose a closed-source model, they almost certainly won’t allow someone to use their code in other programs anyway.

Anonymous 0 Comments

Yes, you’re correct: the code is running somewhere on your machine.

While code is usually written in a human-readable language such as C, Python or Java, when it comes time to ship that application out to customers you need to make sure it is in a format that the customer knows what to do with. Hand your average person a Python file and odds are good they won’t know what it is or how to run it, or, hell, they probably won’t even have Python on their machine.

This is where a standard file that can run the app comes in, such as an .exe. This file has a full set of instructions so the computer knows how to run it regardless of what language it was written in. It may also include the prerequisites it needs, so the user doesn’t have to install anything extra. The .exe is created so a machine can quickly understand it and run it. It is not designed to be human readable, so it is hard to translate it back into human-readable code.

Another angle is that companies may want to protect their code. They may encrypt their code and intentionally obfuscate it. This means even if you did find the code you wouldn’t be able to make sense of it.

Anonymous 0 Comments

> It’s been in my mind if we are using the software/program or even hardware of a tech company, we can play around, install-unsinstall and more. Then how is it so difficult for someone to “unhide” the source code that the device uses?

It’s not necessarily “difficult” – decompilers exist that can reconstruct some kind of source code from an executable program or library.

Decompilers often don’t produce *great* source code though. Some languages have better reverse engineering tools than others, but in virtually all cases comments in the code are going to be lost, and often variable names will be lost as they’re optimized out of the compiled code. A skilled programmer can still read the results and tell you what the program is doing, even copy and modify it themselves. At a certain point, though, you’re better off black-boxing it (have your programmer use the program, trying various inputs and noting the outputs/behavior, and write one that does the same thing) or doing a clean-room implementation (take a programmer who has never used the software, explain what you want the program to do, and let them write it from scratch) versus trying to reverse engineer a large, complex system.
(For example it would almost certainly be easier to write a new word processor than to reverse engineer Microsoft Word.)
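
To make the black-boxing idea concrete, here is a rough Python sketch. Everything in it is hypothetical: `closed_source_program` stands in for a binary you can run but not read (in real life it would be an opaque executable, not a function in the same file), and `reimplementation` is written purely from watching what it does.

```python
def closed_source_program(text):
    # Hypothetical stand-in for a program we can run but cannot read.
    return " ".join(word.capitalize() for word in text.split())


def reimplementation(text):
    # Written only from observed behavior, without looking at the original.
    return " ".join(w[:1].upper() + w[1:].lower() for w in text.split())


# Try various inputs and note the outputs.
samples = ["hello world", "MIXED case", "", "  extra   spaces  "]
observations = {s: closed_source_program(s) for s in samples}

# Check that the rewrite behaves the same on every observed input.
mismatches = [s for s, expected in observations.items()
              if reimplementation(s) != expected]
print(mismatches)
```

Matching on a handful of samples does not prove the two programs are identical, which is part of why this gets harder as the system gets larger.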

Some (badly-compiled) programs may even *include* their source code – I’ve seen more than a few closed-source Java programs that had their source files buried inside the .jar file – all you need to do is open the archive and look around.
In cases like that you run up against the *real* deterrent – which is legal, not technical: I can *always* copy the program outright, because it’s a digital file. Software piracy is still a thing that happens.
I can even hack the executable to bypass license checks and the like – that’s often much easier to do than decompiling or reverse-engineering the code since you just have to run the code in a debugger, find the function it’s calling (which the debugger will help you do), and skip the actual *checking* part (which the debugger will *also* often help you do).

If I do those things – especially if I then distribute my modified copy – the folks who wrote the software and are selling it for money will sue me if they catch me, and they probably have a much larger supply of both money and lawyers than I do.

Anonymous 0 Comments

What we use as “code” is different from what the computer uses.

A programmer might “speak languages” like C++ or Python, but the computer only speaks in instructions, which are exact commands for what operation to do, like “store a number at this address” or “add the number at this address to the number at another address”.

Languages like C++ allow us to write code in a much more human-comprehensible format, but the code then needs to be translated so that the computer understands it. This is a one-way translation, however. You might declare a variable and name it, so that you always know what it represents in the code, like

```
float playerHealth = 100.0f;
```

When you translate the code for the computer, or compile the code, the name of the variable gets lost, because it serves no purpose. The computer stores the player health at a specific address, and where it is stored is all the computer needs to know. Storing the name would just take up space. So when you translate the code back (decompilation, which is not an exact process), the names are gone. And much more is gone besides: compilation is a lossy translation.
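
Here is a toy model of that name loss, written in Python. It is not how a real compiler works internally: `compile_assignments` and `decompile` are made-up helpers, and the `STORE` instruction and slot numbers are invented for illustration.

```python
def compile_assignments(lines):
    """Toy 'compiler': turn `name = value` lines into STORE instructions."""
    slots = {}    # name -> slot number; exists only while compiling
    program = []
    for line in lines:
        name, value = (part.strip() for part in line.split("="))
        slot = slots.setdefault(name, len(slots))
        program.append(("STORE", slot, float(value)))
    return program  # the name table is not part of the output


def decompile(program):
    """Best-effort reverse: values survive, names have to be invented."""
    return [f"var_{slot} = {value}" for _, slot, value in program]


compiled = compile_assignments(["playerHealth = 100", "enemyHealth = 40"])
print(compiled)             # [('STORE', 0, 100.0), ('STORE', 1, 40.0)]
print(decompile(compiled))  # ['var_0 = 100.0', 'var_1 = 40.0']
```

Someone reading `var_0 = 100.0` has to work out from how the value is used that it was the player’s health.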

[This video about decompiling Lego Island](https://youtu.be/MToTEqoVv3I) might be a bit more technical, but it shows it nicely.

Anonymous 0 Comments

Open source:

This times that plus the other thing and that piece over there = 5

Closed source (compiled):

The answer is 5, what was the question?

Sure, you could guess, but would it be as streamlined? Also, you wouldn’t know what else the code is hiding or doing if it doesn’t show up in the result.
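
The analogy above is close to something compilers really do, called constant folding. A rough way to see it is with Python’s built-in `compile`, with the caveat that this shows CPython bytecode rather than machine code, and the exact contents of `co_consts` are an implementation detail that varies between Python versions. The variable name `seconds_per_day` is just an example.

```python
# The source spells out the calculation...
code = compile("seconds_per_day = 60 * 60 * 24", "<example>", "exec")

# ...but the compiled code carries the already-computed answer as a constant.
print(code.co_consts)
```

From the compiled side you see 86400, and you can only guess whether the author wrote `60 * 60 * 24`, `3600 * 24`, or just typed the number.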

To put it into context, there’s a website showing how 30 different languages go about printing “hello world” on the screen. Now imagine that for complex programs.

https://www.geeksforgeeks.org/hello-world-in-30-different-languages/

Anonymous 0 Comments

It isn’t that difficult. Executable code is like a house. To a child or someone untrained in construction, a house is a solid thing, difficult to penetrate or damage.

To someone trained, everything is in plain sight. Whatever isn’t in plain sight is hidden behind drywall.

Reverse engineering is just finding the right tool to rip off the drywall and then looking at the mess underneath. It’s easier if you get blueprints (source code), but even if you don’t have blueprints, you can still see what’s actually there (assembled binary).

Anonymous 0 Comments

I’m not sure if it’s been mentioned yet, but modern compilers use a lot of tricks to make the program run faster. So it’s not as if one line of C source code always compiles to the same machine code, which would make it easy to reverse. Instead, many different combinations of complicated machine code might be generated depending on what the compiler thinks the program is trying to do, so it’s not possible to completely unpick it back to the original source code.

Anonymous 0 Comments

This is a side effect of what is called **compilation**.

When someone writes source code, it is written in a sort of natural language, e.g. “if x do y”, but a computer only understands a set of very simple instructions, e.g. “read value, set value, compare value” (to learn more, look up ‘assembly language’).

So in order for source code to actually perform, it has to be translated from the higher level language into a series of the simple instructions that the computer actually knows how to perform – this translation process is **compilation** and produces files like the .exe you might click on to run a program.

If you were then to open up a compiled file, you would see a long list of simple computer instructions that make the program run but are incredibly hard for a human to understand. Additionally, because a single line of source code can compile into more than one instruction, it can be hard to convert it back into the code that it was compiled from.
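
You can get a feel for this with Python’s standard `dis` module. Treat it as an analogy only: it lists bytecode for the Python virtual machine, not CPU machine code, and the exact instructions differ between Python versions. The one-line snippet being compiled is a made-up example.

```python
import dis

# One short line of source code...
code = compile("if x > 10: y = x * 2", "<example>", "exec")

# ...becomes a list of several simple instructions.
instructions = list(dis.get_instructions(code))
for ins in instructions:
    print(ins.opname, ins.argrepr)

print(len(instructions), "instructions from one line of source")
```

Real machine code is lower level still, and an optimizing compiler may reorder or merge instructions, so the mapping back to source lines is even less direct.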

Though it’s not impossible! There are programs that exist which can reverse engineer, or decompile, the list of instructions and try to make a best guess as to what the original source code looked like, though it might not be an exact replica as some data gets lost in the compilation process.

Anonymous 0 Comments

Because the whole reason it is not open source in the first place is so that you can’t rip them off and sell their work as yours. They don’t trust that you won’t do it just because “it’s illegal”, so they go the extra mile to hide it.

Anonymous 0 Comments

The source code for many commercial software products has been leaked. The real obstacles, regardless of whether you stole the source or decompiled it, are legal. There isn’t much useful you can do with it without getting sued into oblivion, and most software is pretty unremarkable anyway in terms of what it does; as in, nobody’s looking at MS Word and scratching their head as to how it does something.