Why is it so difficult to copy source code that is not “open source”?


It’s been on my mind: if we are using the software, programs, or even hardware of a tech company, we can play around with it, install and uninstall it, and more. So why is it so difficult for someone to “unhide” the source code the device uses? Technically the code is somewhere in the device, hidden, so it’s there, but it’s still almost impossible to obtain the source code. How do companies achieve this so that no one copies their code?



Anonymous

Many people are saying source code is different from a compiled executable, but they’ve forgotten about .NET. Java works much the same way, but I’ll talk about .NET.

.NET code is not compiled to native assembly, but to an intermediate language, CIL (Common Intermediate Language). The .NET runtime, the CLR (Common Language Runtime), then turns that into machine code as the program runs.

So you can very well decompile the code with a product like JetBrains dotPeek, but what you get out of it depends on how it was compiled. CIL is still fairly readable, and can easily be converted back to C#.
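Python makes a convenient stand-in for seeing this kind of intermediate code: like C#, CPython compiles source into bytecode for a virtual machine rather than into native machine code. The sketch below uses Python’s standard `dis` module (not dotPeek or .NET), and `average` is just a made-up example function; exact instruction names differ between Python versions.

```python
import dis


def average(values):
    total = 0
    for value in values:
        total += value
    return total / len(values)


code = average.__code__

# Argument, local and global names survive compilation to bytecode.
print("arguments and locals:", code.co_varnames)
print("globals referenced:", code.co_names)

# Human-readable listing of the bytecode, the raw material a decompiler works from.
dis.dis(average)
```

Because so much structure and naming survives in bytecode like this, a decompiler can often produce something close to the original, which is the point being made here about .NET.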

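A proper Release build enables optimizations, which can restructure some code into less readable forms. Local variable names aren’t stored in the compiled assembly at all (they live in the separate .pdb debug symbols file), so the decompiler has to invent them, although class and method names are kept in the assembly’s metadata. Code comments are also lost.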

Can you reverse engineer this code and understand it? Yes, given enough time. And you can usually make small tweaks and recompile in mere minutes.

Then come obfuscators. These post-build tools scramble the compiled code: they rename classes, methods, and variables to meaningless names, move logic blocks around, and so on. Their purpose is to make reverse engineering harder. Nonetheless, you can still decompile the result, make a small change if you can manage to figure out where it belongs, and recompile.
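A toy illustration of just one obfuscation step, identifier renaming, written in Python with the standard `ast` module (Python 3.9+ for `ast.unparse`). Real .NET obfuscators operate on compiled assemblies and also reshuffle control flow; this sketch, with the made-up names `RenameIdentifiers` and `obfuscate`, only shows why renamed code is harder to follow even though it behaves the same.

```python
import ast
import builtins


class RenameIdentifiers(ast.NodeTransformer):
    """Toy obfuscation pass: give functions, arguments and variables meaningless names."""

    def __init__(self):
        self.mapping = {}

    def _obfuscated(self, name):
        # Reuse the same replacement every time a given name appears.
        if name not in self.mapping:
            self.mapping[name] = f"_{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._obfuscated(node.name)
        self.generic_visit(node)  # then rename arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._obfuscated(node.arg)
        return node

    def visit_Name(self, node):
        # Leave built-ins such as len() alone so the program still runs.
        if not hasattr(builtins, node.id):
            node.id = self._obfuscated(node.id)
        return node


def obfuscate(source):
    tree = RenameIdentifiers().visit(ast.parse(source))
    return ast.unparse(tree)


SOURCE = """
def average(values):
    total = 0
    for value in values:
        total += value
    return total / len(values)
"""

print(obfuscate(SOURCE))
```

The renamed function still computes the same average; only the human-readable hints are gone.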

Then there’s code signing: you’ve created a new DLL, but you lack the original’s signing key, so you can’t simply drop it in next to the EXE, which expects the trusted, signed original. So you recompile the EXE itself as well, and end up running a stack of bootleg builds.
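A simplified sketch of why a modified DLL fails a signature check, using HMAC-SHA256 from Python’s standard library as a stand-in for real code signing. Actual Authenticode or .NET strong-name signing uses public-key signatures, so anyone can verify with the public key while only the publisher holds the private key; HMAC needs the same secret on both sides. The key and the byte strings here are hypothetical.

```python
import hashlib
import hmac

# Hypothetical publisher secret. Real code signing uses a private/public key pair instead.
PUBLISHER_KEY = b"hypothetical-publisher-key"


def sign(binary: bytes) -> bytes:
    """Produce a tag that depends on every byte of the binary."""
    return hmac.new(PUBLISHER_KEY, binary, hashlib.sha256).digest()


def verify(binary: bytes, tag: bytes) -> bool:
    """Check the tag; compare_digest does a constant-time comparison."""
    return hmac.compare_digest(sign(binary), tag)


original_dll = b"MZ original compiled code"  # stand-in bytes, not a real DLL
tag = sign(original_dll)

patched_dll = original_dll.replace(b"original", b"patched")

print(verify(original_dll, tag))  # True
print(verify(patched_dll, tag))   # False: any change to the bytes breaks the tag
```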

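Anyway, go get JetBrains dotPeek and decompile some of your video game files; you’ll be surprised at what you get. Now, if the game was written in C/C++, there’s far less hope. We’re talking about .NET/C# here, which covers Unity games built with Unity’s Mono scripting backend (games built with IL2CPP are compiled to native code instead).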

Anonymous

Programmers write in “high-level” languages where one step does a lot. Computers work in very low-level languages, where one step does a tiny thing. There’s software that translates what the programmers said into a program the computer can understand. And that program is what’s actually on the computer, not what the programmer originally wrote.

By analogy, let’s say I told you “put your shoes on, go down to Jimmy’s house and put this letter in his mailbox, then come back” — that might be the “source code.” The “object code” would be closer to “look down. Is your shoe there? If it is, then go to the part of the instructions where we tell you to put it on. Look to the left. Is your shoe there? If it is…. [60 pages later] bend your left knee. move your left leg forward. Extend your left knee. Bend your right knee….. Look to the left. Are you at Jimmy’s House? If not, then go back to the left knee part.” You might be able to look at the first part and realize “Oh, this is just looking for your shoes.” But, that process takes a lot of time and thinking — there’s software that can help with it, but it’s still mostly a manual process.

Anonymous

It’s compiled down to a binary file (executable) before it’s published to your computer. Your computer only has the final product.

To get the source, you would have to convert the binary back to assembly code, then convert the assembly code back to the high level language from whence it came (like C), keeping in mind that the compiler may have optimized the original source code in the first place. You can do this with a publicly available product called Ghidra, but since it doesn’t know variable names, you’d have to sift through the decompiled code and rename all the variables and rebuild all the structs as you begin to understand it.
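“Rebuilding the structs” means guessing how a run of raw bytes is laid out and giving each field a name. A minimal sketch with Python’s `struct` module, assuming the bytes are a 16-byte MBR partition table entry (a well-documented layout, chosen so the guess can be checked); the sample bytes are made up, and `PartitionEntry`/`parse_entry` are our own names.

```python
import struct
from typing import NamedTuple


class PartitionEntry(NamedTuple):
    """Field names a reverse engineer would assign once the layout is understood."""
    status: int         # 0x80 marks the bootable partition
    chs_first: bytes    # packed cylinder/head/sector address of the first sector
    type_code: int      # e.g. 0x0C = FAT32 with LBA addressing
    chs_last: bytes     # packed cylinder/head/sector address of the last sector
    lba_first: int      # first sector as a logical block address
    sector_count: int   # partition size in sectors


# "<" = little-endian, no padding: 1 + 3 + 1 + 3 + 4 + 4 = 16 bytes.
ENTRY_FORMAT = "<B3sB3sII"


def parse_entry(raw: bytes) -> PartitionEntry:
    """Apply the guessed layout to 16 raw bytes."""
    return PartitionEntry(*struct.unpack(ENTRY_FORMAT, raw))


raw = bytes.fromhex("80 20 21 00 0c fe ff ff 00 08 00 00 00 f8 3f 00")
entry = parse_entry(raw)
print(entry)
```

Until the layout is guessed, a decompiler just shows reads at offsets like `+8` and `+12`; once it is, they become `lba_first` and `sector_count`.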

It’s fun.

Anonymous

There have been some really good analogies but I want to give a bit more insight… might be slightly ELI10:

Code is just human readable instructions. But the process of “compiling” code is converting those human readable instructions into machine readable instructions. And it’s really tough to understand those machine readable instructions because it’s just 0s and 1s.

So if I hand you a file that is a long list of 0s and 1s, you can use it because your *machine* understands it. But figuring it out yourself would take a lot of time and effort (though it’s possible to a degree).

Anonymous

Compiled software is machine code (usually viewed as assembly, its human-readable form), which is not easy for humans to read. Reverse engineers have to work out what it does from its instructions and behavior.

Examples (written for LC-3, a simple teaching architecture):

Assembly:

```
x3000 LD R1, x006
x3001 LDR R2, R1, #0
x3002 BRz x005
```

Machine code (the first line is the load address, x3000):

```
      0011 0000 0000 0000
x3000 0010 001 0 0000 0110
x3001 0110 010 001 000000
x3002 0000 010 0 0000 0101
```
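A minimal sketch of the mechanical part of this, decoding the three machine-code words above back into assembly, written in Python. It handles only the LC-3 LD, LDR, and BR opcodes and prints offsets in decimal (`#6`) where the listing above writes hex (`x006`); `sign_extend` and `disassemble` are our own names.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def disassemble(word: int) -> str:
    """Decode one 16-bit LC-3 word (only LD, LDR and BR are supported)."""
    opcode = word >> 12
    if opcode == 0b0010:  # LD: DR, PCoffset9
        dr = (word >> 9) & 0b111
        return f"LD R{dr}, #{sign_extend(word, 9)}"
    if opcode == 0b0110:  # LDR: DR, BaseR, offset6
        dr = (word >> 9) & 0b111
        base = (word >> 6) & 0b111
        return f"LDR R{dr}, R{base}, #{sign_extend(word, 6)}"
    if opcode == 0b0000:  # BR: n, z, p condition bits, then PCoffset9
        flags = "".join(name for name, bit in (("n", 11), ("z", 10), ("p", 9))
                        if (word >> bit) & 1)
        return f"BR{flags} #{sign_extend(word, 9)}"
    return f"<opcode {opcode:04b} not handled>"


program = [0b0010_001_0_0000_0110, 0b0110_010_001_000000, 0b0000_010_0_0000_0101]
for address, word in enumerate(program, start=0x3000):
    print(f"x{address:04X}  {disassemble(word)}")
```

Turning bits back into mnemonics like this is the easy, mechanical step; working out what a program built from thousands of such instructions is *for* is the hard part.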

Anonymous

ELI5:

Because if I know the answer is 4, I don’t know if you got there by adding 2 + 2 or 1 + 3.

Open source software doesn’t tell you the answer. It tells you the equation. 2 + 2 = 4.

Closed source software tells you the answer, with no equation. Answer = 4.

The reason to look at source code is to see the equations, to see how they got to the answer. In closed source, you can’t do that.

Edited to add: A program that the computer runs is made up of “answers” (in this analogy). The answers are all your computer needs; it doesn’t care about the equations used to get there.
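The analogy has a concrete counterpart in CPython, which folds constant arithmetic at compile time. This sketch (the helper name `compiled_form` is ours) compiles two different “equations” and compares what the interpreter would actually run. Constant folding is an implementation detail of recent CPython versions, not a language guarantee.

```python
def compiled_form(expression: str):
    """Return the bytecode and constants CPython produces for an expression."""
    code = compile(expression, "<expr>", "eval")
    return code.co_code, code.co_consts


# Different "equations", same stored "answer" (4), so the compiled forms match.
print(compiled_form("2 + 2") == compiled_form("1 + 3"))  # True
# A different answer gives a different compiled form.
print(compiled_form("2 + 2") == compiled_form("2 + 3"))  # False
```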

Anonymous

The coding the programmer does is not word for word what goes to the computer. The source files are basically compiled into another language, which is then served to the consumer.

A simplified way I think of this is as a summary. When you summarize a story, you omit a lot of details to capture the bigger picture, the core of what’s actually in the text, and put it into your own words. If you give this summary to somebody who hasn’t read the full text, they won’t be able to recreate a word-for-word copy of the original, because the summary uses different wording and leaves out a lot of details. Their attempt at recreating the source material will usually have things wrong or missing entirely.

Anonymous

A compiled binary doesn’t contain the source code; it contains machine code. Machine code can be understood, but it takes a lot of time and effort. Optimized machine code is even harder to understand, and code can be obfuscated to make it harder still.

Anonymous

it’s not that

> no one copies their code

it’s just difficult. a computer operates in binary, so the code (called machine code) is just a loooong stream of 1s and 0s. the first step in converting it back to readable form (source code) is disassembly. the most difficult step is converting the assembly dump into a higher-level language, with functions, procedures, etc. you generally have to wrap your head around it all and (in a way) rediscover how to put all this assembly mess back together into a coherent high-level language form. this process is called [reverse engineering](https://en.wikipedia.org/wiki/Reverse_engineering): working out the recipe/algorithm/way something works from what a piece of code (or a device, because hardware reverse engineering is also a thing) actually does. take a look at this:

```
cli
push cs
pop ds
mov ax, word ptr [0x4c]
mov word ptr [0x7cd3], ax
mov ax, word ptr [0x4e]
mov word ptr [0x7cd5], ax
mov al, byte ptr [0x46e]
mov byte ptr [0x7dbd], al
mov ax, word ptr [0x413]
dec ax
mov word ptr [0x413], ax
mov cl, 6
shl ax, cl
sub ax, 0x7c0
mov word ptr [0x4e], ax
mov word ptr [0x26], ax
mov word ptr [0x4c], 0x7c82
mov word ptr [0x24], 0x7d62
mov si, 0x7c00
mov di, si
mov es, ax
mov cx, 0x100
cld
rep movsw word ptr es:[di], word ptr [si]
int 0x19
cmp ah, 0xaa
jne 0x4a
iret
cmp ah, 2
jne 0x94
cmp cx, 1
jne 0x94
cmp dh, 0
jne 0x94
push ax
push bx
push si
push di
pushf
lcall cs:[0x7cd3]
jae 0x67
jmp 0x9d
cmp word ptr es:[bx + 0x1fe], 0xaa55
je 0x72
jmp 0x99
cmp byte ptr es:[bx + 0x1bc], 0xc9
je 0xf4
call 0x108
call 0xb1
mov si, bx
cmp dl, 0x79
ja 0xa5
add si, 2
mov di, 0x7c02
mov cx, 0x1e
xor dh, dh
jmp 0xc4
ljmp 0xf000:0xb648
mov ax, 1
clc
pop di
pop si
pop bx
inc sp
inc sp
retf 2
add si, 0x1be
mov di, 0x7dbe
mov cx, 0x20
jmp 0xc4
mov ax, 0x301
pushf
lcall cs:[0x7cd3]
jae 0xc3
pop bx
mov cl, 1
xor dh, dh
jmp 0x99
ret
push ds
push es
pop ds
push cs
pop es
cld
rep movsw word ptr es:[di], word ptr [si]
mov cx, 1
mov bx, 0x7c00
mov ax, 0x301
pushf
lcall cs:[0x7cd3]
jb 0xcc
push ds
pop es
inc byte ptr cs:[0x7dbd]
pop ds
jmp 0x99
add ax, 0x714
sbb al, 1
or al, 0x75
push ss
sbb ax, 0x1610
push ds
```

this is the result of disassembling some 400-something bytes of binary machine code (actually, a boot sector virus). you’d have to understand how it all works, and exactly what each line does, to reconstruct it in a higher-level language. now imagine doing the same with code that is several megabytes or more. of course, there are tools that help, but software reverse engineering is still one of the most hardcore things you can do in IT: the most difficult and the most mundane at the same time. and the example above is only four hundred-something bytes!
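To give a taste of what reconstructing means, here is one possible reading of a few lines from the listing above, rewritten in Python. The interpretation rests on standard PC BIOS conventions: the word at address 0x413 holds conventional memory size in KB, the vector at 0x4c/0x4e is INT 13h (disk services), function 2 of INT 13h reads sectors, and a valid boot sector ends in the bytes 55 AA at offset 0x1FE. The function names are ours, not recovered from the binary.

```python
BOOT_SIGNATURE = 0xAA55  # stored on disk as the bytes 55 AA (little-endian word)


def reserve_top_kilobyte(memory_kb: int):
    """`mov ax, word ptr [0x413]` ... `sub ax, 0x7c0`: steal 1 KB at the top of memory."""
    new_memory_kb = memory_kb - 1   # dec ax; written back to [0x413], so DOS sees 1 KB less
    segment = new_memory_kb << 6    # mov cl, 6 / shl ax, cl: KB -> 16-byte paragraphs
    segment -= 0x7C0                # sub ax, 0x7c0: so segment:0x7C00 is the start of that KB
    return new_memory_kb, segment


def is_boot_sector_read(ah: int, cx: int, dh: int) -> bool:
    """`cmp ah, 2` / `cmp cx, 1` / `cmp dh, 0` in the hooked INT 13h handler."""
    # Function 2 = read sectors; CX = 1 means cylinder 0, sector 1; DH = 0 means head 0.
    return ah == 2 and cx == 1 and dh == 0


def has_boot_signature(sector: bytes) -> bool:
    """`cmp word ptr es:[bx + 0x1fe], 0xaa55`: does the sector end with 55 AA?"""
    return len(sector) >= 0x200 and int.from_bytes(sector[0x1FE:0x200], "little") == BOOT_SIGNATURE


kb, segment = reserve_top_kilobyte(640)
print(kb, hex(segment))                              # 639 0x9800
print(segment * 16 + 0x7C00 == kb * 1024)            # True: the copy lands in the hidden KB
print(has_boot_signature(bytes(510) + b"\x55\xAA"))  # True
```

Three small functions for a dozen instructions, and each one needed outside knowledge of the PC platform to name correctly; that is what makes reverse engineering slow.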

Anonymous

The source is all written out nice and easy for a human to read, then the computer compiles it into something useful for computers, the executable.

You can copy the executable all day, no problem, and it should run fine unless extra protections have been put on it, like encryption tied to a processor ID, a hardware dongle, or something of that nature.

You can’t see the source unless you decompile it, i.e. have the computer run the compile process backwards. That works, sort of, but you won’t get back the notes and tidy structure the code originally had. You see this when antivirus researchers decompile a virus to see what makes it tick: they can find out a lot, but it is harder.
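A rough analogy, not a real decompiler: Python’s standard `ast` module can parse source into a syntax tree and regenerate source from it (Python 3.9+ for `ast.unparse`). The program still means the same thing afterwards, but comments and original formatting don’t survive the round trip, much as they don’t survive compilation. The `area` function is a made-up example.

```python
import ast

SOURCE = '''
def area(width, height):
    # Multiply the two sides together.
    return width    *    height   # odd spacing on purpose
'''

# Source -> syntax tree -> regenerated source.
regenerated = ast.unparse(ast.parse(SOURCE))
print(regenerated)
```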

With most open source software, people just download, install, and run the binary executable. That’s fine. But with copyleft licenses such as the GPL, if you give the binary to someone else, you also have to make the source code available so anyone can read it, change it, and recompile it for their own needs. It is the license, the permission to use it, that makes this easy.

Commercial software has long encrypted or obfuscated its binaries on purpose to make them harder to copy or decompile, since the companies see their value as being in that code. You see this in the warez and game-cracking world, where it becomes a cat-and-mouse game. The software company does what it can to make the software hard to use without paying, and people do what they can to get around that. Since people are always trying to crack it, the protection methods become increasingly complicated, and so do the cracking methods, so it has become its own little subculture.