Why can we see the source code for an HTML page but not for a program, unless the programmer (or the owner) shares it?

In: Technology

9 Answers

Anonymous 0 Comments

A program is run through a compiler when the code is finished. The compiler converts the code into instructions for the processor, 1s and 0s basically, and the result is written to an executable file (an “.exe” on Windows).

HTML is interpreted, which means that another piece of software (the browser) reads the statements as they were written, so the original text is never lost in a compilation step.

Anonymous 0 Comments

The code in an executable isn’t written in characters but in plain bits/bytes the CPU can use “natively” (consider reading up on what a compiler does).

In addition to this, you wouldn’t want everyone to be able to steal your code.

Anonymous 0 Comments

Part of it is just semantics.

“Source code” for an HTML page isn’t the same as source code for an application program — the HTML code is more like raw data. Or, you can see Java source code, but you can’t see the Java runtime interpreter; again, the former is input data to the latter.

Another reason could be that the source code for an HTML page happens to be human-readable: it’s not converted to some intermediate machine-readable form. Compare that to the “source code” for a .bmp file or .pdf file, which is not. Those still get transmitted as a stream of bytes, just not as human-readable text.
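A small sketch of that human-readable distinction in Python. The HTML snippet and the BMP-style header values are made up for illustration, and `looks_like_text` is a hypothetical helper, not a standard function:

```python
import struct

# Bytes of a tiny HTML document, as they would travel over the network.
html_bytes = "<p>Hello, <b>world</b>!</p>".encode("utf-8")

# A 14-byte BMP-style file header: magic "BM", file size, two reserved
# fields, and the offset to the pixel data (the numbers are made up).
bmp_header = struct.pack("<2sIHHI", b"BM", 70, 0, 0, 54)

def looks_like_text(data: bytes) -> bool:
    """Return True if every byte is printable ASCII or common whitespace."""
    return all(32 <= b < 127 or b in (9, 10, 13) for b in data)

print(looks_like_text(html_bytes))  # True: you can read it as-is
print(looks_like_text(bmp_header))  # False: contains raw zero bytes
print(html_bytes.decode("utf-8"))
print(bmp_header)
```

Both are just bytes on the wire; the difference is only whether those bytes happen to spell out something a person can read.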

Anonymous 0 Comments

HTML code has to be sent to your computer so that your browser can read the website. Program code doesn’t have to be sent because either A, the app is installed locally and everything runs internally, or B, the program was written in a way that lets the system run it without giving away the code. All that being said, you can usually recover something close to the source code if you try hard enough. It’s called reverse engineering and it’s mostly frowned upon.

Anonymous 0 Comments

There are lots of languages whose programs you can see the code for: PHP, Python, Visual Basic, JavaScript, etc.

Anonymous 0 Comments

An HTML page isn’t executed code. It’s data that’s read and rendered by the browser.

On the other hand, JavaScript IS code, but it’s sent as clear text that’s then interpreted by the browser, converted into executable machine code on the fly, and then executed.
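To illustrate the first point, here is a rough Python sketch of a program treating an HTML page as input data. It uses the standard library’s `html.parser`; the `TagLister` class and the sample page are made up for the example, and a real browser does vastly more than this:

```python
from html.parser import HTMLParser

class TagLister(HTMLParser):
    """Collects tag names and text; a very simplified stand-in for a browser."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called by the parser for every opening tag it reads.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for the text between tags; skip pure whitespace.
        if data.strip():
            self.text.append(data.strip())

page = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
reader = TagLister()
reader.feed(page)
reader.close()

print(reader.tags)  # ['html', 'body', 'h1', 'p', 'b']
print(reader.text)  # ['Title', 'Some', 'bold', 'text.']
```

The page never “runs”; the reading program decides what to do with each tag it finds.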

Anonymous 0 Comments

So the other answers aren’t wrong, but I feel they leave a bit out.

In computing, there are a few different ways to program. These are typically referred to as being low-level (closer to the native 1’s and 0’s that computers understand) and high-level (closer to a human-readable language)

At the lowest level we have assembly. This is some of the most bare-bones code, where instructions are written out line by line for what each part of the computer does at each step of the code; what values in memory get moved into what part of the processor and what operation is done and where the output goes, *really* low-level stuff.

A level above that we have the *compiled languages*; these are languages like C and C++, along with a few older ones that are largely lost to time. (C# usually gets lumped into this family, but it is normally compiled to an intermediate bytecode rather than straight to machine code.) When you code in C or C++, you get some advantages like being able to automate a lot of the things you would have to do manually in assembly, like deciding which values live in which processor registers and setting up function calls, and the code, while it still isn’t any natural language, is more understandable to the average person. It’s kind of like being fluent in Spanish and trying to read Portuguese: some words are different and you might not understand some context in some places, but you can discern the general gist if you understand some basic concepts.

These are called compiled languages because the text file that is your source code has to be run through a program called a compiler, which uses a set of rules to convert the code you wrote into raw machine code and packages it so it can be run as an executable (a .exe on Windows).

Then a level up, we have interpreted languages. These are languages like Python and Ruby, and they do for C/C++/C# what those languages did for assembly: make it easier to read and write more complex programs in fewer human hours.

These are run through a program called an interpreter, and unlike with a compiler, the interpreter has to be called each time the program file is run. So unlike in C variants, where the compiler spits out a .exe, a Python file has to be fed into a Python interpreter to give any output (at least generally; there are ways to get a .exe out of it, but it generally takes longer to do, and the programs end up being bigger).

Finally you have scripting languages; these are sometimes considered not “real” languages, but the idea is that you have a program (like a web browser) that looks at code fed to it and then follows it like a script using its interface. These tend to be more specialized than other types of languages, and they are used as a way to share content without allowing random untrusted code to run directly with your user permissions (not that that always works, but…). So for instance the languages a web browser reads (HTML, JavaScript, CSS, etc.; strictly speaking only JavaScript is a scripting language, while HTML is markup and CSS describes styling) let you describe the elements of a page from top to bottom, which the web browser then displays as best it can, fetching the images and whatnot at runtime.

Because scripting languages and interpreted languages take the source code and read it during the runtime of the program, as opposed to putting out an independent program to be run on its own, the interpreter needs to be able to “see” that code, and if you have access to the program that means you can access the code too.
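As a rough sketch of why the interpreter has to “see” the text, here is a toy interpreter in Python for a made-up two-command language (`set` and `show` are invented for this example, as is the `run_script` function):

```python
def run_script(source: str) -> list[str]:
    """Interpret a made-up two-command language, one line at a time."""
    output = []
    variables = {}
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        command, args = parts[0], parts[1:]
        if command == "set":      # set NAME VALUE
            variables[args[0]] = int(args[1])
        elif command == "show":   # show NAME
            output.append(f"{args[0]} = {variables[args[0]]}")
        else:
            raise ValueError(f"unknown command: {command}")
    return output

# The "program" is plain text that must be handed to the interpreter
# every time it runs, so anyone who can run it can also read it.
script = """
set apples 3
set pears 4
show apples
show pears
"""
print(run_script(script))  # ['apples = 3', 'pears = 4']
```

There is no separate translated file here at all: the readable script is the only form the program ever exists in.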

Compare that with compiled programs, which aren’t easy to reverse from bytes to readable C. Things like variable names are only there for human readability, so the compiler discards them, and without them it is hard to track what the program is doing when you reverse-engineer it.
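A toy illustration of names getting discarded, written in Python. This is not how a real C compiler works; the mini-language (assignments with an optional single `+`), the instruction names, and both functions are invented for the sketch:

```python
def compile_assignments(source: str):
    """Translate lines like 'total = price + tax' into numbered-slot instructions."""
    slots = {}     # variable name -> slot number; thrown away after compiling
    program = []

    def slot_for(name):
        # Give each new name the next free slot number.
        return slots.setdefault(name, len(slots))

    for line in source.strip().splitlines():
        target, expr = [part.strip() for part in line.split("=")]
        operands = [tok.strip() for tok in expr.split("+")]
        if len(operands) == 1 and operands[0].isdigit():
            program.append(("LOAD_CONST", slot_for(target), int(operands[0])))
        else:
            left, right = operands  # the sketch only handles 'a + b'
            program.append(("ADD", slot_for(target), slot_for(left), slot_for(right)))
    return program

def run(program):
    """Execute the compiled instructions; only slot numbers exist here."""
    memory = {}
    for op, dest, *args in program:
        if op == "LOAD_CONST":
            memory[dest] = args[0]
        elif op == "ADD":
            memory[dest] = memory[args[0]] + memory[args[1]]
    return memory

source = """
price = 100
tax = 8
total = price + tax
"""
compiled = compile_assignments(source)
print(compiled)       # [('LOAD_CONST', 0, 100), ('LOAD_CONST', 1, 8), ('ADD', 2, 0, 1)]
print(run(compiled))  # {0: 100, 1: 8, 2: 108}
```

The compiled form still computes the right answer, but nothing in it says that slot 2 was ever called `total`. Someone reversing it has to guess.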

Because interpreted languages and scripting languages fundamentally

Anonymous 0 Comments

This is because HTML is a language that has been modified for decades to ensure safety. Its purpose is to provide formatting information to a browser, which renders the HTML into a visible design. JavaScript is a full-on programming language that was introduced to browsers because it allows for things like sending requests and performing complicated functions. It is severely restricted in its capabilities to prevent malicious web pages from performing dangerous functions on a user’s computer.

Depending on the language a program was written in, it can be turned back into an exact (or close enough) copy of the original code. This is mostly possible with languages that compile to bytecode and rely on just-in-time compilers, such as Java or C#. The bytecode handed to the runtime is high-level enough that a decompiler (in effect, the compiler run in reverse) can often yield results close to the source code.
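Python is a handy way to see how much a bytecode format can retain. A minimal sketch, assuming CPython (the `area` function is just an example):

```python
def area(width, height):
    scaled = width * height
    return scaled

code = area.__code__

# The compiled code object still carries the original local names,
# which is a big part of why bytecode decompiles so well.
print(code.co_varnames)    # ('width', 'height', 'scaled')
print(code.co_name)        # area
print(type(code.co_code))  # <class 'bytes'>: the raw bytecode itself
```

Native machine code normally keeps none of this unless debug information is shipped with it.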

At the end of the day, a computer only executes machine code, a binary encoding of processor-specific instructions. A language implementation may either turn a file into machine code ahead of time, or translate the program (often via an intermediate bytecode) into machine code as it goes, basically reading a block of code, translating it, and executing each step of the program.

In languages that are compiled in the traditional sense, it is often the complex nature of optimization functions that makes recomposing machine code into higher-level functions difficult or outright impossible. A binary can absolutely be turned back into something that resembles human-written code, but typically compilers do so much to make code run faster that the resulting machine code simply cannot be associated with a specific higher-level operation.

It is possible to read machine code directly (as bytecode or assembly), and unlike decompiled output it will always be a true and exact representation of the operations happening in a program. Actually analyzing this code is difficult at first, but the skill is incredibly valuable for malware analysis and software exploitation, among other things.
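For a taste of reading low-level instructions directly, Python’s standard `dis` module lists the bytecode of a function. This is bytecode rather than real machine code, and the exact instruction names vary between Python versions, so no expected output is shown:

```python
import dis

def add(a, b):
    return a + b

# Each Instruction describes one low-level step the interpreter performs.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# dis.dis prints the same listing in a human-oriented table.
dis.dis(add)
```

Even for a one-line function you get several steps (load the two values, add them, return the result), which gives a feel for why reading real assembly takes practice.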

Anonymous 0 Comments

There are different ways of telling a computer what to do.

The most efficient way would be to simply tell the computer what to do in its native language.

This is efficient and fast and also incredibly hard because humans aren’t really good at speaking ‘computer’.

Another way would be to write down what you want the computer to do in a made up language that sort of is a bit like native computer language but much easier for humans to understand. That language can be easily and (this is the important part) unambiguously translated into the computer language.

You write your instructions down in this human-readable language and have a program translate it into computer talk.

There are two main ways to handle that. You translate the human readable part once and just hand the computer the translated instructions. The other way is to keep it in human readable form and have it be translated on the go whenever the instructions need to be followed.

There are hybrid ways where you don’t translate the human-readable part directly into the computer-readable part, but into something in between, and have that be translated for the computer when needed.

Of the above, the computer’s native language is called machine code. It tells the computer what to do in the exact terms hardwired into its CPU.

There is assembler which can be thought of as the computer’s native language with some human readable pronunciation marks added. It is machine code with human readable labels added.

Writing instructions in a programming language means that you write it in something that is designed for humans to be able to read that can be translated unambiguously into machine code.

If the program is translated once and the computer is given the result, you have what most traditional programs on your computer are: programs that your computer can handle, but that a human would not be able to read, because you only have the translation for the computer, not the human-readable original version.

There are ways to translate it back the other way, but those have many faults and usually don’t return very helpful results. Imagine a Google Translate that works from English to Smurf. It is easy to do one way, but in the other direction you often miss a lot of things that make it easy to understand without looking at the context.

This is why it is important to have the original human-readable program code, to understand what a program does.

Another helpful effect of having the program code is that you can translate it for other computers that have a slightly different native language when needed.

The next thing I described is what happens with scripting languages. You have program code and then translate it into the computer’s tongue whenever needed. It is sort of the difference between a translator translating a written work and an interpreter translating a speech while it is given.

This approach is obviously not as efficient, since you need to do the translation over and over again every time you need it, but it is useful when you are often reworking the code for example.

Hybrids between the compiled once and interpreted over and over again approach exist, for example with Java.
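Python itself works a bit like this hybrid: the text is first translated into an intermediate form (bytecode), and that form is what actually gets run. A minimal sketch using the built-in `compile` and `exec`; the one-line program and the `run_with` helper are made up for the example:

```python
source = "result = sum(n * n for n in range(limit))"

# Step 1: translate the text once into an intermediate form
# (a code object holding bytecode).
code_obj = compile(source, "<example>", "exec")

# Step 2: run the translated form as often as needed,
# without reading the text again.
def run_with(limit):
    namespace = {"limit": limit}
    exec(code_obj, namespace)
    return namespace["result"]

print(run_with(4))  # 14  (0 + 1 + 4 + 9)
print(run_with(5))  # 30
```

Java does the first step ahead of time and ships only the intermediate form, which is why you get a .class or .jar file rather than the readable source.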

So I hope you can see that for some computer programs you need the creator’s original code in the human-readable language to make sense of them.

HTML is a bit of a different thing. In theory you might want to lump it in with other scripting languages that get interpreted on the fly whenever they are needed and it sort of is like that.

But at its core HTML is not really a programming language. It is a markup language. It is more like a Word file than a program.

It originally was meant to mostly involve text with some marks that tell the computer which parts of the text to bold and italicize and which parts to treat as a hyperlink to a different document. The other stuff we have today was added later.

Now it is possible to create documents with markup in a way that is not easily readable to humans. Older versions of Microsoft Office save their documents in such formats, while newer ones save them in a format that is a relative of HTML and at least in theory readable to humans.

The creator of the web (Tim Berners-Lee) created the HTML we know today in the human-readable and editable format instead of creating HTML files that could only be edited by programs that knew the code. He was working with open source tools and building on ideas by others who also used the human-readable approach. He was working for CERN at the time and not trying to create something that one could sell and make money with, so the benefit of doing it the other way wasn’t really there for him.

Today we still have (in theory) human readable HTML pages even if most are created by computer for computers with few humans ever seeing the code themselves.

At this point we keep doing it out of tradition and a desire for backwards compatibility.