What does the code that makes up programming languages look like?

1.13K views

Take a language like Java. How was it originally created? I can’t wrap my head around how someone invented a computer language to run without having some “prior” language that it allows the first lines to function. Is it just Java all the way down, like someone wrote a single line of Java and then every other line was built on that?

What about the first computer language? What was the basis that that functioned on?

Thanks for any help, I hope that was phrased in a mildly intelligible way.

Edit; I’m trying to think of it like human language: at some point there was a first “word” spoken by someone and understood by another and from there the structure started to be born. What were the first “words” on a computer that led to where we are now?

In: Technology

36 Answers

Anonymous 0 Comments

The very first “languages” weren’t anything you would consider to be a language. For example, punched cards with holes in them to represent 0s and 1s. A “programmer” would write 0001001001110101010 in a very precise way for that particular computer to move data around in memory, perform simple operations on it (e.g. add, subtract), or what have you. They were literally manipulating individual bits on the hardware.

The truth is, all computers still actually work this way! Down at the level of your actual CPU, the only language it can understand is still 0010010100101010. But what has happened since is what’s called “abstraction.”

For example, someone invents another language on top of the binary that expresses larger “ideas” that would otherwise be written out as long strings of binary. Instead of 0010101010010101, maybe the language lets you write “move 0010 to register 4” or “add the contents of registers 2 and 6.” They write in that language, and it is then translated back into the raw 00101010 for the CPU to actually run.
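
To make that concrete, here is a minimal sketch, assuming a modern x86-64 CPU (the specific byte values are particular to that one chip family and are shown purely as an illustration). The raw bytes are what the CPU runs; the comments are the human-readable “ideas” they encode:

```c
#include <stdio.h>

int main(void) {
    /* The raw bytes an x86-64 CPU actually executes for a tiny function
       that just returns the number 42. Each comment shows the readable
       "idea" that the bytes encode. */
    unsigned char machine_code[] = {
        0xB8, 0x2A, 0x00, 0x00, 0x00, /* mov eax, 42  -- put 42 in register eax */
        0xC3                          /* ret          -- return to the caller   */
    };

    for (unsigned i = 0; i < sizeof machine_code; i++)
        printf("%02X ", machine_code[i]);   /* prints: B8 2A 00 00 00 C3 */
    printf("\n");
    return 0;
}
```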

Then someone else comes along and goes, “Hey, having to remember all these registers and where things are in memory is a huge pain in the ass. What if we let people just define simple variables (x = 4, y = 3, z = x + y), and we keep track of where everything is for them? First we translate that into ‘move 0010 to register 4’, and then we translate THAT into 00101010 for the CPU to actually run.”
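
Here is roughly what that convenience looks like from the programmer’s side today, sketched in C; the comments describe the bookkeeping the language now does for you (which registers actually get used is entirely up to the compiler):

```c
int example(void) {
    int x = 4;        /* the compiler decides where x lives (a register or the stack) */
    int y = 3;        /* same for y                                                   */
    int z = x + y;    /* roughly: "move x into a register, add y to it, call it z"    */
    return z;         /* the result ends up wherever the calling convention says      */
}
```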

Keep doing this 30–40 times over half a century, and you get to the modern languages we use today. But realize *nothing* *can actually run these languages as written*. They have no meaning to even a modern CPU. They have to be lexed, parsed, compiled, and assembled all the way down (through multiple intermediate steps) back into the same 00001010101010 that people were punching onto cards 60 years ago.

Modern languages are really just window dressing. No matter what you write in, it all gets compiled/baked down into the same 0s and 1s that your particular CPU needs to run, because that’s all it can run. Languages are just layers of shortcuts and decorative fluff that we’ve built up on top. And all of the arguments over modern languages are mostly about the tradeoffs of one cosmetic change or shortcut or another. The CPU doesn’t care.

Anonymous 0 Comments

It is hard to wrap your head around because there are layers and layers and layers of abstraction. Taken one at a time, each of them is much easier to understand, at least at the level of ELI5.

I highly recommend this youtube series: https://youtube.com/playlist?list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo They explain everything more or less like eli5, starting from the bottom and going up that ladder of abstractions. Very entertaining, too.

Anonymous 0 Comments

The CPU only talks in numbers. Every number is an instruction. You have to know the “codebook” of what number, in what order, does what. Instruction 43 might be “multiply these two numbers”, for instance, so 43 2 3 gives the answer 6.

That number is literally represented in binary on the CPU’s input pins when you want it to do something. You set those pins, they correspond to the binary for 43, and the CPU knows what to do next.

But that codebook is a pain to program in. Ask the Apollo astronauts who had to calculate their trajectories, etc. by setting “Verb 23, Noun 15” on their computer system to get it to do, say, a basic multiplication instruction. That’s all they had to communicate. Numbers. That’s “machine code”.
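
To make the codebook idea concrete, here is a toy version in C. The instruction set is made up (opcode 43 = multiply is just this answer’s example, not any real chip’s), but it shows the “number in, action out” shape of machine code:

```c
#include <stdio.h>

/* A toy "codebook": one made-up opcode, 43 = multiply the next two numbers. */
int run(const int *program, int length) {
    int result = 0;
    for (int i = 0; i < length; ) {
        if (program[i] == 43) {                 /* opcode 43: multiply      */
            result = program[i + 1] * program[i + 2];
            i += 3;
        } else {
            i += 1;                             /* unknown opcode: skip it  */
        }
    }
    return result;
}

int main(void) {
    int program[] = { 43, 2, 3 };               /* "multiply 2 and 3"       */
    printf("%d\n", run(program, 3));            /* prints 6                 */
    return 0;
}
```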

But those numbers were assigned those tasks by the human CPU designer. So somewhere there’s a codebook that tells you that 43 is multiply, for instance. So… why not let the human just use a shorthand. Say, “MUL”. And computers are designed to do all the boring legwork, that’s their entire purpose, so why not get the computer to take the text “MUL” and output “43”? Congratulations, you just made an assembler. A program that takes the text instructions and converts them to machine code.
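
And a toy assembler really is that simple at heart: look the word up, write the number down. A sketch, still using the made-up codebook above (the opcodes are invented for illustration):

```c
#include <stdio.h>
#include <string.h>

/* Toy assembler: turn a mnemonic into its opcode from the "codebook". */
int assemble(const char *mnemonic) {
    if (strcmp(mnemonic, "MUL") == 0) return 43;   /* multiply              */
    if (strcmp(mnemonic, "ADD") == 0) return 17;   /* add (also made up)    */
    return -1;                                     /* unknown instruction   */
}

int main(void) {
    printf("MUL -> %d\n", assemble("MUL"));        /* prints: MUL -> 43     */
    return 0;
}
```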

The first one would have been written in “machine code” by someone. Tricky, but you only had to write the assembler and then everything got easier. You obviously don’t sit and write all your programs in machine code once you have a working assembler.

But even “assembly language” (the codebook language that has “MUL”) is a bit tricky to program in. So you make the computer do the work again. Using assembly language, you make a program that takes more complex text, and converts it into assembly language for you. So it might take something like “A = B * C”. And it works out that it has to get B and C from memory, run the MUL instruction on them, and put the result into some part of memory called A. That program that does that might be called a compiler.
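
A real compiler is a big program, but for a single statement like “A = B * C” its job boils down to emitting those lower-level steps. A sketch in C, using invented instruction names rather than any real chip’s:

```c
#include <stdio.h>

/* Toy "compiler": given the statement A = B * C, print the assembly-like
   steps a compiler would generate. A, B and C stand for memory locations. */
void compile_multiply(const char *dest, const char *src1, const char *src2) {
    printf("LOAD R1, %s\n", src1);   /* fetch B from memory into register 1 */
    printf("LOAD R2, %s\n", src2);   /* fetch C into register 2             */
    printf("MUL  R1, R2\n");         /* multiply, result left in register 1 */
    printf("STORE %s, R1\n", dest);  /* write the result back to A          */
}

int main(void) {
    compile_multiply("A", "B", "C");
    return 0;
}
```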

There is a programming language called C. Generally this is the first compiler that anyone writes for a new type of computer, because it’s relatively easy to write a C compiler in assembly language, and relatively easy for a human to write a program in C. That C compiler takes your C code (which looks a lot like Java) and converts it into assembler or machine code.

Now that you have a C compiler you find that you can compile most things! Parts of Java itself are written in C.

So when you have an entirely new type of machine, someone (who knows the “codebook” well) writes an assembler for it. The next (or even the same!) person then finds or makes a C compiler that itself can be written in assembler (there are lots of them already, but sometimes we have to make new ones!). Then the person after that? They have a C compiler and a whole raft of operating systems, kernels, programming languages, applications, etc. that are already written in C.

Notice, though, that all it took was two programs – some way to get assembly language into the computer, and then some way to convert C code down to assembler. Those are probably the most difficult types of programs to write, and sometimes you have to write parts of them from scratch (e.g. if a chip has never been seen before and is different to everything that existed before), but the assembler you can literally write by hand, and the C compiler is just a case of tweaking an existing C compiler. And only those two programs are needed (a slight exaggeration, but once you have them, everything else can be compiled from C!) to get everything else working.

Computers in the old days (e.g. Apollo missions, and home computers right into the 80’s) used machine code only and often had machine code tutorials in their “starter manuals”. One of the programs they often got you to write was an assembler! And then C and other languages came later. Nowadays nobody bothers because it’s all done for you, but still someone, somewhere, sometimes has to write an assembler (or modify an existing one to work on a new chip) or a compiler.

It’s like having to learn the alphabet, then learning how to form that into words, then words into sentences, then sentences into paragraphs, then paragraphs into chapters, and so on. It all starts at the bottom. And I could probably teach someone who knew nothing about computers how to write a very basic program in machine code, and a basic assembler to help them, in a few days. It would take far longer to write a C compiler but you probably wouldn’t need to.

Even today, when starting on a new chip, the people at the chip manufacturers will do that part for you – and they often start with the machine code to write an assembler, then using that assembler to compile a “miniature” C compiler (e.g. tcc), then that mini-C compiler to compile a full compiler (e.g. gcc), and then give that to you in a download so that nobody ever has to do that part again and we can all just take C code and compile straight to the new chip without having to have anything to do with machine code or assembler.

Anonymous 0 Comments

Computers are built with a language in the hardware itself. You put in a value that tells it what to do, and a memory location that tells it which number to apply that command to.

Command 1: load accumulator with value at location 1000. Command 2: add to it the value of location 2000.

Then you write a whole program like that. The next step is to use that to write an assembler to make this easier. Instead of punching numbers into locations and executing them, you say LDA 1000 and ADC 2000. Assembly language is just machine language made easier to remember, and more complex assemblers handle variables and such.
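
For the curious, here is roughly what those two instructions become as raw bytes, assuming a 6502-style 8-bit CPU (the opcode values and hex addresses are specific to that chip family and are shown only as an illustration). Producing exactly this listing is the assembler’s whole job:

```c
/* What "LDA 1000 / ADC 2000" boils down to on a 6502-style 8-bit CPU:
   one opcode byte per instruction, followed by the address stored
   low byte first. */
unsigned char program[] = {
    0xAD, 0x00, 0x10,   /* LDA $1000 -- load the accumulator from address $1000 */
    0x6D, 0x00, 0x20    /* ADC $2000 -- add to it the value at address $2000    */
};
```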

Then you use the assembler to write a compiler for a higher level language like C to make it even easier to program.

And then you use C to write Java.

Anonymous 0 Comments

Lots of them are written in C, or (like java) run in VMs written in C or C++. The C/C++ compilers (GCC, Clang, etc) are also written in C and C++ (i.e., self-bootstrapping.) The early C compilers were bootstrapped with a minimal version in a hardware-dependent assembly language that could compile the C version. C is still heavily used.

Anonymous 0 Comments

It’s a chain.

The first program was very basic, and could only understand on and off. But a human could use it in a clever way to make a second program which could understand a set of different on and off settings in a pattern, like when you see someone hold up two fingers instead of one, and that means something.

The next program after that could understand where one pattern like that ended and where another began. At this point, it’s like words, only more basic – just lots of “on” and “off.” But see, here it gets useful, because these are like symbols that the computer understands. It can move electricity in patterns defined by these sets of on-off.

At this point programs were just sets of cards with holes in them that a computer would read. There were no monitors or keyboards.

Well, that program was used to make a new program that not only understood these patterns, but could show a human on a screen letters and numbers that represented those patterns. And with those letters and numbers – stuff like “PUSH 1” and so on – it was a lot easier to write more complicated programs.

This is where it really took off. Now people were telling the computer how to understand even more words and turn those into long, long sets of on-off that it could use to direct electricity in _really complicated_ ways.

These “compilers” are what form the basis of all programming languages today. Programs that teach computers how to understand other programs.

Sorry if that was too long.

Anonymous 0 Comments

You don’t need a computer to create a programming language. A programming language is at its most basic just a set of rules defining the syntax and behaviour of the code.

What you need the computer for is to write the compiler (and/or, in the case of an interpreted language, the interpreter/runtime environment) and the libraries/API. Once you have the compiler, the libraries *can* be written in the same language, but often they are written in another language (usually a lower-level language, or even partly assembly).

Compilers/Interpreters are very often written in the same language they compile (self-hosting), but obviously that’s a cyclic dependency – the first/earliest versions would have been written in prior existing programming languages.

Taking Java as an example: the official Java API is mostly written in Java, with parts of it in C or C++. The official compiler (Javac) which converts Java source code to Bytecode, is today written in Java but originally was written in C and C++. The official Java Virtual Machine (JVM) which runs the bytecode on the various platforms that Java supports, as far as I know is always written in C. Note that I keep saying official – anyone is able to write their own version. Oracle’s licensing shenanigans aside, nothing stops you writing the JVM in C#, or Javac in JavaScript.

C++ is a special case; the first C++ compiler was written in C++, but my understanding is there were some ugly hacks and conversions to actually get it compiled by a C compiler. The C language itself evolved from an earlier language called B, which in turn came from BCPL. Go up the chain far enough, and eventually you reach someone who was programming a computer by writing assembly language, pulling wires, flipping switches, or punching holes in cards.

Anonymous 0 Comments

There’s two important ideas you’re hitting here, and I want to talk about both of them.

First, there’s history. Each early language was usually made the same way: lazy programmers. Ugh, who wants to write all this binary when I could make a program that converts words to the binary patterns I want for me! <fast forward a few years> Ugh, who wants to write all these assembly instructions when I could just write a program that converts higher level abstract code into assembly for me!

Then later: Ugh, people keep making mistakes in this code- I’ll write a program that forces people to follow good patterns, a ‘computer language’. This is followed by decades of research into the very concept of ‘computer languages’ and some wonderfully rich and interesting history. Each compiler was written in some lower language compiler, typically.

But the second concept is, I think, more interesting: self-hosted compilers. Rust (a pretty new language) originally had a compiler written in OCaml (a very hard language to learn, believe me). Then one day, they wrote a Rust compiler in Rust, used the OCaml compiler to build that program, and then threw away the OCaml compiler.

The Rust compiler is written in Rust. When they improve the compiler, they use the old one to make the new one.

There’s no magic here. Any programming language that is ‘Turing Complete’ can compute any function that any other language can. Compiling is just a function from text (code) to a binary (executable program). In theory, you can write any compiler in any language, including itself.
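
To underline the “compiling is just a function” point, here is a deliberately tiny sketch in C. It understands exactly one made-up statement, RETURN <number>, and the output bytes assume an x86-64 CPU (both choices are just for illustration):

```c
#include <stdio.h>

/* A "compiler" reduced to its bare shape: source text in, machine code out.
   It understands one statement, "RETURN <n>", and emits the x86-64 bytes
   for "mov eax, n; ret". */
int compile(const char *source, unsigned char *out) {
    int n;
    if (sscanf(source, "RETURN %d", &n) != 1) return -1;  /* syntax error    */
    out[0] = 0xB8;                       /* mov eax, imm32                   */
    out[1] = (unsigned char)(n);         /* the constant n, stored byte by   */
    out[2] = (unsigned char)(n >> 8);    /* byte, least-significant first    */
    out[3] = (unsigned char)(n >> 16);
    out[4] = (unsigned char)(n >> 24);
    out[5] = 0xC3;                       /* ret                              */
    return 6;                            /* number of bytes emitted          */
}

int main(void) {
    unsigned char code[16];
    int len = compile("RETURN 42", code);
    for (int i = 0; i < len; i++)
        printf("%02X ", code[i]);        /* prints: B8 2A 00 00 00 C3        */
    printf("\n");
    return 0;
}
```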

Source: I knew this Computer Science degree would come in handy someday.

Anonymous 0 Comments

I’m a third year computer engineering major, so the following is all my personal dumbed down view of the answer to your question.

The basic hierarchy looks like this, from highest level (farthest from hardware) to lowest level (closest to the hardware):

– object oriented languages like Java, C++

– C

– Assembly language

– Machine code

Roughly: Java runs on a virtual machine that is itself largely written in C/C++, C compiles into assembly language, and assembly language is turned into machine code. These different levels of abstraction generally come from developments in programming over time. You can Google image search each of those names to see examples of them. As you get closer to the hardware, things generally get simpler but much more tedious. Generally, the farther you get from the hardware, the easier it is to quickly write powerful programs that do a lot of stuff.

In assembly language, doing something like taking an average of 10 numbers then adding that to a running total would probably take more than 100 lines of code (guessing). In C and Java it would take 1 line. In Java it might even happen without your knowledge when invoking some operator or function. I’m not familiar with Java since I mostly use C.
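
For scale, here is that calculation in a few lines of C (the ten numbers are made up for the example). Each line expands into many individual machine instructions once compiled, which is where the assembly line count balloons:

```c
#include <stdio.h>

int main(void) {
    double numbers[10] = { 3, 1, 4, 1, 5, 9, 2, 6, 5, 3 };
    double running_total = 0.0, sum = 0.0;

    for (int i = 0; i < 10; i++)
        sum += numbers[i];              /* add up the ten numbers           */
    running_total += sum / 10.0;        /* average them, add to the total   */

    printf("%f\n", running_total);      /* prints 3.900000                  */
    return 0;
}
```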
