So I’ve already posted this question on Stack Overflow, but I wanted to ask it here as well, since I am not sure if they will simply say it is a duplicate (even though the other answers from other questions don’t answer what I asked in a way that helps me).
[https://stackoverflow.com/questions/78539284/in-bootstrapping-how-is-the-compiler-written-in-the-other-langauge-such-that-it](https://stackoverflow.com/questions/78539284/in-bootstrapping-how-is-the-compiler-written-in-the-other-langauge-such-that-it)
So I was wondering if there were direct examples people could give of how the bootstrap compiler is actually written such that it actually represents the language you want to write in another language, because my head can’t wrap itself around it and most sources are fairly vague and just assume you know how it is done.
Hell, maybe if people try and explain it in a metaphorical way and walk me through how the bootstrap compiler is written to represent language X in language Y, that might help too, since I am not a programmer, but this concept is stressing me out from knowing I don’t understand it.
> I am not a programmer
You’re going to have difficulty getting non-vague answers you understand to *a lot* of programming questions, especially tricky ones.
Compiler design has a reputation for being one of the trickiest programming topics you’re likely to see. (Trickier stuff exists, but it tends to be more niche.)
> such that it actually represents the language you want to write in another language
Source code is bytes in a file. A compiled program is also bytes in a file.
A compiler reads the input bytes from the source code file, processes them, and writes some bytes to the output file.
A compiler is basically a very complicated string processing program.
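To make the "string processing" point concrete, here is a minimal sketch in Python. The source language (lines like `num_parts = 10`) and the output instructions (`LOAD`, `STORE`) are both made up for illustration; a real compiler handles far more, but the shape is the same: text in, text or bytes out.

```python
def compile_source(source):
    """Translate toy statements like 'x = 10' into made-up assembly text."""
    output_lines = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split "name = value" into its two halves.
        name, value = (part.strip() for part in line.split("=", 1))
        output_lines.append(f"LOAD {int(value)}")   # hypothetical instruction
        output_lines.append(f"STORE {name}")        # hypothetical instruction
    return "\n".join(output_lines)


print(compile_source("num_parts = 10\nnum_boxes = 3"))
```

Nothing in `compile_source` "is" the toy language. It only contains rules for turning one kind of text into another.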
> how the bootstrap compiler is actually written
There’s nothing special about the bootstrap compiler; it’s written like any other compiler. It just happens to be written in a different language than the code it’s compiling.
Let me ask you this first: why do you think the language the compiler is written in is relevant?
You have a specification for what a piece of code should do in your language.
You have some target language, probably assembly.
You need to transform the first to the second. You’ll do it by hand at first, for the smallest pieces of code. Then eventually you have an exact plan for how a whole program is translated. And then you implement that exact description in some language. This is now a compiler.
People implement the compiler in the same language for other reasons. It sounds cool. They enjoy working with their new language. And compilers are unusual software, so you find lots of bugs in the compiler while you build it.
For a more visual example, imagine the first compiler is a 3d printer.
You decide you want another 3D printer (the second compiler) that works with different materials (the new language).
You 3d print all the parts for the second printer using the first. The second printer is built using these parts.
This can now print using the new materials with no dependency at all on the original printer.
Think of compilers as translators. They take in a bunch of text (the source code) and spit out a bunch of low level computer instructions (machine code).
Imagine you speak English and you visit a foreign land where all the locals speak only French. You have a choice of three translators…one natively speaks French and learned English. One natively speaks English and learned French. The third speaks native Hindi but learned both English and French. All three are fluent, they’ll all give the same correct translation. Should you care which translator you use? The internal processes of the three are very different but they’re taking exactly the same input and producing exactly the same output.
Compilers are translators. They convert a program in a source language to a target language. The compiler itself is a program, written in some language.
Suppose that you already have a compiler for language A, going to assembly code which the computer does understand. You can write any program in language A, compile it to assembly, and run it. Cool.
We need a compiler for a new language called B. There is no compiler for language B going to assembly code. We can’t run programs written in B.
What we’ll do is use A to write a basic compiler for B. It doesn’t need all the features, just the bare minimum to be useful. Great, now we can compile simple programs written in B to assembly.
Here’s the bootstrapping “trick”: we can now write a program in B that translates from B to assembly code. We can use the compiler we just wrote to translate it into assembly! And now we have a compiler, written in B, that compiles B to assembly!
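Here is a toy version of that trick you can actually run, as a rough sketch rather than how any real compiler is built. The assumptions: "language B" is an invented language that is just Python with `fn` in place of `def`, and Python stands in for both language A and the assembly target. All the names (`stage0_compile`, `compile_b`, `B_COMPILER_SOURCE`) are made up for this example.

```python
# The B compiler, written in B itself. At this point it is only text:
# nothing can run it until some other compiler translates it.
B_COMPILER_SOURCE = r'''fn compile_b(source):
    lines = []
    for line in source.splitlines():
        if line.startswith("fn "):
            line = "def " + line[3:]
        lines.append(line)
    return "\n".join(lines)
'''


def stage0_compile(source):
    """Throwaway B compiler written in language A (here, Python)."""
    lines = []
    for line in source.splitlines():
        if line.startswith("fn "):
            line = "def " + line[3:]  # the only rule of toy language B
        lines.append(line)
    return "\n".join(lines)


# Step 1: use the stage-0 compiler to translate the B compiler's source.
compiled_compiler = stage0_compile(B_COMPILER_SOURCE)

# Step 2: load the result. This is now a working B compiler that was written in B.
namespace = {}
exec(compiled_compiler, namespace)
stage1_compile = namespace["compile_b"]

# Step 3: the new compiler can compile its own source, so stage 0 can be retired.
assert stage1_compile(B_COMPILER_SOURCE) == compiled_compiler
print(stage1_compile("fn greet():\n    return 'hi'"))
```

The check in step 3 is the interesting part: the compiler written in B produces the same output for its own source as the throwaway compiler did, so from here on you only need B to keep developing B.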
It’s all about chaining compilers to create new programs to translate from one language to another.
Computers are designed to process anywhere from a few dozen to several hundred “instructions” directly. An instruction is something like “load value 10 into register X”. It is represented as numbers in a file; it could be “43 12 10” as decimal numbers. The file would not be text like this but “binary”, a format the computer can read directly and not particularly friendly for people to write. To make things easier, we write a program that takes a text version of the instruction, for example “ldx #10”, and writes out the binary file. This was a tremendous improvement and was being done starting in the 1950s. As nice as this is, we wanted to do better and write statements like “num_parts = 10”. A compiler knows how to translate that into the text version of the machine code; then we run the earlier translator (the assembler) on that file.
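As a rough sketch of that first translator (an assembler), here it is in Python. The encoding is hypothetical and simply follows the “43 12 10” example above: the text “ldx” becomes the numbers 43 and 12, followed by the value to load. Real processors each have their own encoding.

```python
# Hypothetical encoding table: "ldx" with an immediate value becomes 43 12 <value>.
OPCODES = {"ldx": [43, 12]}


def assemble(text):
    """Turn lines like 'ldx #10' into raw bytes."""
    out = bytearray()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        mnemonic, operand = line.split()
        out.extend(OPCODES[mnemonic])          # numbers for the instruction itself
        out.append(int(operand.lstrip("#")))   # the value, with the '#' removed
    return bytes(out)


binary = assemble("ldx #10")
print(list(binary))  # [43, 12, 10]
```

A compiler for statements like “num_parts = 10” would sit one level above this: it writes out the “ldx #10” style text, and the assembler turns that text into bytes.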
So, way back when, someone wrote the first simple compiler in the text version of the machine code, probably for a minimal subset of their language, then used that mini language to implement a compiler for the full language. This process is known as “bootstrapping” a compiler.
Of course, nowadays we would just use an existing language to write the initial compiler for the new language.
> Like, how is the compiler for the language you want to write actually represented in the compiler in the other language?
A compiler is a program that reads files and outputs executable programs. So you take any programming language that you’re proficient in and create a program that reads text files containing code in NewLanguage™ and translates that code into something a computer can execute.
> Is it literally writing the compiler in the language you want translated directly into the other language, or is this compiler in the other language instructions on how to translate what it says into the compiler language you want?
You can take any of these approaches. Your program in ExistingLanguage™ (for example, C++) can read the NewLanguage™ code and translate it into machine code for an ARM64 processor, which you can then run. Or it can translate that source into ExistingLanguage™ code, which is then compiled like any other ExistingLanguage™ program using its compilers. Or it can translate it into a completely different intermediate language that gets compiled or executed by some third tool.
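A small sketch of the last option, assuming an invented NewLanguage™ that only supports sums like `1 + 2 + 3`. The intermediate instructions (`PUSH`, `ADD`) and both function names are made up for illustration. The compiler only translates; a separate tool executes the result.

```python
def compile_to_stack_code(expression):
    """Compiler: translate '1 + 2 + 3' into made-up intermediate instructions."""
    tokens = expression.split()
    code = [("PUSH", int(tokens[0]))]
    # After the first number, tokens come in (operator, number) pairs.
    for i in range(1, len(tokens), 2):
        operator, number = tokens[i], tokens[i + 1]
        if operator != "+":
            raise ValueError(f"unsupported operator: {operator}")
        code.append(("PUSH", int(number)))
        code.append(("ADD",))
    return code


def run_stack_code(code):
    """The 'third tool': executes the intermediate instructions."""
    stack = []
    for instruction in code:
        if instruction[0] == "PUSH":
            stack.append(instruction[1])
        elif instruction[0] == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
    return stack.pop()


code = compile_to_stack_code("1 + 2 + 3")
print(code)
print(run_stack_code(code))  # 6
```

Swapping `run_stack_code` for something that emits ARM64 machine code or C++ source would change the target, but `compile_to_stack_code` would not need to know or care.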
Either way, a compiler is just a tool that translates code into different code, and it doesn’t matter what language you use to accomplish that. The language the compiler is written in doesn’t have to understand the new language being compiled. That logic must be supplied by whoever writes the compiler.
What language a compiler is written in has no bearing on what language it can compile. Writing a compiler that can compile itself is a separate goal. It’s a great validation of the programming language you are implementing, but it’s not really required: a compiler works the same no matter what language it’s written in.