ELI5: In bootstrapping, how is the compiler written in the other language such that it can then compile the original language, i.e. the bootstrap compiler?


So I’ve already posted this question on Stack Overflow, but I wanted to ask it here as well, since I am not sure if they will simply say it is a duplicate (even though the other answers from other questions don’t answer what I asked in a way that helps me).

[https://stackoverflow.com/questions/78539284/in-bootstrapping-how-is-the-compiler-written-in-the-other-langauge-such-that-it](https://stackoverflow.com/questions/78539284/in-bootstrapping-how-is-the-compiler-written-in-the-other-langauge-such-that-it)

So I was wondering if people could give direct examples of how the bootstrap compiler is actually written, such that a compiler for the language you want is written in another language, because I can’t wrap my head around it and most sources are fairly vague and just assume you know how it is done.

Hell, maybe if people try and explain it in a metaphorical way and walk me through how the bootstrap compiler is written to represent language X in language Y, that might help too, since I am not a programmer, but this concept is stressing me out from knowing I don’t understand it.


17 Answers

Anonymous

I think this is more a question about how a compiler works in general.

A compiler takes some text input (the program you wrote) and turns it into machine code.

Let’s make up a simple programming language that only contains two possible instructions: “Add number1 number2” and “Subtract number1 number2”.

Let’s use an example program in our made-up language: “Subtract 4 2”.

The first step splits the code up into tokens. In this case we would have three tokens: [“Subtract”, 4, 2]. A compiler for this language could then look something like this:

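```
def tokenize(source):
    # Split on whitespace: "Subtract 4 2" -> ["Subtract", "4", "2"]
    return source.split()

tokens = tokenize("Subtract 4 2")
if tokens[0] == "Add":
    # output machine instruction "ADD tokens[1] tokens[2]"
    print("ADD", tokens[1], tokens[2])
elif tokens[0] == "Subtract":
    # output machine instruction "SUB tokens[1] tokens[2]"
    print("SUB", tokens[1], tokens[2])
```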
This compiler can take the instructions from our new programming language and output the correct machine code. You might notice that it doesn’t actually matter which language you use to write this compiler. You could use C, Python, Java, whatever you want.
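To make that point concrete, here is a minimal sketch of the same toy compiler written as plain functions that turn text into text. It happens to be in Python, but nothing in it depends on Python. The names `tokenize`, `compile_line`, `compile_program` and `OPCODES` are made up for this illustration, and “ADD”/“SUB” are just stand-ins for real machine instructions.

```python
def tokenize(line):
    """Split one source line into tokens, e.g. "Subtract 4 2" -> ["Subtract", "4", "2"]."""
    return line.split()


# Hypothetical mapping from the toy language's keywords to made-up machine mnemonics.
OPCODES = {"Add": "ADD", "Subtract": "SUB"}


def compile_line(line):
    """Translate one line of the toy language into one machine instruction string."""
    tokens = tokenize(line)
    # Reject anything that is not "<known keyword> <number1> <number2>".
    if len(tokens) != 3 or tokens[0] not in OPCODES:
        raise ValueError(f"cannot compile: {line!r}")
    keyword, left, right = tokens
    return f"{OPCODES[keyword]} {left} {right}"


def compile_program(source):
    """Compile every non-empty line of a program, in order."""
    return [compile_line(line) for line in source.splitlines() if line.strip()]


print(compile_program("Subtract 4 2\nAdd 1 3"))  # ['SUB 4 2', 'ADD 1 3']
```

The compiler only reads text and writes text, so the same three functions could be rewritten in C or Java and would produce exactly the same output. In principle they could even be rewritten in the new language itself, once that language is powerful enough to split strings and compare them, which is the step bootstrapping relies on.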

This is obviously very oversimplified. A real compiler has a lot more steps, because real programming languages are a lot more complex than my example.
