eli5: In bootstrapping, how is the compiler written in the other langauge such that it can then compile the original language i.e. the bootstrap compiler?

1.01K viewsOtherTechnology

So I’ve already posted this question on Stack Overflow, but I wanted to ask it here as well, since I am not sure if they will simply say it is a duplicate (even though the other answers from other questions don’t answer what I asked in a way that helps me).

[https://stackoverflow.com/questions/78539284/in-bootstrapping-how-is-the-compiler-written-in-the-other-langauge-such-that-it](https://stackoverflow.com/questions/78539284/in-bootstrapping-how-is-the-compiler-written-in-the-other-langauge-such-that-it)

So I was wondering if there were direct examples people could give of how the bootstrap compiler is actually written such that it actually represents the language you want to write in another language, because my head can’t wrap itself around it and most sources are fairly vague and just assume you know how it is done.

Hell, maybe if people try and explain it in a metaphorical way and walk me through how the bootstrap compiler is written to represent language X in language Y, that might help too, since I am not a programmer, but this concept is stressing me out from knowing I don’t understand it.

In: Technology

17 Answers

Anonymous 0 Comments

This might be a bit more ELI15 than it is ELI5, but let’s give it a shot anyway: Let’s actually look at how these things are built.

There’s this tiny toy programming language called [Brainfuck](https://en.wikipedia.org/wiki/Brainfuck). It’s almost completely useless, and named for how hard/annoying it is to write anything useful in it, but it’s an incredibly simple language to write an interpreter for. I mean “if you know what you’re doing, it’s an afternoon project” sort of simple. So, when I learn a new programming language, I like to write a Brainfuck interpreter as one of my first projects. It covers all the basics and serves as a nice tour of a language. [I have a few of them online](https://github.com/pdpi/brainfuck/), so let’s look at my Haskell version to explain this thing.

[First off](https://github.com/pdpi/brainfuck/blob/master/haskell/src/interpreter.hs#L7-L14):

“`
data Instruction = Get
| Put
| Next
| Prev
| Add Int
| Loop [Instruction] deriving (Show, Eq)

type Program = [Instruction]
“`

I’m saying two things here. One is that there are six instructions we know about (Get, Put, Next, Prev, Add, Loop), the other is that a (Brainfuck) Program is just a list of Instructions. (the `[]` mean “list of”). This is the first really important concept: At this point, a Brainfuck “program” just means some data inside our Haskell program. A more complex language would’ve had a bunch more stuff happening here, but it’s just more of the same, really.

[Then over here](https://github.com/pdpi/brainfuck/blob/master/haskell/src/parser.hs#L18-L30):

“`
parse :: String -> Program
parse src = reverse . head $ foldl’ (flip parseChar) [[]] src
“`

First line just says “we have a thing called ‘parse’ that takes a string of text and spits out our interpreter’s idea of a program”. This is the second important concept: You can just read a bunch of text and turn it into that program-like data we saw earlier (it should come as no surprise that this is called ‘[parsing](https://en.wikipedia.org/wiki/Parsing)’).

You can safely ignore all the gobbledygook in the second line, it defines how `parse` works in an incredibly terse style that only really makes sense in the context of “I wrote this as a way to learn Haskell”, but you might notice that you have the word `parseChar` in there: I defined parsing the whole program in terms of repeatedly parsing a single character at a time.

So if you keep reading:

“`
parseChar :: Char -> Parser -> Parser
parseChar ‘.’ = add Put
parseChar ‘,’ = add Get
parseChar ‘>’ = add Next
parseChar ‘<‘ = add Prev
parseChar ‘+’ = add $ Add 1
parseChar ‘-‘ = add $ Add (-1)
parseChar ‘[‘ = push
parseChar ‘]’ = pop
parseChar _ = id
“`

That line that says `parseChar ‘.’ = add Put` basically means “when you see the character ‘.’, that translates into the ‘Put’ instruction, so add that to the program”. The rest of the lines are just variations, “when you see this character, add that instruction”. If you look at [this sample](https://github.com/pdpi/brainfuck/blob/master/examples/99botles.bf), you can see that the actual program is really just made of those characters `.,<>+-[]`. That last `parseChar _ = id` line is a fancy way of saying “if you see any other character, do nothing”. Same as before, more complex languages would need a correspondingly more complex parser.

Now, this is where a compiler and an interpreter diverge. I have [this bit of code](https://github.com/pdpi/brainfuck/blob/master/haskell/src/interpreter.hs#L16-L33) that takes that internal representation of a Brainfuck program, and immediately runs that program. A compiler would, instead, take that and produce an executable, but that’s largely just writing a file with a very specific format of its own. E.g. by the time you’ve turned into assembly, an `Add 5` instruction in our brainfuck program might get translated into an `ADD RAX, 5`.

These fundamentals are the same no matter what language you use to write your interpreter, and what language you are interpreting. It’s just reading text data and making sense of it. So… some nutcase out there wrote [a brainfuck interpreter, in brainfuck](https://github.com/canoon/bfbf/blob/master/bf.bf). If I had written my interpreter as a compiler, you’d be able to compile that crazy bf-in-bf thing with my interpreter-that-we’re-pretending-is-actually-a-compiler, and you’d have bootstrapped the language. Wee!

You are viewing 1 out of 17 answers, click here to view all answers.