How does an abstract syntax tree work?

206 views

I’ve been interested in learning what happens when keywords in code are parsed, i.e. the parser gets to the keyword “while” in python code. I think the parser then creates a node in the abstract syntax tree (I honestly don’t know what this means). What happens after that?

In: 1

6 Answers

Anonymous 0 Comments

There are three steps to compiling or interpreting code.

The compiler/interpreter must tokenize the code. That is, split it up into discrete words or other groups of symbols that have meaning (e.g. if you have x+y you want the interpreter to recognize x, +, and y as distinct tokens).

The program must then parse the code, that is, look at the tokens as they’re presented and use the grammar of the language to interpret the actual instructions.

The program finally then generates code in the target language its compiling/interpreting to.

So, let’s talk about parsing. In English, a valid sentence has to have a noun phrase and a verb phrase. a noun phrase has an optional determiner (e.g. “the”) and a noun. A verb phrase has a verb followed by a noun phrase.

Let’s take the sentence “I walked the dog.” and tokenize and parse it according to these grammar rules.

“I” is the first token. We know that, based on the grammar, it has to be part of a noun phrase. Since it is not in the list of determiners (e.g., “the”), we know it’s a noun. This means our noun phrase is “I.”

The next token should, thus, begin a verb phrase. “walked” is a verb, so that checks out. As the next part of the verb phrase there should be another noun phrase. “The” is a determiner, so a valid start to a noun phrase and “dog” is a noun, ending the verb phrase.

We could have a tree that breaks down the sentence from the full thing into a noun and verb phrase and their sub-components all the way down to the individual tokens.

Doing this with Python or any other programming language is really not any different, just the grammar is.

I would strongly recommend looking at [nand2tetris](https://www.nand2tetris.org/course), a set of projects some CS profs put together to accompany a textbook they wrote. In it you build out a simple 16 bit computer from first principles. In project ten and eleven you build a compiler for a simple java-like programming language. Project 10 has the parser portion of it.

You are viewing 1 out of 6 answers, click here to view all answers.