How does an abstract syntax tree work?

422 viewsJanuary 1, 2024

Question 100.55K March 13, 2023 0 Comments

I’ve been interested in learning what happens when keywords in code are parsed, i.e. the parser gets to the keyword “while” in python code. I think the parser then creates a node in the abstract syntax tree (I honestly don’t know what this means). What happens after that?

In: 1

6 Answers

Anonymous Posted March 13, 2023 0 Comments

There are three steps to compiling or interpreting code.

The compiler/interpreter must tokenize the code. That is, split it up into discrete words or other groups of symbols that have meaning (e.g. if you have x+y you want the interpreter to recognize x, +, and y as distinct tokens).

The program must then parse the code, that is, look at the tokens as they’re presented and use the grammar of the language to interpret the actual instructions.

The program finally then generates code in the target language its compiling/interpreting to.

So, let’s talk about parsing. In English, a valid sentence has to have a noun phrase and a verb phrase. a noun phrase has an optional determiner (e.g. “the”) and a noun. A verb phrase has a verb followed by a noun phrase.

Let’s take the sentence “I walked the dog.” and tokenize and parse it according to these grammar rules.

“I” is the first token. We know that, based on the grammar, it has to be part of a noun phrase. Since it is not in the list of determiners (e.g., “the”), we know it’s a noun. This means our noun phrase is “I.”

The next token should, thus, begin a verb phrase. “walked” is a verb, so that checks out. As the next part of the verb phrase there should be another noun phrase. “The” is a determiner, so a valid start to a noun phrase and “dog” is a noun, ending the verb phrase.

We could have a tree that breaks down the sentence from the full thing into a noun and verb phrase and their sub-components all the way down to the individual tokens.

Doing this with Python or any other programming language is really not any different, just the grammar is.

I would strongly recommend looking at [nand2tetris](https://www.nand2tetris.org/course), a set of projects some CS profs put together to accompany a textbook they wrote. In it you build out a simple 16 bit computer from first principles. In project ten and eleven you build a compiler for a simple java-like programming language. Project 10 has the parser portion of it.

Anonymous Posted March 13, 2023 0 Comments

There are three steps to compiling or interpreting code.

The compiler/interpreter must tokenize the code. That is, split it up into discrete words or other groups of symbols that have meaning (e.g. if you have x+y you want the interpreter to recognize x, +, and y as distinct tokens).

The program must then parse the code, that is, look at the tokens as they’re presented and use the grammar of the language to interpret the actual instructions.

The program finally then generates code in the target language its compiling/interpreting to.

So, let’s talk about parsing. In English, a valid sentence has to have a noun phrase and a verb phrase. a noun phrase has an optional determiner (e.g. “the”) and a noun. A verb phrase has a verb followed by a noun phrase.

Let’s take the sentence “I walked the dog.” and tokenize and parse it according to these grammar rules.

“I” is the first token. We know that, based on the grammar, it has to be part of a noun phrase. Since it is not in the list of determiners (e.g., “the”), we know it’s a noun. This means our noun phrase is “I.”

The next token should, thus, begin a verb phrase. “walked” is a verb, so that checks out. As the next part of the verb phrase there should be another noun phrase. “The” is a determiner, so a valid start to a noun phrase and “dog” is a noun, ending the verb phrase.

We could have a tree that breaks down the sentence from the full thing into a noun and verb phrase and their sub-components all the way down to the individual tokens.

Doing this with Python or any other programming language is really not any different, just the grammar is.

I would strongly recommend looking at [nand2tetris](https://www.nand2tetris.org/course), a set of projects some CS profs put together to accompany a textbook they wrote. In it you build out a simple 16 bit computer from first principles. In project ten and eleven you build a compiler for a simple java-like programming language. Project 10 has the parser portion of it.

Anonymous Posted March 13, 2023 0 Comments

There are three steps to compiling or interpreting code.

The compiler/interpreter must tokenize the code. That is, split it up into discrete words or other groups of symbols that have meaning (e.g. if you have x+y you want the interpreter to recognize x, +, and y as distinct tokens).

The program must then parse the code, that is, look at the tokens as they’re presented and use the grammar of the language to interpret the actual instructions.

The program finally then generates code in the target language its compiling/interpreting to.

So, let’s talk about parsing. In English, a valid sentence has to have a noun phrase and a verb phrase. a noun phrase has an optional determiner (e.g. “the”) and a noun. A verb phrase has a verb followed by a noun phrase.

Let’s take the sentence “I walked the dog.” and tokenize and parse it according to these grammar rules.

“I” is the first token. We know that, based on the grammar, it has to be part of a noun phrase. Since it is not in the list of determiners (e.g., “the”), we know it’s a noun. This means our noun phrase is “I.”

The next token should, thus, begin a verb phrase. “walked” is a verb, so that checks out. As the next part of the verb phrase there should be another noun phrase. “The” is a determiner, so a valid start to a noun phrase and “dog” is a noun, ending the verb phrase.

We could have a tree that breaks down the sentence from the full thing into a noun and verb phrase and their sub-components all the way down to the individual tokens.

Doing this with Python or any other programming language is really not any different, just the grammar is.

I would strongly recommend looking at [nand2tetris](https://www.nand2tetris.org/course), a set of projects some CS profs put together to accompany a textbook they wrote. In it you build out a simple 16 bit computer from first principles. In project ten and eleven you build a compiler for a simple java-like programming language. Project 10 has the parser portion of it.

Anonymous Posted March 13, 2023 0 Comments

A syntax tree is a way of representing a written set of commands. We can create a basic syntax tree from mathematical formulas.

You can imagine a basic syntax tree with each ‘node’ consisting of 3 parts – a left operand, an operator, and a right operand.

So, a basic math problem might look like

3 + 4 = ?

+
/
3 4

This syntax tree only has one node, with 3 as its left operand, 4 as its right operand and + as the operator.

If we make it a little more complicated, we can create a bigger tree:

3 + 4 * 6 = ?

+
/
3 *
/
4 6

In this tree, you can see that the bottom node needs to be calculated before the top node can be calculated.

You can create similar syntax trees with different rules and different numbers of members for virtually any kind of code or language.

Anonymous Posted March 13, 2023 0 Comments

A syntax tree is a way of representing a written set of commands. We can create a basic syntax tree from mathematical formulas.

You can imagine a basic syntax tree with each ‘node’ consisting of 3 parts – a left operand, an operator, and a right operand.

So, a basic math problem might look like

3 + 4 = ?

+
/
3 4

This syntax tree only has one node, with 3 as its left operand, 4 as its right operand and + as the operator.

If we make it a little more complicated, we can create a bigger tree:

3 + 4 * 6 = ?

+
/
3 *
/
4 6

In this tree, you can see that the bottom node needs to be calculated before the top node can be calculated.

You can create similar syntax trees with different rules and different numbers of members for virtually any kind of code or language.

Anonymous Posted March 13, 2023 0 Comments

A syntax tree is a way of representing a written set of commands. We can create a basic syntax tree from mathematical formulas.

You can imagine a basic syntax tree with each ‘node’ consisting of 3 parts – a left operand, an operator, and a right operand.

So, a basic math problem might look like

3 + 4 = ?

+
/
3 4

This syntax tree only has one node, with 3 as its left operand, 4 as its right operand and + as the operator.

If we make it a little more complicated, we can create a bigger tree:

3 + 4 * 6 = ?

+
/
3 *
/
4 6

In this tree, you can see that the bottom node needs to be calculated before the top node can be calculated.

You can create similar syntax trees with different rules and different numbers of members for virtually any kind of code or language.