# Top Down Operator Precedence

Read [Top Down Operator Precedence](https://tdop.github.io/).

Read [Top down operator precedence parsing in Go](http://www.cristiandima.com/top-down-operator-precedence-parsing-in-go/).

Main flow:

```txt
Scanner -> Parser
```

## Scanner

Input: string

Output: series of tokens

A token is a piece of text labeled with its type.
For example, in the input `1 + 2`, `1` and `2` are `int` tokens and `+` is a `plus` token.

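A minimal scanner sketch in Go, assuming the input only contains integers, `+`, `-`, `*` and spaces. The `Token` fields and every type name except `int` and `plus` (`minus`, `star`, `eof`, `illegal`) are hypothetical choices for illustration:

```go
package main

import (
	"fmt"
	"unicode"
)

// Token is the scanner's output: a type name plus the text it matched.
type Token struct {
	Type string
	Text string
}

// operatorTypes maps single-character operators to hypothetical type names.
var operatorTypes = map[rune]string{'+': "plus", '-': "minus", '*': "star"}

// scan turns input into tokens. It only knows integers and the operators
// above; spaces are skipped and any other character becomes "illegal".
// The result always ends with an "eof" token.
func scan(input string) []Token {
	var tokens []Token
	runes := []rune(input)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case unicode.IsDigit(r):
			start := i
			for i < len(runes) && unicode.IsDigit(runes[i]) {
				i++
			}
			tokens = append(tokens, Token{"int", string(runes[start:i])})
		default:
			typ, ok := operatorTypes[r]
			if !ok {
				typ = "illegal"
			}
			tokens = append(tokens, Token{typ, string(r)})
			i++
		}
	}
	return append(tokens, Token{"eof", ""})
}

func main() {
	for _, t := range scan("1 + 2") {
		fmt.Printf("%s %q\n", t.Type, t.Text)
	}
}
```

For `1 + 2` this yields `int`, `plus`, `int` and then the `eof` token, which tells the parser that the input has ended.
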
## Parser

The parser turns the scanned tokens into an [abstract syntax tree (AST)](https://en.wikipedia.org/wiki/Abstract_syntax_tree).

### Expression

Each node of the AST is called an expression.
We implement an expression like this:

```go
type Expression struct {
	Token    Token        // the token this node was built from
	Value    interface{}  // optional, e.g. the number of an int token
	Children []Expression // optional, e.g. the operands of an operator
}
```

Each expression has a `Token`; `Value` and `Children` are optional.
For example, the expression `int 3` has token `int` and value `3` but no children,
while the expression `and A B` has token `and` and children `A` and `B` but no value.

With input `A + B * C`, the parser produces this expression:

```txt
  +
 / \
A   *
   / \
  B   C
```

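As a sketch, the tree for `A + B * C` can be built by hand with the `Expression` struct above. The `Token` fields (`Type`, `Text`), the `ident` helper and the `String` method that prints the tree in prefix form are assumptions for illustration, not part of the original design:

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical token shape: a type plus the matched text.
type Token struct {
	Type string
	Text string
}

// Expression is the struct from above.
type Expression struct {
	Token    Token
	Value    interface{}
	Children []Expression
}

// String prints the tree in prefix form: a leaf prints its text and an
// inner node prints "(operator child1 child2 ...)".
func (e Expression) String() string {
	if len(e.Children) == 0 {
		return e.Token.Text
	}
	parts := []string{e.Token.Text}
	for _, c := range e.Children {
		parts = append(parts, c.String())
	}
	return "(" + strings.Join(parts, " ") + ")"
}

// ident (hypothetical helper) builds a leaf expression for a name such as A.
func ident(name string) Expression {
	return Expression{Token: Token{Type: "ident", Text: name}}
}

func main() {
	// A + B * C: the * node is the second (right) child of the + node.
	tree := Expression{
		Token: Token{Type: "plus", Text: "+"},
		Children: []Expression{
			ident("A"),
			{
				Token:    Token{Type: "star", Text: "*"},
				Children: []Expression{ident("B"), ident("C")},
			},
		},
	}
	fmt.Println(tree) // (+ A (* B C))

	// A value leaf such as `int 3` carries a Value but no Children.
	three := Expression{Token: Token{Type: "int", Text: "3"}, Value: 3}
	fmt.Println(three.Value, len(three.Children)) // 3 0
}
```
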
### Token precedence

Each token has a [precedence](https://en.wikipedia.org/wiki/Order_of_operations#Programming_languages), which decides the order in which operators are applied.
For example, in `A + B * C`, `*` has higher precedence than `+`, so the expression means `A + (B * C)`.

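One simple way to attach precedence to tokens is a lookup function. The numbers below are assumptions borrowed from the worked example later in this document (`+` and `-` are 1, `*` is 2, everything else 0), and `precedence` and `owner` are hypothetical helpers:

```go
package main

import "fmt"

// precedence is a hypothetical lookup: + and - are 1, * is 2, and every
// other token (values, end of input) is 0.
func precedence(tok string) int {
	switch tok {
	case "+", "-":
		return 1
	case "*":
		return 2
	default:
		return 0
	}
}

// owner decides which of two neighboring operators takes the operand
// between them: the higher precedence wins, and on a tie the left one
// wins, so equal operators group from the left.
func owner(left, right string) string {
	if precedence(right) > precedence(left) {
		return right
	}
	return left
}

func main() {
	fmt.Println("A + B * C: B belongs to", owner("+", "*")) // *, so A + (B * C)
	fmt.Println("A - B + C: B belongs to", owner("-", "+")) // -, so (A - B) + C
}
```

The tie rule mirrors the `>=` comparison in the Pratt loop: an operator of equal precedence does not steal the operand from the operator on its left.
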
### Token program

Each token has programs that decide what the parser does when it meets that token.

A token program is one of two types: `nud` or `led`.

| short | long            | explanation                                                                                        |
| ----- | --------------- | -------------------------------------------------------------------------------------------------- |
| `nud` | null denotation | code run when the token starts an expression: a value token (int, string, ...) or a prefix token |
| `led` | left denotation | code run when the token follows a complete left operand: an infix token                           |

Examples of prefix tokens are `(`, `not` and `-` (negative sign).
Examples of infix tokens are `and`, `or` and `==`.
A token can have both programs: `-` is a prefix token in `-A` and an infix token in `A - B`.

### Pratt algorithm

To build the tree, we implement the Pratt algorithm.

The core of the algorithm looks like this:

```go
// Parse parses an expression, consuming operators for as long as the next
// token binds more tightly than precedence.
func Parse(precedence int) Expression {
	token := Scan()
	result := nud(token)

	for {
		peekToken := Peek()
		if precedence >= peekToken.Precedence() {
			break
		}

		token = Scan()
		result = led(token, result)
	}

	return result
}

// nud builds an expression from a value or prefix token.
func nud(token Token) Expression {
	return Expression{
		Token: token,
		// deal with value and children
	}
}

// led builds an expression from an infix token and the expression to its left.
func led(token Token, expr Expression) Expression {
	// The right operand only takes operators that bind more tightly than token.
	rightExpr := Parse(token.Precedence())
	// do something special

	return Expression{
		Token: token,
		// deal with value
		Children: []Expression{
			expr,
			rightExpr,
		},
	}
}
```

| name                                   | explanation                                                                              |
| -------------------------------------- | ---------------------------------------------------------------------------------------- |
| `precedence` argument                  | precedence of the previous (operator) token; `0` when parsing the whole input            |
| `Scan()`                               | returns the next token and consumes it                                                   |
| `Peek()`                               | returns the next token without consuming it                                              |
| `precedence >= peekToken.Precedence()` | the previous token binds at least as tightly as the next token, so stop and return `result` |
| `nud(token)`                           | returns an expression; `token` must be a value or a prefix token                         |
| `led(token, result)`                   | returns an expression with `result` as its left operand; `token` must be an infix token  |

Note that `nud()` and `led()` above are generic placeholders.
Each token should define its own `nud()` and `led()`; if a token lacks the one the parser needs, return an error for the caller to handle.

To parse a whole input, call `Parse(0)`. Since the end of input (EOF) has precedence 0, the top-level loop stops there.

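Putting the pieces together, here is a minimal runnable sketch in Go. It follows the `Parse` loop above, with some assumptions that are not from the original: tokens are separated by spaces, so the "scanner" is just `strings.Fields`; `Token` stores its precedence in a field instead of a `Precedence()` method; `-` also works as a prefix token with a hypothetical precedence of 3 so it binds tighter than `*`; and `nud`/`led` return an `error` so the caller decides how to report it:

```go
package main

import (
	"fmt"
	"strings"
)

// Token (hypothetical) is the token text plus its infix precedence.
type Token struct {
	Text       string
	Precedence int
}

// Expression follows the struct above; Value is left out for brevity.
type Expression struct {
	Token    Token
	Children []Expression
}

// String prints the tree in prefix form, e.g. (+ A (* B C)).
func (e Expression) String() string {
	s := e.Token.Text
	for _, c := range e.Children {
		s += " " + c.String()
	}
	if len(e.Children) > 0 {
		s = "(" + s + ")"
	}
	return s
}

// Infix precedences; values and EOF get 0, which stops the loop.
var precedences = map[string]int{"+": 1, "-": 1, "*": 2}

type Parser struct {
	tokens []Token
	pos    int
}

// NewParser "scans" space-separated input and appends an EOF token.
func NewParser(input string) *Parser {
	var tokens []Token
	for _, text := range strings.Fields(input) {
		tokens = append(tokens, Token{text, precedences[text]})
	}
	return &Parser{tokens: append(tokens, Token{"EOF", 0})}
}

// Peek returns the next token; Scan also consumes it but never moves past EOF.
func (p *Parser) Peek() Token { return p.tokens[p.pos] }
func (p *Parser) Scan() Token {
	t := p.Peek()
	if p.pos < len(p.tokens)-1 {
		p.pos++
	}
	return t
}

// Parse is the loop from above: call led while the next token binds tighter.
func (p *Parser) Parse(precedence int) (Expression, error) {
	result, err := p.nud(p.Scan())
	for err == nil && precedence < p.Peek().Precedence {
		result, err = p.led(p.Scan(), result)
	}
	return result, err
}

// nud runs for a token at the start of an expression.
func (p *Parser) nud(t Token) (Expression, error) {
	switch t.Text {
	case "-": // prefix minus; hypothetical precedence 3 binds tighter than *
		operand, err := p.Parse(3)
		return Expression{Token: t, Children: []Expression{operand}}, err
	case "+", "*", "EOF": // these tokens define no nud
		return Expression{}, fmt.Errorf("unexpected %q at start of expression", t.Text)
	}
	return Expression{Token: t}, nil // anything else is a value
}

// led runs for an infix token; left is the expression parsed so far.
func (p *Parser) led(t Token, left Expression) (Expression, error) {
	right, err := p.Parse(t.Precedence)
	return Expression{Token: t, Children: []Expression{left, right}}, err
}

func main() {
	for _, input := range []string{"A + B * C - D", "- A * B", "A +", "A B"} {
		p := NewParser(input)
		expr, err := p.Parse(0) // the top level starts at precedence 0
		if err == nil && p.Peek().Text != "EOF" {
			err = fmt.Errorf("unexpected %q", p.Peek().Text) // leftover input
		}
		if err != nil {
			fmt.Println(input, "=> error:", err)
			continue
		}
		fmt.Println(input, "=>", expr)
	}
}
```

The first line printed is `A + B * C - D => (- (+ A (* B C)) D)`. `A +` and `A B` report errors instead of returning a partial tree; the leftover-input check in `main` is needed because `Parse(0)` simply stops at a token with precedence 0.
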
This algorithm can be hard to follow at first, but it becomes easier after walking through an example.

### Example

Assume `+` and `-` have precedence 1, `*` has precedence 2, and the end of input (EOF) has precedence 0.

Input: `A + B * C - D`

The function calls happen as follows:

```txt
Parse(precedence = 0) (1)
    nud(A) returns Expression(A)
    0 < peek.Precedence() (peek is +, precedence is 1), so keep looping
    led(+, Expression(A)) returns Expression(+)
        save Expression(A) as first child
        call Parse(precedence = 1) (2) and save the result as second child

    Tree:
      +
     / \
    A   ?

Parse(precedence = 1) (2)
    nud(B) returns Expression(B)
    1 < peek.Precedence() (peek is *, precedence is 2), so keep looping
    led(*, Expression(B)) returns Expression(*)
        save Expression(B) as first child
        call Parse(precedence = 2) (3) and save the result as second child

    Tree:
      *
     / \
    B   ?

Parse(precedence = 2) (3)
    nud(C) returns Expression(C)
    2 > peek.Precedence() (peek is -, precedence is 1), stop the loop
    return Expression(C)

    Tree:
    C

Back to Parse(precedence = 1) (2)
    Expression(*) gets Expression(C) as second child
    continue the loop
    1 = peek.Precedence() (peek is -, precedence is 1), stop the loop
    return Expression(*) with Expression(B), Expression(C) as children

    Tree:
      *
     / \
    B   C

Back to Parse(precedence = 0) (1)
    Expression(+) gets Expression(*) as second child
    continue the loop
    0 < peek.Precedence() (peek is -, precedence is 1), so keep looping
    led(-, Expression(+)) returns Expression(-)
        save Expression(+) as first child
        call Parse(precedence = 1) (4) and save the result as second child

    Tree:
          -
         / \
        +   ?
       / \
      A   *
         / \
        B   C

Parse(precedence = 1) (4)
    nud(D) returns Expression(D)
    1 > peek.Precedence() (peek is EOF, precedence is 0), stop the loop
    return Expression(D)

    Tree:
    D

Back to Parse(precedence = 0) (1)
    Expression(-) gets Expression(D) as second child
    continue the loop
    0 = peek.Precedence() (peek is EOF, precedence is 0), stop the loop
    return Expression(-) with Expression(+), Expression(D) as children

    Tree:
          -
         / \
        +   D
       / \
      A   *
         / \
        B   C
```
