tdop
parent ce47a38f29
commit 0fbb2a7cba
@ -0,0 +1,223 @

# Top Down Operator Precedence

Read [Top Down Operator Precedence](https://tdop.github.io/).

Read [Top down operator precedence parsing in Go](http://www.cristiandima.com/top-down-operator-precedence-parsing-in-go/).

Main flow:

```txt
Scanner -> Parser
```

## Scanner

Input: string

Output: series of tokens

A token is a piece of the input text labeled with a type.
Example: for input `1 + 2`, `1` and `2` are `int` tokens and `+` is a `plus` token.

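A scanner for the `1 + 2` example could look roughly like the sketch below. The names `TokenType`, `Token`, `ScanAll` and the extra `eof` token are assumptions made for illustration, and only integers and `+` are handled.

```go
package main

import (
	"fmt"
	"unicode"
)

// TokenType and Token are hypothetical names used only in this sketch.
type TokenType string

const (
	Int  TokenType = "int"
	Plus TokenType = "plus"
	EOF  TokenType = "eof" // assumed end marker, appended after the last real token
)

type Token struct {
	Type TokenType
	Text string
}

// ScanAll splits the input into tokens and skips whitespace.
func ScanAll(input string) ([]Token, error) {
	var tokens []Token
	runes := []rune(input)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case unicode.IsDigit(r):
			// Consume the whole run of digits as one int token.
			start := i
			for i < len(runes) && unicode.IsDigit(runes[i]) {
				i++
			}
			tokens = append(tokens, Token{Int, string(runes[start:i])})
		case r == '+':
			tokens = append(tokens, Token{Plus, "+"})
			i++
		default:
			return nil, fmt.Errorf("unexpected character %q at position %d", r, i)
		}
	}
	return append(tokens, Token{EOF, ""}), nil
}

func main() {
	tokens, err := ScanAll("1 + 2")
	if err != nil {
		panic(err)
	}
	for _, t := range tokens {
		fmt.Printf("%s %q\n", t.Type, t.Text)
	}
}
```

Returning the whole slice at once keeps the sketch short; a scanner that hands out one token per call would work equally well with the parser described next.
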
## Parser

The parser turns the scanned tokens into an [abstract syntax tree (AST)](https://en.wikipedia.org/wiki/Abstract_syntax_tree).

### Expression

Each node of the AST is called an expression.
We implement an expression like this:

```go
type Expression struct {
	Token    Token
	Value    interface{}
	Children []Expression
}
```

Each expression has a `Token`. `Value` and `Children` are optional.
Example: expression `int 3` has `Token int` and `Value 3` but no `Children`;
expression `and A B` has `Token and` and `Children A B` but no `Value`.

The input `A + B * C` parses to an expression tree like this:

```txt
  +
 / \
A   *
   / \
  B   C
```

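To make the shape concrete, the tree for `A + B * C` can be built by hand from `Expression` values. In this sketch `Token` is simplified to a string, and the `String` method, the `leaf` helper and the `ident` token name are assumptions added only for printing and brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is simplified to a string here; the real Token type is not shown in the text.
type Token string

type Expression struct {
	Token    Token
	Value    interface{}
	Children []Expression
}

// String renders a leaf as its value and an inner node as "(token child child)".
func (e Expression) String() string {
	if len(e.Children) == 0 {
		return fmt.Sprint(e.Value)
	}
	parts := []string{string(e.Token)}
	for _, c := range e.Children {
		parts = append(parts, c.String())
	}
	return "(" + strings.Join(parts, " ") + ")"
}

// leaf builds a value expression: it has a Value but no Children.
func leaf(name string) Expression {
	return Expression{Token: "ident", Value: name}
}

func main() {
	// The operator nodes have Children but no Value.
	tree := Expression{
		Token: "+",
		Children: []Expression{
			leaf("A"),
			{Token: "*", Children: []Expression{leaf("B"), leaf("C")}},
		},
	}
	fmt.Println(tree) // (+ A (* B C))
}
```
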
### Token precedence

Each token has a [precedence](https://en.wikipedia.org/wiki/Order_of_operations#Programming_languages).
Precedence decides the order in which operators are applied.
Example: in `A + B * C`, `*` has higher precedence than `+`, so the input parses as `A + (B * C)`.

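One simple way to store precedences is a lookup table. The sketch below is an assumption about representation, not the only option; the numbers are illustrative, and anything that is not a known operator falls back to 0.

```go
package main

import "fmt"

// precedences maps operator text to its precedence; the values are illustrative.
var precedences = map[string]int{
	"+": 1,
	"-": 1,
	"*": 2,
}

// Precedence returns 0 for anything that is not a known operator,
// because a missing map key yields the zero value.
func Precedence(op string) int {
	return precedences[op]
}

func main() {
	fmt.Println(Precedence("*") > Precedence("+")) // true
	fmt.Println(Precedence("eof"))                 // 0
}
```
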
### Token program

Each token has programs that decide what to do when the parser meets that token.

A token program is one of two types: `nud` or `led`.

| short | long            | explanation                                                        |
| ----- | --------------- | ------------------------------------------------------------------ |
| `nud` | null denotation | code denoted by a value token (int, string, ...) or a prefix token |
| `led` | left denotation | code denoted by an infix token                                     |

Examples of prefix tokens are `(`, `not`, and `-` (negative sign).
Examples of infix tokens are `and`, `or`, and `==`.

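The split between the two program types can be sketched as two functions that switch on the token. Everything here is an assumption for illustration: tokens are whitespace-separated words, the result is a prefix-notation string instead of an `Expression`, and the precedence numbers (including 4 for `not`) are made up. The `parse` loop is the Pratt loop explained in the next section.

```go
package main

import (
	"fmt"
	"strings"
)

// parser is a hypothetical helper; tokens are whitespace-separated words.
type parser struct {
	tokens []string
	pos    int
}

// Assumed precedences for the infix tokens; every other token gets 0.
var precedence = map[string]int{"or": 1, "and": 2, "==": 3}

// peek returns the next token, or "" at end of input.
func (p *parser) peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

// scan returns the next token and consumes it.
func (p *parser) scan() string {
	token := p.peek()
	p.pos++
	return token
}

func (p *parser) parse(prec int) (string, error) {
	left, err := p.nud(p.scan())
	for err == nil && prec < precedence[p.peek()] {
		left, err = p.led(p.scan(), left)
	}
	return left, err
}

// nud is the program for value tokens and prefix tokens.
func (p *parser) nud(token string) (string, error) {
	switch token {
	case "not": // prefix operator; 4 is assumed to bind tighter than any infix token
		operand, err := p.parse(4)
		return "(not " + operand + ")", err
	case "(": // grouping: parse the inner expression, then require ")"
		inner, err := p.parse(0)
		if err == nil && p.scan() != ")" {
			err = fmt.Errorf("expected )")
		}
		return inner, err
	case "", ")", "and", "or", "==":
		return "", fmt.Errorf("token %q has no nud", token)
	}
	return token, nil // any other word is treated as a value
}

// led is the program for infix tokens; left is the already-parsed operand.
func (p *parser) led(token, left string) (string, error) {
	right, err := p.parse(precedence[token])
	return "(" + token + " " + left + " " + right + ")", err
}

func parseString(input string) (string, error) {
	p := &parser{tokens: strings.Fields(input)}
	return p.parse(0)
}

func main() {
	out, err := parseString("not A and ( B or C )")
	fmt.Println(out, err) // (and (not A) (or B C)) <nil>
}
```

A token such as `-` can have both programs: its `nud` would handle the negative sign and its `led` would handle subtraction.
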
### Pratt algorithm

To put precedence and token programs together, we implement the Pratt algorithm.

The core of the algorithm looks like this:

```go
// Parse parses an expression, consuming infix tokens only while they
// bind tighter than the given precedence.
func Parse(precedence int) Expression {
	token := Scan()
	result := nud(token)

	for {
		peekToken := Peek()
		if precedence >= peekToken.Precedence() {
			break
		}

		token := Scan()
		result = led(token, result)
	}

	return result
}

// nud handles a value token or a prefix token.
func nud(token Token) Expression {
	return Expression{
		Token: token,
		// deal with value and children
	}
}

// led handles an infix token; expr is the already-parsed left operand.
func led(token Token, expr Expression) Expression {
	rightExpr := Parse(token.Precedence())
	// do something special
	return Expression{
		Token: token,
		// deal with value
		Children: []Expression{
			expr,
			rightExpr,
		},
	}
}
```

| code                                   | explanation                                                                                |
| -------------------------------------- | ------------------------------------------------------------------------------------------ |
| `precedence` argument                  | precedence of the previous token                                                           |
| `Scan()`                               | returns the next token and consumes it                                                     |
| `Peek()`                               | returns the next token without consuming it                                                |
| `precedence >= peekToken.Precedence()` | the previous token binds at least as tightly as the next token, so stop                    |
| `nud(token)`                           | returns an expression; the token must be a value or prefix token                           |
| `led(token, result)`                   | returns an expression with `result` as the left operand; the token must be an infix token |

Keep in mind that `nud()` and `led()` in the code above are generic.
Each token should define its own `nud()` and `led()` behavior; if a token does not define one, report an error and let the user handle it.

To parse, call `Parse(0)`.

I know this algorithm is hard, but it gets easier once we walk through an example.

### Example

Assume `+` and `-` have precedence 1 and `*` has precedence 2.

Input: `A + B * C - D`

The function calls happen as follows:

```txt
Parse(precedence = 0) (1)
nud(A) results in Expression(A)
0 < peek.Precedence (peek is +, precedence is 1), enter loop
led(+, Expression(A)) results in Expression(+)
save Expression(A) as first child
call Parse(precedence = 1) (2) and save result as second child

Tree:
  +
 / \
A   ?

Parse(precedence = 1) (2)
nud(B) results in Expression(B)
1 < peek.Precedence (peek is *, precedence is 2), enter loop
led(*, Expression(B)) results in Expression(*)
save Expression(B) as first child
call Parse(precedence = 2) (3) and save result as second child

Tree:
  *
 / \
B   ?

Parse(precedence = 2) (3)
nud(C) results in Expression(C)
2 > peek.Precedence (peek is -, precedence is 1), stop loop
return Expression(C)

Tree:
C

Back to Parse(precedence = 1) (2)
Expression(*) has Expression(C) as second child
continue loop
1 = peek.Precedence (peek is -, precedence is 1), stop loop
return Expression(*) with Expression(B), Expression(C) as children

Tree:
  *
 / \
B   C

Back to Parse(precedence = 0) (1)
Expression(+) has Expression(*) as second child
continue loop
0 < peek.Precedence (peek is -, precedence is 1), stay in loop
led(-, Expression(+)) results in Expression(-)
save Expression(+) as first child
call Parse(precedence = 1) (4) and save result as second child

Tree:
      -
     / \
    +   ?
   / \
  A   *
     / \
    B   C

Parse(precedence = 1) (4)
nud(D) results in Expression(D)
1 > peek.Precedence (peek is EOF, precedence is 0), stop loop
return Expression(D)

Tree:
D

Back to Parse(precedence = 0) (1)
Expression(-) has Expression(D) as second child
continue loop
0 = peek.Precedence (peek is EOF, precedence is 0), stop loop
return Expression(-) with Expression(+), Expression(D) as children

Tree:
      -
     / \
    +   D
   / \
  A   *
     / \
    B   C
```
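
The trace can be checked with a small runnable version of the algorithm. This is a sketch under stated assumptions: the `Parser` struct, `NewParser`, the whitespace-based tokenizing, the `ident` and `eof` token types, the `String` method and the error returns are all additions for illustration. Only `Parse`, `nud`, `led`, `Scan`, `Peek` and `Expression` follow the names used in this document.

```go
package main

import (
	"fmt"
	"strings"
)

type Token struct {
	Type string // "ident", "+", "-", "*" or "eof"
	Text string
}

// Precedence uses the values assumed in the example; everything else is 0.
func (t Token) Precedence() int {
	switch t.Type {
	case "+", "-":
		return 1
	case "*":
		return 2
	}
	return 0
}

type Expression struct {
	Token    Token
	Value    interface{}
	Children []Expression
}

// String prints a leaf as its value and an operator as "(op left right)".
func (e Expression) String() string {
	if len(e.Children) != 2 {
		return fmt.Sprint(e.Value)
	}
	return fmt.Sprintf("(%s %s %s)", e.Token.Text, e.Children[0], e.Children[1])
}

// Parser is a hypothetical wrapper around a pre-scanned token slice.
type Parser struct {
	tokens []Token
	pos    int
}

// NewParser splits on whitespace, a stand-in for a real scanner.
func NewParser(input string) *Parser {
	p := &Parser{}
	for _, word := range strings.Fields(input) {
		typ := "ident"
		if word == "+" || word == "-" || word == "*" {
			typ = word
		}
		p.tokens = append(p.tokens, Token{Type: typ, Text: word})
	}
	p.tokens = append(p.tokens, Token{Type: "eof"})
	return p
}

func (p *Parser) Peek() Token { return p.tokens[p.pos] }

func (p *Parser) Scan() Token {
	token := p.tokens[p.pos]
	if p.pos < len(p.tokens)-1 { // never move past the final eof token
		p.pos++
	}
	return token
}

func (p *Parser) Parse(precedence int) (Expression, error) {
	result, err := p.nud(p.Scan())
	for err == nil && precedence < p.Peek().Precedence() {
		result, err = p.led(p.Scan(), result)
	}
	return result, err
}

func (p *Parser) nud(token Token) (Expression, error) {
	if token.Type != "ident" {
		return Expression{}, fmt.Errorf("token %q cannot start an expression", token.Text)
	}
	return Expression{Token: token, Value: token.Text}, nil
}

func (p *Parser) led(token Token, left Expression) (Expression, error) {
	right, err := p.Parse(token.Precedence())
	return Expression{Token: token, Children: []Expression{left, right}}, err
}

func main() {
	expr, err := NewParser("A + B * C - D").Parse(0)
	if err != nil {
		panic(err)
	}
	fmt.Println(expr) // (- (+ A (* B C)) D)
}
```

The printed form `(- (+ A (* B C)) D)` is the final tree in prefix notation: `-` at the root, `+` as its first child, and `D` as its second.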