Top Down Operator Precedence
Read Top Down Operator Precedence.
Read Top down operator precedence parsing in Go.
Main flow:
Scanner -> Parser
Scanner
Input: string
Output: series of tokens
A token is a piece of text tagged with a type.
For example, in the input `1 + 2`, `1` and `2` are `int` tokens and `+` is a `plus` token.
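The scanning step can be sketched as below. This is a minimal sketch, not a full scanner: the names `Tokenize`, `TokenType`, `Int`, `Plus` and `EOF` are assumptions made for illustration, and it only recognizes integers and `+`.

```go
package main

import (
	"fmt"
	"unicode"
)

// TokenType, Token and the constants below are hypothetical names for this sketch.
type TokenType string

const (
	Int  TokenType = "int"
	Plus TokenType = "plus"
	EOF  TokenType = "eof"
)

type Token struct {
	Type TokenType
	Text string
}

// Tokenize turns the input string into a series of tokens.
// It only knows integers, "+" and whitespace; anything else is an error.
func Tokenize(input string) ([]Token, error) {
	var tokens []Token
	runes := []rune(input)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case unicode.IsDigit(r):
			// Consume the whole run of digits as one int token.
			start := i
			for i < len(runes) && unicode.IsDigit(runes[i]) {
				i++
			}
			tokens = append(tokens, Token{Int, string(runes[start:i])})
		case r == '+':
			tokens = append(tokens, Token{Plus, "+"})
			i++
		default:
			return nil, fmt.Errorf("unexpected character %q at position %d", r, i)
		}
	}
	// A trailing EOF token lets the parser peek safely at the end of input.
	return append(tokens, Token{EOF, ""}), nil
}

func main() {
	tokens, err := Tokenize("1 + 2")
	if err != nil {
		panic(err)
	}
	for _, token := range tokens {
		fmt.Printf("%s %q\n", token.Type, token.Text)
	}
}
```

For `1 + 2` this yields an int token, a plus token, an int token, and the end-of-input marker.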
Parser
The parser turns the scanned tokens into an AST (abstract syntax tree).
Expression
Each node of the AST is called an expression. We implement an expression like this:
type Expression struct {
	Token    Token
	Value    interface{}
	Children []Expression
}
Each expression has a `Token`. `Value` and `Children` are optional.
Example: the expression `int 3` has Token `int` and Value `3` but no Children; the expression `and A B` has Token `and` and Children `A`, `B` but no Value.
With input `A + B * C`, we parse to an expression like this:
  +
 / \
A   *
   / \
  B   C
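To make the shape concrete, the same tree can be built by hand. This is a sketch under assumptions: `Token` is simplified to a string, and the `leaf` helper, the `"name"` token type and the `String` method are made up here for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is simplified to a string here; the real type is up to the scanner.
type Token string

type Expression struct {
	Token    Token
	Value    interface{}
	Children []Expression
}

// String renders the tree in prefix form, e.g. (+ A (* B C)).
// An expression without children prints its Value.
func (e Expression) String() string {
	if len(e.Children) == 0 {
		return fmt.Sprint(e.Value)
	}
	parts := []string{string(e.Token)}
	for _, child := range e.Children {
		parts = append(parts, child.String())
	}
	return "(" + strings.Join(parts, " ") + ")"
}

// leaf is a hypothetical helper: an expression with a Value and no Children.
func leaf(name string) Expression {
	return Expression{Token: "name", Value: name}
}

func main() {
	// A + B * C: "+" is the root, and "*" is its second child.
	tree := Expression{
		Token: "+",
		Children: []Expression{
			leaf("A"),
			{Token: "*", Children: []Expression{leaf("B"), leaf("C")}},
		},
	}
	fmt.Println(tree) // (+ A (* B C))
}
```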
Token precedence
Each token has a precedence.
Precedence decides the order in which operators are applied.
For example, in `A + B * C`, `*` has higher precedence than `+`, so the input is parsed as `A + (B * C)`.
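One simple way to store precedences is a lookup table. The sketch below is an assumption, not the only option: the `precedences` map, its values, and the string-keyed `Precedence` function are chosen for illustration.

```go
package main

import "fmt"

// Hypothetical precedence table; a higher number binds tighter.
var precedences = map[string]int{
	"+": 1,
	"-": 1,
	"*": 2,
}

// Precedence returns 0 for tokens that are not in the table
// (values, end of input, ...), because a missing map key yields 0.
func Precedence(token string) int {
	return precedences[token]
}

func main() {
	fmt.Println(Precedence("*") > Precedence("+")) // true
	fmt.Println(Precedence("EOF"))                 // 0
}
```

Giving unknown tokens precedence 0 is convenient later: a parse loop that starts at precedence 0 stops on them without a special case.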
Token program
Each token has programs: code that decides what to do when the parser meets that token.
A token program can be one of 2 types: `nud` or `led`.
short | long | explain |
---|---|---|
nud | null denotation | code denoted by a value (int, string, ...) token or prefix token |
led | left denotation | code denoted by an infix token |
Examples of prefix tokens are `(`, `not`, `-` (negative sign).
Examples of infix tokens are `and`, `or`, `==`.
Pratt algorithm
To build this tree with the right precedence, we implement the Pratt algorithm.
The core algorithm looks like this:
func Parse(precedence int) Expression {
	token := Scan()
	result := nud(token)

	for {
		peekToken := Peek()
		// Stop when the previous token binds at least as tightly as the next one.
		if precedence >= peekToken.Precedence() {
			break
		}

		token = Scan()
		result = led(token, result)
	}

	return result
}

func nud(token Token) Expression {
	return Expression{
		Token: token,
		// deal with value and children
	}
}

func led(token Token, expr Expression) Expression {
	// Parse the right-hand side, passing this token's precedence down.
	rightExpr := Parse(token.Precedence())

	// do something special
	return Expression{
		Token: token,
		// deal with value
		Children: []Expression{
			expr,
			rightExpr,
		},
	}
}
mystery | explain |
---|---|
`precedence` argument | precedence of the previous token (0 at the start) |
`Scan()` | returns the next token and consumes it |
`Peek()` | returns the next token without consuming it |
`precedence >= peekToken.Precedence()` | the previous token binds at least as tightly as the next token, so stop |
`nud(token)` | returns an expression; this token must be a value or a prefix token |
`led(token, result)` | returns an expression with `result` as the left argument; this token must be an infix token |
Remember that the `nud()` and `led()` above are generic examples.
Each token should define what its own `nud()` and `led()` do; if a token does not define one, let the user handle the error.
To parse, call `Parse(0)`.
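The sketch below puts the pieces together with a separate `nud` and `led` behavior per token. It rests on several assumptions made only for illustration: tokens are plain strings separated by whitespace, any unknown token is treated as a name, prefix `-` gets a made-up precedence of 3, and `Parser`, `ParseString` and the `"neg"` token are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

type Expression struct {
	Token    string
	Value    interface{}
	Children []Expression
}

// String renders the tree in prefix form, e.g. (+ A (* B C)).
func (e Expression) String() string {
	if len(e.Children) == 0 {
		return fmt.Sprint(e.Value)
	}
	parts := []string{e.Token}
	for _, child := range e.Children {
		parts = append(parts, child.String())
	}
	return "(" + strings.Join(parts, " ") + ")"
}

// Parser holds the token stream; tokens are plain strings in this sketch.
type Parser struct {
	tokens []string
	pos    int
}

// Peek returns the next token without consuming it ("" at end of input).
func (p *Parser) Peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

// Scan returns the next token and consumes it.
func (p *Parser) Scan() string {
	token := p.Peek()
	p.pos++
	return token
}

// Infix precedences; tokens missing from the map get 0 and end the loop.
var infixPrecedence = map[string]int{"+": 1, "-": 1, "*": 2}

// prefixPrecedence makes a prefix "-" bind tighter than any infix token.
const prefixPrecedence = 3

func (p *Parser) Parse(precedence int) (Expression, error) {
	result, err := p.nud(p.Scan())
	for err == nil && precedence < infixPrecedence[p.Peek()] {
		result, err = p.led(p.Scan(), result)
	}
	return result, err
}

// nud handles value tokens and the prefix tokens "-" and "(".
func (p *Parser) nud(token string) (Expression, error) {
	switch token {
	case "-":
		operand, err := p.Parse(prefixPrecedence)
		return Expression{Token: "neg", Children: []Expression{operand}}, err
	case "(":
		inner, err := p.Parse(0)
		if err == nil && p.Scan() != ")" {
			err = fmt.Errorf("expected )")
		}
		return inner, err
	case "", ")", "+", "*":
		return Expression{}, fmt.Errorf("unexpected token %q", token)
	}
	return Expression{Token: "name", Value: token}, nil
}

// led handles the infix tokens; left is the expression parsed so far.
func (p *Parser) led(token string, left Expression) (Expression, error) {
	right, err := p.Parse(infixPrecedence[token])
	return Expression{Token: token, Children: []Expression{left, right}}, err
}

// ParseString parses a whole input and rejects leftover tokens.
func ParseString(input string) (Expression, error) {
	p := &Parser{tokens: strings.Fields(input)}
	expr, err := p.Parse(0)
	if err == nil && p.Peek() != "" {
		err = fmt.Errorf("unexpected token %q", p.Peek())
	}
	return expr, err
}

func main() {
	expr, err := ParseString("- A * ( B + C )")
	if err != nil {
		panic(err)
	}
	fmt.Println(expr) // (* (neg A) (+ B C))
}
```

Here `-` has both programs: its `nud` handles the negative sign and its `led` handles subtraction. Returning an error from `nud` for tokens that cannot start an expression is one way to let the caller handle the failure.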
This algorithm is hard, I know. But it gets easier if we read through an example.
Example
Assume `+` and `-` have precedence 1 and `*` has precedence 2.
Input: `A + B * C - D`
Function calls happen as follows:
Parse(precedence = 0) (1)
nud(A) results in Expression(A)
0 < peek.Precedence (peek is +, precedence is 1), enter loop
led(+, Expression(A)) results in Expression(+)
  save Expression(A) as first child
  call Parse(precedence = 1) (2) and save result as second child
Tree:
  +
 / \
A   ?
Parse(precedence = 1) (2)
nud(B) results in Expression(B)
1 < peek.Precedence (peek is *, precedence is 2), enter loop
led(*, Expression(B)) results in Expression(*)
  save Expression(B) as first child
  call Parse(precedence = 2) (3) and save result as second child
Tree:
  *
 / \
B   ?
Parse(precedence = 2) (3)
nud(C) results in Expression(C)
2 > peek.Precedence (peek is -, precedence is 1), stop loop
return Expression(C)
Tree:
C
Back to Parse(precedence = 1) (2)
Expression(*) has Expression(C) as second child
continue loop
1 = peek.Precedence (peek is -, precedence is 1), stop loop
return Expression(*) with Expression(B), Expression(C) as children
Tree:
  *
 / \
B   C
Back to Parse(precedence = 0) (1)
Expression(+) has Expression(*) as second child
continue loop
0 < peek.Precedence (peek is -, precedence is 1), stay in loop
led(-, Expression(+)) results in Expression(-)
  save Expression(+) as first child
  call Parse(precedence = 1) (4) and save result as second child
Tree:
      -
     / \
    +   ?
   / \
  A   *
     / \
    B   C
Parse(precedence = 1) (4)
nud(D) results in Expression(D)
1 > peek.Precedence (peek is EOF, precedence is 0), stop loop
return Expression(D)
Tree:
D
Back to Parse(precedence = 0) (1)
Expression(-) has Expression(D) as second child
continue loop
0 = peek.Precedence (peek is EOF, precedence is 0), stop loop
return Expression(-) with Expression(+), Expression(D) as children
Tree:
      -
     / \
    +   D
   / \
  A   *
     / \
    B   C
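The walkthrough above can be checked with a small traced parser. This is a sketch under assumptions: it expects well-formed, whitespace-separated input with only names and the infix tokens `+`, `-`, `*`, it returns the tree as a prefix-notation string instead of an `Expression`, and the `tracer` type and its `calls` field are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Same precedences as in the example: + and - are 1, * is 2.
// End of input ("") is missing from the map, so its precedence is 0.
var precedences = map[string]int{"+": 1, "-": 1, "*": 2}

type tracer struct {
	tokens []string
	pos    int
	calls  []int // precedence argument of each Parse call, in call order
}

// peek returns the next token without consuming it ("" at end of input).
func (t *tracer) peek() string {
	if t.pos < len(t.tokens) {
		return t.tokens[t.pos]
	}
	return ""
}

// Parse returns the tree in prefix notation to keep the sketch short.
// It assumes well-formed input and does no error handling.
func (t *tracer) Parse(precedence int) string {
	t.calls = append(t.calls, precedence)

	// nud: a name stands for itself.
	result := t.tokens[t.pos]
	t.pos++

	for precedence < precedences[t.peek()] {
		op := t.tokens[t.pos]
		t.pos++
		// led: the expression so far becomes the first child,
		// the recursive call produces the second child.
		right := t.Parse(precedences[op])
		result = "(" + op + " " + result + " " + right + ")"
	}
	return result
}

func main() {
	t := &tracer{tokens: strings.Fields("A + B * C - D")}
	fmt.Println(t.Parse(0)) // (- (+ A (* B C)) D)
	fmt.Println(t.calls)    // [0 1 2 1]
}
```

The recorded calls match steps (1) to (4): `Parse(0)`, `Parse(1)`, `Parse(2)`, then `Parse(1)` again for `D`.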