til/Knowledge/tdop.md

5.7 KiB

Top Down Operator Precedence

Read Top Down Operator Precedence.

Read Top down operator precedence parsing in Go.

Main flow:

Scanner -> Parser

Scanner

Input: string

Output: series of tokens

Token is type of text. Example input is 1 + 2: 1, 2 is token int, + is token plus.

Parser

From tokens we scanned, we parsed them to AST tree.

Expression

Each node of AST tree is called expression. We implement expression like this:

type Expression struct {
  Token    Token
  Value    interface{}
  Children []Expression
}

Each expression has Token. Value and Children is optional. Example: expression int 3 has Token int, Value 3 but doesn't have Children, expression and A B has Token and, Children A B but doesn't have Value.

With input A + B * C, we parse to expression like this:

  +
 / \
A   *
   / \
  B   C

Token precedence

Each token has precedence. Precedence decides order of operator. Example A + B * C, * has higher precedence than + so A + (B * C).

Token program

Each token has programs, program to decide what to do if we meet that token when we parse.

Token program can be 2 types: nud or led.

short long explain
nud null denotation code denoted by a value (int, string, ...) token or prefix token
led left denotation code denoted by an infix token

Example prefix token is (, not, - (negative sign). Example infix token is and, or, ==.

Pratt algorithm

To do what we want, we implement Pratt algorithm.

Core algorithm looks like this:

func Parse(precedence int) Expression {
  token := Scan()
  result := nud(token)

  for {
    peekToken := Peek()
    if precedence >= peekToken.Precedence() {
      break
    }

    token := Scan()
    result = led(token, result)
  }

  return result
}

func nud(token Token) Expression {
  return Expression{
    Token: token,
    // deal with value and children
  }
}

func led(token Token, expr Expression) Expression {
  rightExpr := Parse(token.Precedence())
  // do something special
  return Expression {
    Token: token,
    // deal with value
    Children: []Expression{
      expr,
      rightExpr,
    }
  }
}
mystery explain
precedence argument precedence of previous token
Scan() return next token and ahh it gone
Peek() return next token but it's still there
precedence >= peekToken.Precedence() previous token is already powerful than next token, stop
nud(token) return expression, this token must be value or prefix
led(token, result) return expression with result as right argument, this token must be infix

Must remember is nud() and led in example are for general. Each token should define how nud() and led() do, if not define let user handle error.

To parse, call Parse(0).

This algorithm is hard I know. But it will be easier if we read through example

Example

Assume +, - precedence is 1, * precedence is 2.

Input: A + B * C - D

Function calls happen as follows:

Parse(precedence = 0) (1)
  nud(A) result in Expression(A)
  0 < peek.Precedence (peek is +, precedence is 1), enter loop
    led(+, Expression(A)) result in Expression(+)
      save Expression(A) as first child
      call Parse(precedence = 1) (2) and save result as second child

Tree:
  +
 / \
A   ?

Parse(precedence = 1) (2)
  nud(B) result in Expression(B)
  1 < peek.Precedence (peek is *, precedence is 2), enter loop
    led(*, Expression(B)) result in Expression(*)
      save Expression(B) as first child
      call Parse(precedence = 2) (3) and save result as second child

Tree:
  *
 / \
B   ?

Parse(precedence = 2) (3)
  nud(C) result in Expression(C)
  2 > peek.Precedence(peek is -, precedence is 1), stop loop
  return Expression(C)

Tree:
C

Back to Parse(precedence = 1) (2)
  Expression(*) has Expression(C) as second child
  continue loop
    1 = peek.Precedence (peek is -, precedence is 1), stop loop
  return Expression(*) with Expression(B), Expression(C) as children

Tree:
  *
 / \
B   C

Back to Parse(precedence = 0) (1)
  Expression(+) has Expression(*) as second child
  continue loop
    0 < peek.Precedence (peek is +, precedence is 1)
      led(-, Expression(+)) result in Expression(-)
      save Expression(+) as first child
      call Parse(precedence = 1) (4) and save result as second child

Tree:
    -
   / \
  +   ?
 / *
A / \
 B   C

Parse(precedence = 1) (4)
  nud(D) result in Expression(D)
  1 < peek.Precedence(peek is EOF, precedence is 0), stop loop
  return Expression(D)

Tree:
D

Back to Parse(precedence = 0) (1)
  Expression(-) has Expression(D) as second child
  continue loop
    0 = peek.Precedence(peek is EOF, precedence is 0), stop loop
  return Expression(-) with Expression(+), Expression(D) as children

Tree:
    -
   / \
  +   D
 / *
A / \
 B   C