# Top Down Operator Precedence

Read [Top Down Operator Precedence](https://tdop.github.io/).

Read [Top down operator precedence parsing in Go](http://www.cristiandima.com/top-down-operator-precedence-parsing-in-go/).

Main flow:

```txt
Scanner -> Parser
```

## Scanner

Input: string

Output: series of tokens

A token is a piece of the input text tagged with a type.
Example: for the input `1 + 2`, `1` and `2` are `int` tokens and `+` is a `plus` token.
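
A minimal scanner sketch for the example above. It assumes only integers and `+` exist; the `Token` struct and the `ScanAll` function are names made up for this sketch, not from the linked articles.

```go
package main

import (
  "fmt"
  "unicode"
)

// Token pairs a type name with the text it was scanned from.
// The type names ("int", "plus") follow the example in this note.
type Token struct {
  Type string
  Text string
}

// ScanAll turns the input string into a slice of tokens.
// It only knows integers and "+"; anything else is reported as an error.
func ScanAll(input string) ([]Token, error) {
  var tokens []Token
  runes := []rune(input)
  for i := 0; i < len(runes); {
    r := runes[i]
    switch {
    case unicode.IsSpace(r):
      i++ // whitespace separates tokens but is not a token itself
    case unicode.IsDigit(r):
      start := i
      for i < len(runes) && unicode.IsDigit(runes[i]) {
        i++
      }
      tokens = append(tokens, Token{Type: "int", Text: string(runes[start:i])})
    case r == '+':
      tokens = append(tokens, Token{Type: "plus", Text: "+"})
      i++
    default:
      return nil, fmt.Errorf("unexpected character %q at position %d", r, i)
    }
  }
  return tokens, nil
}

func main() {
  tokens, err := ScanAll("1 + 2")
  if err != nil {
    panic(err)
  }
  for _, t := range tokens {
    fmt.Println(t.Type, t.Text) // int 1, plus +, int 2 (one per line)
  }
}
```

A real scanner usually hands out tokens one at a time instead of a whole slice, but a slice is enough to see the idea.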

## Parser

The parser turns the scanned tokens into an [abstract syntax tree (AST)](https://en.wikipedia.org/wiki/Abstract_syntax_tree).

### Expression

Each node of the AST is called an expression.
We implement an expression like this:

```go
type Expression struct {
  Token    Token
  Value    interface{}
  Children []Expression
}
```

Each expression has a `Token`; `Value` and `Children` are optional.
Example: expression `int 3` has `Token int` and `Value 3` but no `Children`,
while expression `and A B` has `Token and` and `Children A B` but no `Value`.

The input `A + B * C` parses to an expression like this:

```txt
  +
 / \
A   *
   / \
  B   C
```
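
To make the tree concrete, here is a sketch that builds it by hand and prints it in prefix form. `Token` is simplified to a plain string, and the `String` method and `Leaf` helper are assumptions added for this sketch.

```go
package main

import (
  "fmt"
  "strings"
)

// Token is simplified to a plain string here (an assumption for this sketch).
type Token string

type Expression struct {
  Token    Token
  Value    interface{}
  Children []Expression
}

// String renders the expression in prefix form, e.g. (+ A (* B C)).
func (e Expression) String() string {
  if len(e.Children) == 0 {
    return fmt.Sprint(e.Value)
  }
  parts := []string{string(e.Token)}
  for _, child := range e.Children {
    parts = append(parts, child.String())
  }
  return "(" + strings.Join(parts, " ") + ")"
}

// Leaf is a hypothetical helper that builds an identifier expression.
func Leaf(name string) Expression {
  return Expression{Token: "ident", Value: name}
}

func main() {
  tree := Expression{
    Token: "+",
    Children: []Expression{
      Leaf("A"),
      {Token: "*", Children: []Expression{Leaf("B"), Leaf("C")}},
    },
  }
  fmt.Println(tree) // (+ A (* B C))
}
```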

### Token precedence

Each token has a [precedence](https://en.wikipedia.org/wiki/Order_of_operations#Programming_languages).
Precedence decides the order in which operators are applied.
Example: in `A + B * C`, `*` has higher precedence than `+`, so the expression is read as `A + (B * C)`.
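
One simple way to store precedence is a lookup table. This is only a sketch: the numbers are the ones used in the example later in this note, and the `precedences` map and `Precedence` function are hypothetical names.

```go
package main

import "fmt"

// precedences maps an operator's text to its binding strength.
// The numbers are arbitrary; only their relative order matters.
var precedences = map[string]int{
  "+": 1,
  "-": 1,
  "*": 2,
}

// Precedence returns 0 for anything that is not a known operator,
// so values and end of input never bind tighter than an operator.
func Precedence(op string) int {
  return precedences[op]
}

func main() {
  fmt.Println(Precedence("*") > Precedence("+")) // true
  fmt.Println(Precedence("EOF"))                 // 0
}
```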

### Token program

Each token has programs that decide what to do when the parser meets that token.

A token program is one of 2 types: `nud` or `led`.

| short | long            | explanation                                                                           |
| ----- | --------------- | ------------------------------------------------------------------------------------- |
| `nud` | null denotation | code run when a value token (int, string, ...) or a prefix token starts an expression |
| `led` | left denotation | code run when an infix token follows a left expression                                |

Examples of prefix tokens: `(`, `not`, `-` (negative sign).
Examples of infix tokens: `and`, `or`, `==`.

### Pratt algorithm

To build the tree with the right precedence, we implement the Pratt algorithm.

The core of the algorithm looks like this:

```go
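// Parse parses an expression whose operators all bind tighter than precedence.
// Scan, Peek and Token are assumed to be provided by the scanner.
func Parse(precedence int) Expression {
  token := Scan()
  result := nud(token)

  for {
    peekToken := Peek()
    // Stop when the next token does not bind tighter than the previous one.
    if precedence >= peekToken.Precedence() {
      break
    }

    token := Scan()
    result = led(token, result)
  }

  return result
}

// nud handles a token that starts an expression (a value or a prefix).
func nud(token Token) Expression {
  return Expression{
    Token: token,
    // deal with value and children
  }
}

// led handles an infix token; expr is the expression to its left.
func led(token Token, expr Expression) Expression {
  rightExpr := Parse(token.Precedence())
  // do something special
  return Expression{
    Token: token,
    // deal with value
    Children: []Expression{
      expr,
      rightExpr,
    },
  }
}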
```

| mystery                                | explain                                                                           |
| -------------------------------------- | --------------------------------------------------------------------------------- |
| `precedence argument`                  | precedence of the previous token                                                  |
| `Scan()`                               | return the next token and consume it                                              |
| `Peek()`                               | return the next token without consuming it                                        |
| `precedence >= peekToken.Precedence()` | the previous token binds at least as tightly as the next one, so stop             |
| `nud(token)`                           | return an expression; this token must be a value or a prefix                      |
| `led(token, result)`                   | return an expression with `result` as the left argument; this token must be infix |
Remember that `nud()` and `led()` in this example are generic.
Each token should define what its own `nud()` and `led()` do; if it does not, let the user handle the error.

To parse, call `Parse(0)`.
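
To see it run end to end, here is a small self-contained sketch. It is not the implementation from the linked articles: the `Parser` struct, `NewParser`, the whitespace-splitting scanner, the `String` method and the prefix-minus precedence of 3 are all assumptions made to keep the example short.

```go
package main

import (
  "fmt"
  "strings"
)

type Token struct {
  Type string // "ident", "+", "-", "*" or "EOF"
  Text string
}

// Precedence is 0 for identifiers and EOF, so they never continue an expression.
func (t Token) Precedence() int {
  return map[string]int{"+": 1, "-": 1, "*": 2}[t.Type]
}

type Expression struct {
  Token    Token
  Value    interface{}
  Children []Expression
}

// String renders the tree in prefix form, e.g. (+ A (* B C)).
func (e Expression) String() string {
  if len(e.Children) == 0 {
    return fmt.Sprint(e.Value)
  }
  s := "(" + e.Token.Text
  for _, c := range e.Children {
    s += " " + c.String()
  }
  return s + ")"
}

type Parser struct {
  tokens []Token
  pos    int
}

// NewParser splits the input on whitespace, e.g. "A + B * C".
func NewParser(input string) *Parser {
  p := &Parser{}
  for _, word := range strings.Fields(input) {
    typ := "ident"
    if word == "+" || word == "-" || word == "*" {
      typ = word
    }
    p.tokens = append(p.tokens, Token{Type: typ, Text: word})
  }
  return p
}

// Peek returns the next token without consuming it.
func (p *Parser) Peek() Token {
  if p.pos >= len(p.tokens) {
    return Token{Type: "EOF"}
  }
  return p.tokens[p.pos]
}

// Scan returns the next token and consumes it.
func (p *Parser) Scan() Token {
  t := p.Peek()
  p.pos++
  return t
}

func (p *Parser) Parse(precedence int) Expression {
  result := p.nud(p.Scan())
  for precedence < p.Peek().Precedence() {
    result = p.led(p.Scan(), result)
  }
  return result
}

// nud handles identifiers and the prefix minus.
func (p *Parser) nud(token Token) Expression {
  switch token.Type {
  case "ident":
    return Expression{Token: token, Value: token.Text}
  case "-":
    // 3 is above every infix precedence, so the operand is a single term.
    return Expression{Token: token, Children: []Expression{p.Parse(3)}}
  }
  panic(fmt.Sprintf("unexpected token %q", token.Text))
}

// led handles infix operators; left is everything parsed so far.
func (p *Parser) led(token Token, left Expression) Expression {
  right := p.Parse(token.Precedence())
  return Expression{Token: token, Children: []Expression{left, right}}
}

func main() {
  fmt.Println(NewParser("A + B * C - D").Parse(0)) // (- (+ A (* B C)) D)
  fmt.Println(NewParser("- A * B").Parse(0))       // (* (- A) B)
}
```

Because `led` calls `Parse` with the operator's own precedence and the loop stops on equal precedence, operators of the same level group to the left: `A - B - C` becomes `(- (- A B) C)`.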

This algorithm is hard, I know. But it gets easier when we walk through an example.

### Example

Assume the precedence of `+` and `-` is 1 and the precedence of `*` is 2.

Input: `A + B * C - D`

Function calls happen as follows:

```txt
Parse(precedence = 0) (1)
  nud(A) result in Expression(A)
  0 < peek.Precedence (peek is +, precedence is 1), enter loop
    led(+, Expression(A)) result in Expression(+)
      save Expression(A) as first child
      call Parse(precedence = 1) (2) and save result as second child

Tree:
  +
 / \
A   ?

Parse(precedence = 1) (2)
  nud(B) result in Expression(B)
  1 < peek.Precedence (peek is *, precedence is 2), enter loop
    led(*, Expression(B)) result in Expression(*)
      save Expression(B) as first child
      call Parse(precedence = 2) (3) and save result as second child

Tree:
  *
 / \
B   ?

Parse(precedence = 2) (3)
  nud(C) result in Expression(C)
  2 > peek.Precedence(peek is -, precedence is 1), stop loop
  return Expression(C)

Tree:
C

Back to Parse(precedence = 1) (2)
  Expression(*) has Expression(C) as second child
  continue loop
    1 = peek.Precedence (peek is -, precedence is 1), stop loop
  return Expression(*) with Expression(B), Expression(C) as children

Tree:
  *
 / \
B   C

Back to Parse(precedence = 0) (1)
  Expression(+) has Expression(*) as second child
  continue loop
    0 < peek.Precedence (peek is -, precedence is 1), stay in loop
      led(-, Expression(+)) result in Expression(-)
      save Expression(+) as first child
      call Parse(precedence = 1) (4) and save result as second child

Tree:
      -
     / \
    +   ?
   / \
  A   *
     / \
    B   C

Parse(precedence = 1) (4)
  nud(D) result in Expression(D)
  1 > peek.Precedence (peek is EOF, precedence is 0), stop loop
  return Expression(D)

Tree:
D

Back to Parse(precedence = 0) (1)
  Expression(-) has Expression(D) as second child
  continue loop
    0 = peek.Precedence(peek is EOF, precedence is 0), stop loop
  return Expression(-) with Expression(+), Expression(D) as children

Tree:
      -
     / \
    +   D
   / \
  A   *
     / \
    B   C
```