tdop

2020-03-26 04:32:58 +07:00 · 2020-03-26 04:32:58 +07:00 · 0fbb2a7cba
parent ce47a38f29
commit 0fbb2a7cba
1 changed files with 223 additions and 0 deletions
--- a/tdop.md
+++ b/tdop.md
@ -0,0 +1,223 @@
+# Top Down Operator Precedence
+
+Read [Top Down Operator Precedence](https://tdop.github.io/).
+
+Read [Top down operator precedence parsing in Go](http://www.cristiandima.com/top-down-operator-precedence-parsing-in-go/).
+
+Main flow:
+
+```txt
+Scanner -> Parser
+```
+
+## Scanner
+
+Input: string
+
+Output: series of tokens
+
+A token is a piece of the input text tagged with a type.
+Example: for the input `1 + 2`, `1` and `2` are `int` tokens and `+` is a `plus` token.
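+
+A minimal sketch of such a scanner, assuming only integers and `+` exist.
+The names `Tokenize`, `Kind` and `Text` are made up for this note and are not taken from the linked articles.
+
+```go
+package main
+
+import (
+  "fmt"
+  "unicode"
+)
+
+// Token pairs a type name (for example "int" or "plus") with its source text.
+// The field names are hypothetical.
+type Token struct {
+  Kind string
+  Text string
+}
+
+// Tokenize turns the input string into a series of tokens.
+// It skips whitespace and only knows integers and "+".
+func Tokenize(input string) ([]Token, error) {
+  var tokens []Token
+  runes := []rune(input)
+  for i := 0; i < len(runes); {
+    r := runes[i]
+    switch {
+    case unicode.IsSpace(r):
+      i++
+    case unicode.IsDigit(r):
+      // Collect all consecutive digits into one int token.
+      start := i
+      for i < len(runes) && unicode.IsDigit(runes[i]) {
+        i++
+      }
+      tokens = append(tokens, Token{Kind: "int", Text: string(runes[start:i])})
+    case r == '+':
+      tokens = append(tokens, Token{Kind: "plus", Text: "+"})
+      i++
+    default:
+      return nil, fmt.Errorf("unexpected character %q at position %d", r, i)
+    }
+  }
+  return tokens, nil
+}
+
+func main() {
+  tokens, err := Tokenize("1 + 2")
+  if err != nil {
+    panic(err)
+  }
+  for _, t := range tokens {
+    fmt.Println(t.Kind, t.Text)
+  }
+}
+```
+
+For the input `1 + 2` this prints one line per token: `int 1`, `plus +`, `int 2`.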
+
+## Parser
+
+From the scanned tokens, we build an [AST](https://en.wikipedia.org/wiki/Abstract_syntax_tree) (abstract syntax tree).
+
+### Expression
+
+Each node of the AST is called an expression.
+We implement an expression like this:
+
+```go
+type Expression struct {
+  Token    Token
+  Value    interface{}
+  Children []Expression
+}
+```
+
+Each expression has a `Token`. `Value` and `Children` are optional.
+Example: expression `int 3` has `Token int` and `Value 3` but no `Children`;
+expression `and A B` has `Token and` and `Children A B` but no `Value`.
+
+The input `A + B * C` parses to an expression like this:
+
+```txt
+  +
+ / \
+A   *
+   / \
+  B   C
+```
+
+### Token precedence
+
+Each token has a [precedence](https://en.wikipedia.org/wiki/Order_of_operations#Programming_languages).
+Precedence decides the order in which operators are applied.
+Example: in `A + B * C`, `*` has higher precedence than `+`, so the input is read as `A + (B * C)`.
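+
+One simple way to store precedence is a lookup table.
+This is only a sketch: the numbers are illustrative, and `precedence` and `BindsTighter` are names made up for this note.
+
+```go
+package main
+
+import "fmt"
+
+// precedence maps an operator to its binding strength.
+// A missing key yields 0, which is lower than every operator here.
+var precedence = map[string]int{
+  "+": 1,
+  "-": 1,
+  "*": 2,
+}
+
+// BindsTighter reports whether operator a has higher precedence than operator b.
+func BindsTighter(a, b string) bool {
+  return precedence[a] > precedence[b]
+}
+
+func main() {
+  fmt.Println(BindsTighter("*", "+")) // true, so A + B * C is read as A + (B * C)
+  fmt.Println(BindsTighter("+", "-")) // false, both have the same precedence
+}
+```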
+
+### Token program
+
+Each token has programs that decide what to do when the parser meets that token.
+
+A token program is one of 2 types: `nud` or `led`.
+
+| Short | Long            | Explanation                                                     |
+| ----- | --------------- | --------------------------------------------------------------- |
+| `nud` | null denotation | code run for a value token (int, string, ...) or a prefix token |
+| `led` | left denotation | code run for an infix token                                     |
+
+Examples of prefix tokens: `(`, `not`, `-` (negative sign).
+Examples of infix tokens: `and`, `or`, `==`.
+
+### Pratt algorithm
+
+To parse with precedence and token programs, we implement the Pratt algorithm.
+
+The core algorithm looks like this:
+
+```go
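+// Parse reads one expression. precedence is the precedence of the token
+// to the left of the expression being parsed.
+func Parse(precedence int) Expression {
+  token := Scan()
+  result := nud(token)
+
+  for {
+    // Stop when the next token does not bind tighter than the previous one.
+    peekToken := Peek()
+    if precedence >= peekToken.Precedence() {
+      break
+    }
+
+    token = Scan()
+    result = led(token, result)
+  }
+
+  return result
+}
+
+// nud builds the expression for a value or prefix token.
+func nud(token Token) Expression {
+  return Expression{
+    Token: token,
+    // deal with value and children
+  }
+}
+
+// led builds the expression for an infix token; expr is its left operand.
+func led(token Token, expr Expression) Expression {
+  rightExpr := Parse(token.Precedence())
+  // do something special
+  return Expression{
+    Token: token,
+    // deal with value
+    Children: []Expression{
+      expr,
+      rightExpr,
+    },
+  }
+}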
+```
+
+| Code                                   | Explanation                                                                               |
+| -------------------------------------- | ----------------------------------------------------------------------------------------- |
+| `precedence` argument                  | precedence of the previous token                                                          |
+| `Scan()`                               | return the next token and consume it                                                      |
+| `Peek()`                               | return the next token without consuming it                                                |
+| `precedence >= peekToken.Precedence()` | the previous token binds at least as tightly as the next token, so stop                   |
+| `nud(token)`                           | return an expression; this token must be a value or prefix token                          |
+| `led(token, result)`                   | return an expression with `result` as the left operand; this token must be an infix token |
+
+Remember that `nud()` and `led()` in this example are generic.
+Each token should define what its own `nud()` and `led()` do; if a token does not define one, return an error and let the user handle it.
+
+To parse, call `Parse(0)`.
+
+This algorithm is hard to follow at first, but it gets easier when we walk through an example.
+
+### Example
+
+Assume the precedence of `+` and `-` is 1 and the precedence of `*` is 2.
+
+Input: `A + B * C - D`
+
+Function calls happen as follows:
+
+```txt
+Parse(precedence = 0) (1)
+  nud(A) result in Expression(A)
+  0 < peek.Precedence (peek is +, precedence is 1), enter loop
+    led(+, Expression(A)) result in Expression(+)
+      save Expression(A) as first child
+      call Parse(precedence = 1) (2) and save result as second child
+
+Tree:
+  +
+ / \
+A   ?
+
+Parse(precedence = 1) (2)
+  nud(B) result in Expression(B)
+  1 < peek.Precedence (peek is *, precedence is 2), enter loop
+    led(*, Expression(B)) result in Expression(*)
+      save Expression(B) as first child
+      call Parse(precedence = 2) (3) and save result as second child
+
+Tree:
+  *
+ / \
+B   ?
+
+Parse(precedence = 2) (3)
+  nud(C) result in Expression(C)
+  2 > peek.Precedence(peek is -, precedence is 1), stop loop
+  return Expression(C)
+
+Tree:
+C
+
+Back to Parse(precedence = 1) (2)
+  Expression(*) has Expression(C) as second child
+  continue loop
+    1 = peek.Precedence (peek is -, precedence is 1), stop loop
+  return Expression(*) with Expression(B), Expression(C) as children
+
+Tree:
+  *
+ / \
+B   C
+
+Back to Parse(precedence = 0) (1)
+  Expression(+) has Expression(*) as second child
+  continue loop
+    0 < peek.Precedence (peek is -, precedence is 1), enter loop
+      led(-, Expression(+)) result in Expression(-)
+      save Expression(+) as first child
+      call Parse(precedence = 1) (4) and save result as second child
+
+Tree:
+    -
+   / \
+  +   ?
+ / \
+A   *
+   / \
+  B   C
+
+Parse(precedence = 1) (4)
+  nud(D) result in Expression(D)
+  1 > peek.Precedence (peek is EOF, precedence is 0), stop loop
+  return Expression(D)
+
+Tree:
+D
+
+Back to Parse(precedence = 0) (1)
+  Expression(-) has Expression(D) as second child
+  continue loop
+    0 = peek.Precedence(peek is EOF, precedence is 0), stop loop
+  return Expression(-) with Expression(+), Expression(D) as children
+
+Tree:
+    -
+   / \
+  +   D
+ / \
+A   *
+   / \
+  B   C
+```
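+
+The walk-through above can be checked with a small runnable version of the algorithm.
+This is a sketch under assumptions, not a full implementation: tokens are plain strings, the `Parser` struct and the `String()` method are made up for this note, there are no prefix tokens, and only `+`, `-`, `*` have a `led`.
+A token without the needed `nud` returns an error to the caller.
+
+```go
+package main
+
+import (
+  "fmt"
+  "strings"
+)
+
+// Token is simplified to a plain string; the empty string stands for EOF.
+type Token string
+
+const EOF Token = ""
+
+// Precedence uses the values assumed in the example; all other tokens get 0.
+func (t Token) Precedence() int {
+  switch t {
+  case "+", "-":
+    return 1
+  case "*":
+    return 2
+  }
+  return 0
+}
+
+type Expression struct {
+  Token    Token
+  Children []Expression
+}
+
+// String renders the tree in prefix form, for example (+ A B).
+func (e Expression) String() string {
+  if len(e.Children) == 0 {
+    return string(e.Token)
+  }
+  parts := []string{string(e.Token)}
+  for _, child := range e.Children {
+    parts = append(parts, child.String())
+  }
+  return "(" + strings.Join(parts, " ") + ")"
+}
+
+// Parser holds the scanned tokens and the current position.
+type Parser struct {
+  tokens []Token
+  pos    int
+}
+
+// Peek returns the next token without consuming it.
+func (p *Parser) Peek() Token {
+  if p.pos >= len(p.tokens) {
+    return EOF
+  }
+  return p.tokens[p.pos]
+}
+
+// Scan returns the next token and consumes it.
+func (p *Parser) Scan() Token {
+  token := p.Peek()
+  if token != EOF {
+    p.pos++
+  }
+  return token
+}
+
+// Parse loops while the next token binds tighter than the previous one.
+func (p *Parser) Parse(precedence int) (Expression, error) {
+  result, err := p.nud(p.Scan())
+  if err != nil {
+    return Expression{}, err
+  }
+  for precedence < p.Peek().Precedence() {
+    result, err = p.led(p.Scan(), result)
+    if err != nil {
+      return Expression{}, err
+    }
+  }
+  return result, nil
+}
+
+// nud accepts value tokens only; operators and EOF have no nud in this sketch.
+func (p *Parser) nud(token Token) (Expression, error) {
+  if token == EOF || token.Precedence() > 0 {
+    return Expression{}, fmt.Errorf("token %q has no nud", string(token))
+  }
+  return Expression{Token: token}, nil
+}
+
+// led builds an infix expression with left as the first child.
+func (p *Parser) led(token Token, left Expression) (Expression, error) {
+  right, err := p.Parse(token.Precedence())
+  if err != nil {
+    return Expression{}, err
+  }
+  return Expression{Token: token, Children: []Expression{left, right}}, nil
+}
+
+func main() {
+  p := &Parser{tokens: []Token{"A", "+", "B", "*", "C", "-", "D"}}
+  expr, err := p.Parse(0)
+  if err != nil {
+    panic(err)
+  }
+  fmt.Println(expr) // (- (+ A (* B C)) D)
+}
+```
+
+The printed form `(- (+ A (* B C)) D)` is the same tree as the final one in the walk-through: `-` at the root, `+` and `D` as its children, and `*` as the second child of `+`.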