
Shunting-yard algorithm

wikipedia Shunting-yard algorithm

NOTE:

The Chinese name of the algorithm is 调度场算法.

In computer science, the shunting-yard algorithm is a method for parsing mathematical expressions specified in infix notation. It can produce either a postfix notation string, also known as Reverse Polish notation (RPN), or an abstract syntax tree (AST). The algorithm was invented by Edsger Dijkstra and named the "shunting yard" algorithm because its operation resembles that of a railroad shunting yard. Dijkstra first described the shunting-yard algorithm in the Mathematisch Centrum report MR 34/61.

Like the evaluation of RPN (Reverse Polish notation), the shunting-yard algorithm is stack-based. Infix expressions are the form of mathematical notation most people are used to, for instance "3 + 4" or "3 + 4 × (2 − 1)". The conversion uses two text variables (strings), the input and the output, plus a stack that holds operators not yet added to the output queue. To convert, the program reads each symbol in order and acts according to the kind of symbol it is (number, operator, or parenthesis). The results for the above examples are (in Reverse Polish notation) "3 4 +" and "3 4 2 1 − × +", respectively.
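Since the paragraph above compares the algorithm to RPN evaluation, a minimal sketch of a stack-based RPN evaluator may help. The function name eval_rpn and the ASCII operators "-" and "*" (standing in for "−" and "×") are assumptions made for this illustration only.

```python
import operator

# Binary operators supported by this sketch. ASCII symbols are an assumption;
# the examples in the text use the typographic minus and multiplication signs.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def eval_rpn(expression):
    """Evaluate a space-separated RPN string using a stack of operands."""
    stack = []
    for token in expression.split():
        if token in OPS:
            # The most recently pushed operand is the right-hand side.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]

print(eval_rpn("3 4 +"))          # 7.0
print(eval_rpn("3 4 2 1 - * +"))  # 7.0
```

Both example outputs from the text evaluate to 7, which matches the infix expressions 3 + 4 and 3 + 4 × (2 − 1).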

The shunting-yard algorithm was later generalized into operator-precedence parsing.

A simple conversion

  1. Input: 3 + 4

  2. Push 3 to the output queue (whenever a number is read it is pushed to the output)

  3. Push + (or its ID) onto the operator stack

  4. Push 4 to the output queue

  5. After reading the expression, pop the operators off the stack and add them to the output. In this case there is only one, "+".

  6. Output: 3 4 +

This already shows a couple of rules:

  • All numbers are pushed to the output when they are read.
  • At the end of reading the expression, pop all operators off the stack and onto the output.
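The two rules above can be sketched as follows. This is a deliberately incomplete illustration: it ignores precedence, associativity, and parentheses, so it is only meaningful for an expression with a single operator such as "3 + 4". The function name convert_single_operator is hypothetical.

```python
def convert_single_operator(expression):
    """Apply only the two rules above; valid for one-operator expressions."""
    output = []     # output queue
    operators = []  # operator stack
    for token in expression.split():
        if token.replace(".", "", 1).isdigit():
            output.append(token)     # rule 1: numbers go straight to the output
        else:
            operators.append(token)  # operators wait on the stack
    while operators:
        output.append(operators.pop())  # rule 2: flush the stack at the end
    return " ".join(output)

print(convert_single_operator("3 + 4"))  # 3 4 +
```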

NOTE:

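Whatever the notation (prefix, infix, or postfix), the operands appear in the same order; the notations differ only in where the operators are placed. What the algorithm really does is decide when each operator is added to the output. It decides by comparing operator precedence on the operator stack, taking associativity into account as well. Because the target is postfix notation, an operator is placed after its operands: when the operator at the top of the stack has higher precedence than the incoming one, it is popped and added to the output. Parentheses also need handling, and they can be viewed as a form of isolation: they separate the operator stack inside the parentheses from the operator stack outside them.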

Graphical illustration

[Figure: Shunting yard.svg]

Graphical illustration of the algorithm, using a three-way railroad junction. The input is processed one symbol at a time: if a variable or number is found, it is copied directly to the output (a, c, e, h). If the symbol is an operator, it is pushed onto the operator stack (b, d, f). If the incoming operator's precedence is lower than that of the operator at the top of the stack, or the precedences are equal and the operator is left associative, then the operator at the top of the stack is first popped and added to the output (g). Finally, any remaining operators are popped off the stack and added to the output (i).
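The pop condition described in the caption can be sketched as a small predicate. The precedence values, the set of right-associative operators, and the name should_pop are assumptions for illustration; the sketch also assumes that operators of equal precedence share the same associativity.

```python
# Hypothetical precedence table for this sketch: a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOCIATIVE = {"^"}

def should_pop(top, incoming):
    """Return True if `top` (the operator on the stack) must be moved to the
    output before `incoming` is pushed."""
    if PRECEDENCE[incoming] < PRECEDENCE[top]:
        return True
    # Equal precedence: pop only when the incoming operator is left associative.
    return (PRECEDENCE[incoming] == PRECEDENCE[top]
            and incoming not in RIGHT_ASSOCIATIVE)

print(should_pop("*", "+"))  # True: "+" has lower precedence than "*"
print(should_pop("+", "*"))  # False: "*" binds tighter, so it is stacked on top
print(should_pop("-", "-"))  # True: equal precedence, left associative
print(should_pop("^", "^"))  # False: equal precedence, right associative
```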

NOTE:

If the operator is left associative (for example division or subtraction), then …

The algorithm in detail

Important terms: Token, Function, Operator associativity, Precedence

/* This implementation does not handle composite functions, functions with a variable number of arguments, or unary operators. */

while there are tokens to be read do:
    read a token.
    if the token is a number, then:
        push it to the output queue.
    if the token is a function then:
        push it onto the operator stack 
    if the token is an operator, then:
        while ((there is a function at the top of the operator stack)
               or (there is an operator at the top of the operator stack with greater precedence)
               or (the operator at the top of the operator stack has equal precedence and is left associative))
              and (the operator at the top of the operator stack is not a left parenthesis):
            pop operators from the operator stack onto the output queue.
        push it onto the operator stack.
    if the token is a left paren (i.e. "("), then:
        push it onto the operator stack.
    if the token is a right paren (i.e. ")"), then:
        while the operator at the top of the operator stack is not a left paren:
            pop the operator from the operator stack onto the output queue.
        /* if the stack runs out without finding a left paren, then there are mismatched parentheses. */
        if there is a left paren at the top of the operator stack, then:
            pop the operator from the operator stack and discard it
/* After the while loop, anything still on the operator stack is popped to the output queue: */
if there are no more tokens to read then:
    while there are still operator tokens on the stack:
        /* if the operator token on the top of the stack is a paren, then there are mismatched parentheses. */
        pop the operator from the operator stack onto the output queue.
exit.
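The pseudocode above can be turned into a runnable sketch along the following lines. The tokenizer, the precedence values, the choice of "^" as the only right-associative operator, and the function names (tokenize, to_rpn, sin, cos, sqrt) are all assumptions for illustration. Like the pseudocode, it does not handle unary operators or functions with several arguments.

```python
import re

# Hypothetical operator table: a higher number binds tighter.
PRECEDENCE = {"+": 2, "-": 2, "*": 3, "/": 3, "^": 4}
RIGHT_ASSOCIATIVE = {"^"}
FUNCTIONS = {"sin", "cos", "sqrt"}  # single-argument functions only

# A number, an identifier, or a single operator/parenthesis character.
TOKEN_RE = re.compile(r"\s*(\d+\.?\d*|[A-Za-z_]\w*|[-+*/^()])")

def tokenize(expression):
    """Split an infix string into number, name, operator and paren tokens."""
    tokens, pos = [], 0
    expression = expression.rstrip()
    while pos < len(expression):
        match = TOKEN_RE.match(expression, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def to_rpn(expression):
    """Convert an infix expression to a space-separated RPN string."""
    output, stack = [], []  # output queue and operator stack
    for token in tokenize(expression):
        if token in FUNCTIONS:
            stack.append(token)
        elif token in PRECEDENCE:
            # Pop functions, higher-precedence operators, and equal-precedence
            # left-associative operators, stopping at a left parenthesis.
            while stack and stack[-1] != "(" and (
                stack[-1] in FUNCTIONS
                or PRECEDENCE[stack[-1]] > PRECEDENCE[token]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[token]
                    and stack[-1] not in RIGHT_ASSOCIATIVE)
            ):
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched parentheses")
            stack.pop()  # discard the matching "("
        else:
            output.append(token)  # number (or variable name)
    while stack:
        top = stack.pop()
        if top == "(":
            raise ValueError("mismatched parentheses")
        output.append(top)
    return " ".join(output)

print(to_rpn("3 + 4"))            # 3 4 +
print(to_rpn("3 + 4 * (2 - 1)"))  # 3 4 2 1 - * +
```

The two printed results correspond to the examples given earlier in the text, with ASCII "-" and "*" in place of "−" and "×".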

To analyze the running time complexity of this algorithm, one has only to note that each token will be read once, each number, function, or operator will be printed once, and each function, operator, or parenthesis will be pushed onto the stack and popped off the stack once—therefore, there are at most a constant number of operations executed per token, and the running time is thus O(n)—linear in the size of the input.
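The counting argument can be checked empirically with a small instrumented sketch. It handles only left-associative binary operators and balanced parentheses, and the name count_stack_operations is hypothetical. Each token causes at most one push, and every pushed item is popped exactly once, so both counts are bounded by the number of tokens.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}  # all left associative here

def count_stack_operations(tokens):
    """Return (pushes, pops) made on the operator stack during conversion."""
    stack, pushes, pops = [], 0, 0
    for token in tokens:
        if token in PRECEDENCE:
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]):
                stack.pop()
                pops += 1
            stack.append(token)
            pushes += 1
        elif token == "(":
            stack.append(token)
            pushes += 1
        elif token == ")":
            while stack and stack[-1] != "(":
                stack.pop()
                pops += 1
            stack.pop()  # discard "("; this sketch assumes balanced input
            pops += 1
        # Numbers go straight to the output and never touch the stack.
    pops += len(stack)  # final flush of the remaining operators
    return pushes, pops

for repeat in (1, 10, 100):
    tokens = " + ".join(["3 + 4 * ( 2 - 1 )"] * repeat).split()
    pushes, pops = count_stack_operations(tokens)
    # Stack work never exceeds the token count, regardless of input size.
    assert pushes == pops and pushes <= len(tokens)
    print(len(tokens), pushes, pops)
```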

The shunting-yard algorithm can also be applied to produce prefix notation (also known as Polish notation). To do this, one starts from the end of the string of tokens and works backwards, reverses the output queue (which makes the output queue an output stack), flips the behavior of the left and right parentheses (the now-left parenthesis should pop until it finds a now-right parenthesis), and changes the associativity condition to right.
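The prefix variant described above can be sketched as follows. The input is assumed to be space-separated tokens with binary operators only; the precedence table and the name to_prefix are hypothetical.

```python
# Hypothetical operator table: a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOCIATIVE = {"^"}

def to_prefix(expression):
    """Convert space-separated infix tokens to prefix (Polish) notation."""
    output, stack = [], []
    for token in reversed(expression.split()):  # work backwards
        if token in PRECEDENCE:
            # Associativity condition flipped: on equal precedence,
            # pop only when the operator is right associative.
            while stack and stack[-1] != ")" and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[token]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[token]
                    and token in RIGHT_ASSOCIATIVE)
            ):
                output.append(stack.pop())
            stack.append(token)
        elif token == ")":
            stack.append(token)  # ")" now plays the role of "("
        elif token == "(":
            while stack and stack[-1] != ")":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched parentheses")
            stack.pop()  # discard the matching ")"
        else:
            output.append(token)  # number
    while stack:
        top = stack.pop()
        if top == ")":
            raise ValueError("mismatched parentheses")
        output.append(top)
    return " ".join(reversed(output))  # reverse the output at the end

print(to_prefix("3 + 4 * ( 2 - 1 )"))  # + 3 * 4 - 2 1
print(to_prefix("3 - 4 - 5"))          # - - 3 4 5
```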

TODO

stackoverflow Algorithm for converting expression to binary tree [closed]

codeproject Binary Tree Expression Solver

cnblogs shunting-yard 调度场算法、中缀表达式转逆波兰表达式

geeksforgeeks Program to convert Infix notation to Expression Tree