LL VS LR VS Recursive descend parser
Recursive descent parser VS LL、LR
Recursive descent parser use backtracking to match.
LL parsers are table-based parsers, they use parsing table. Parsing table is constructed from the grammar and acts as a transformation function. Using parsing table and k tokens of lookahead, a LL parser can become predictive parser and avoid backtracking.
stackoverflow What is the difference between LL and LR parsing?
A
At a high level, the difference between LL parsing and LR parsing is that LL parsers begin at the start symbol and try to apply productions to arrive at the target string, whereas LR parsers begin at the target string and try to arrive back at the start symbol.
An LL parse is a left-to-right, leftmost derivation. That is, we consider the input symbols from the left to the right and attempt to construct a leftmost derivation. This is done by beginning at the start symbol and repeatedly expanding out the leftmost nonterminal until we arrive at the target string.
An LR parse is a left-to-right, rightmost derivation, meaning that we scan from the left to right and attempt to construct a rightmost derivation. The parser continuously picks a substring of the input and attempts to reverse it back to a nonterminal.
During an LL parse, the parser continuously chooses between two actions:
1、Predict: Based on the leftmost nonterminal and some number of lookahead tokens, choose which production ought to be applied to get closer to the input string.
2、Match: Match the leftmost guessed terminal symbol with the leftmost unconsumed symbol of input.
As an example, given this grammar:
- S → E
- E → T + E
- E → T
- T →
int
Then given the string int + int + int
, an LL(2) parser (which uses two tokens of lookahead) would parse the string as follows:
Production Input Action
---------------------------------------------------------
S int + int + int Predict S -> E
E int + int + int Predict E -> T + E
T + E int + int + int Predict T -> int
int + E int + int + int Match int
+ E + int + int Match +
E int + int Predict E -> T + E
T + E int + int Predict T -> int
int + E int + int Match int
+ E + int Match +
E int Predict E -> T
T int Predict T -> int
int int Match int
Accept
Notice that in each step we look at the leftmost symbol in our production. If it's a terminal, we match it, and if it's a nonterminal, we predict what it's going to be by choosing one of the rules.
In an LR parser, there are two actions:
1、Shift: Add the next token of input to a buffer for consideration.
2、Reduce: Reduce a collection of terminals and nonterminals in this buffer back to some nonterminal by reversing a production.
As an example, an LR(1) parser (with one token of lookahead) might parse that same string as follows:
Workspace Input Action
---------------------------------------------------------
int + int + int Shift
int + int + int Reduce T -> int
T + int + int Shift
T + int + int Shift
T + int + int Reduce T -> int
T + T + int Shift
T + T + int Shift
T + T + int Reduce T -> int
T + T + T Reduce E -> T
T + T + E Reduce E -> T + E
T + E Reduce E -> T + E
E Reduce S -> E
S Accept
The two parsing algorithms you mentioned (LL and LR) are known to have different characteristics. LL parsers tend to be easier to write by hand, but they are less powerful than LR parsers and accept a much smaller set of grammars than LR parsers do. LR parsers come in many flavors (LR(0), SLR(1), LALR(1), LR(1), IELR(1), GLR(0), etc.) and are far more powerful. They also tend to have much more complex and are almost always generated by tools like yacc
or bison
. LL parsers also come in many flavors (including LL(*), which is used by the ANTLR
tool), though in practice LL(1) is the most-widely used.
As a shameless plug, if you'd like to learn more about LL and LR parsing, I just finished teaching a compilers course and have some handouts and lecture slides on parsing on the course website. I'd be glad to elaborate on any of them if you think it would be useful.
A
Josh Haberman in his article LL and LR Parsing Demystified claims that LL parsing directly corresponds with the Polish Notation, whereas LR corresponds to Reverse Polish Notation. The difference between PN and RPN is the order of traversing the binary tree of the equation:
+ 1 * 2 3 // Polish (prefix) expression; pre-order traversal.
1 2 3 * + // Reverse Polish (postfix) expression; post-order traversal.
According to Haberman, this illustrates the main difference between LL and LR parsers:
The primary difference between how LL and LR parsers operate is that an LL parser outputs a pre-order traversal of the parse tree and an LR parser outputs a post-order traversal.
For the in-depth explanation, examples and conclusions check out Haberman's article.
预测分析表 VS LR语法分析表
LL和LR都是表驱动的,这两个表就是分别驱动两者的表。
语法分析的过程是不断根据产生式进行转换的过程。
两者的构造都是基于对grammar的分析,两者的构造过程都是沿着产生式进行derive,所不同的是,预测分析表一直derive到了terminal;而LR语法分析表则是将所有的可能的转换全部包含了:
预测分析表的构造使用是non-terminal symbol的FIRST和FOLLOW,FIRST和FOLLOW所包含的都是terminal,其实它的目的是知道当遇到某个terminal的时候,使用哪个production可以得到这个terminal。即它侧重的是对于一个non-terminal的production,它能够推导出什么,这样它就能够据此来决定是否使用这个production。
LR语法分析表正如其名称所揭示地,它其实是对语法的分析,对语法中所有的可能的转换进行分析,构造出来它的转换所对应的automaton。显然,因为它的转换都是基于grammar所构造出来的,因此,它的所有的转换都是有效的转换,因此只要待分析的串不符合这个automaton的转换,那么它就是无效的了。因此我们的待分析的串一定是对应了automaton中的一条路径。使用我们的文法,按照某种推导方式是一定能够derive这个串的。在待分析的串中,仅仅包含的是terminal,而grammar中,是两者都包含的,所以在进行分析的时候,一定要将terminal规约为non-terminal,这样才能够使分析继续下去。LR语法分析是推导的逆过程,显然我们是要沿着推导方向逆流而上,因为推导的过程的中间过程肯定是会存在non-terminal的,而我们是逆向进行推导,所以肯定需要将一些terminal规约为non-terminal。
为什么LR是right-most derivation?
预测分析表其实也是一个转换函数,要使用哪个产生式进行derivate
LR是一个automaton,状态,转换
语法分析表由两个部分组成:
- action
- goto
从LR(0)自动机到LR语法分析表
LR语法分析表的构造是基于LR(0)自动机的
本质还是转换,无论是在terminal上的转换还是在non-terminal上的转换
action主要定义的是在terminal上的转换,而goto则定义的是在non-terminal上的转换。
SLR语法分析表的构造过程主要让我感到困惑的是它将action定义成了 状态和terminal的函数?
SLR语法分析表的ACTION的构造过程中有这样的规则: