Skip to content

LR parser

wikipedia LR parser

In computer science, LR parsers are a type of bottom-up parser that analyses deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parsers, GLR parsers. LR parsers can be generated by a parser generator from a formal grammar defining the syntax of the language to be parsed. They are widely used for the processing of computer languages.

An LR parser (L**eft-to-right, **R**ightmost derivation in reverse) reads input text from left to right without backing up (this is true for most parsers), and produces a rightmost derivation in reverse: it does a bottom-up parse - not a top-down LL parse or ad-hoc parse. The name LR is often followed by a numeric qualifier, as in **LR(1) or sometimes LR(k). To avoid backtracking or guessing, the LR parser is allowed to peek ahead at k lookahead input symbols before deciding how to parse earlier symbols. Typically k is 1 and is not mentioned. The name LR is often preceded by other qualifiers, as in SLR and LALR. The LR(k) condition for a grammar was suggested by Knuth to stand for "translatable from left to right with bound k."

LR parsers are deterministic; they produce a single correct parse without guesswork or backtracking, in linear time. This is ideal for computer languages, but LR parsers are not suited for human languages which need more flexible but inevitably slower methods. Some methods which can parse arbitrary context-free languages (e.g., Cocke–Younger–Kasami, Earley, GLR) have worst-case performance of O(*n*3) time. Other methods which backtrack or yield multiple parses may even take exponential time when they guess badly.

The above properties of L, R, and k are actually shared by all shift-reduce parsers, including precedence parsers. But by convention, the LR name stands for the form of parsing invented by Donald Knuth, and excludes the earlier, less powerful precedence methods (for example Operator-precedence parser).[1] LR parsers can handle a larger range of languages and grammars than precedence parsers or top-down LL parsing.[3] This is because the LR parser waits until it has seen an entire instance of some grammar pattern before committing to what it has found. An LL parser has to decide or guess what it is seeing much sooner, when it has only seen the leftmost input symbol of that pattern.

Overview

Bottom-up parse tree for example A*2 + 1

An LR parser scans and parses the input text in one forward pass over the text. The parser builds up the parse tree incrementally, bottom up, and left to right, without guessing or backtracking. At every point in this pass, the parser has accumulated a list of subtrees or phrases of the input text that have been already parsed. Those subtrees are not yet joined together because the parser has not yet reached the right end of the syntax pattern that will combine them.

At step 6 in an example parse, only "A*2" has been parsed, incompletely. Only the shaded lower-left corner of the parse tree exists. None of the parse tree nodes numbered 7 and above exist yet. Nodes 3, 4, and 6 are the roots of isolated subtrees for variable A, operator , and number 2, respectively. These three root nodes are temporarily held in a **parse stack*. The remaining unparsed portion of the input stream is "+ 1".

img

Bottom-up parse tree built in numbered steps

Shift and reduce actions

As with other shift-reduce parsers, an LR parser works by doing some combination of Shift steps and Reduce steps.

1、A Shift step advances in the input stream by one symbol. That shifted symbol becomes a new single-node parse tree.

2、A Reduce step applies a completed grammar rule to some of the recent parse trees, joining them together as one tree with a new root symbol.

If the input has no syntax errors, the parser continues with these steps until all of the input has been consumed and all of the parse trees have been reduced to a single tree representing an entire legal input.

LR parsers differ from other shift-reduce parsers in how they decide when to reduce, and how to pick between rules with similar endings. But the final decisions and the sequence of shift or reduce steps are the same.

Much of the LR parser's efficiency is from being deterministic. To avoid guessing, the LR parser often looks ahead (rightwards) at the next scanned symbol, before deciding what to do with previously scanned symbols. The lexical scanner works one or more symbols ahead of the parser. The lookahead symbols are the 'right-hand context' for the parsing decision.