LR Parsing
A highly expressive bottom-up parsing algorithm reading input from Left to right, computing a Rightmost derivation in reverse.
Unlike LL predictive parsing, LR parsers evaluate input tokens by shifting them onto a stack and reducing them retroactively once a full grammar rule mapping is unambiguously collected. While generating their parsing tables requires heavy computation (calculating mathematical closure items mapped onto a Finite State Machine Automatons), bottom-up architecture grants several massive advantages over LL(1):
- LR parsers natively handle left-recursive grammars (
E → E + T) gracefully without crashing. - LR supports a much larger, encompassing class of context-free languages compared to LL.
- Fewer language rules require painful mathematical "Left Factoring" just to appease the parser limitations.
Our engine natively computes four escalating tiers of bottom-up parsers. They utilize the same underlying stack-push shifting execution workflow but differ wildly in how they generate their state maps (item closures) and when they mathematically authorize a "Reduce" command.
LR(0)
The purest baseline. LR(0) completely ignores lookaheads, executing rule reductions blindly if it reaches the end of an item path. Exceedingly weak and highly vulnerable to Shift/Reduce conflicts.
SLR(1)
Simple LR. Computes the LR(0) state machine but vastly improves reduction conditions by restricting a reduction of A → α to *only* occur if the upcoming input token mathematically resides in FOLLOW(A).
CLR(1)
Canonical LR. Calculates item closure mappings using a hyper-specific, exhaustive 1-lookahead token context [A → α • β, a]. Massively resilient to grammar ambiguity but creates an explosive, exponentially massive State Machine graph.
LALR(1)
Look-Ahead LR. The gold standard deployed securely inside actual production compilers like YACC and Bison. Mathematically compresses identical states inside the CLR Automaton graph to save tremendous memory overhead while preserving exact lookaheads.
Let's watch an LR parser evaluate the string id + id using the math grammar. The parser has two main data structures: a Stack (holding states and symbols) and an Input Buffer.
| Step | Stack | Input | Action |
|---|---|---|---|
| 1 | [0] | id + id $ | Shift state 5 (push id) |
| 2 | [0, id, 5] | + id $ | Reduce F → id |
| 3 | [0, F, 3] | + id $ | Reduce T → F |
| 4 | [0, T, 2] | + id $ | Reduce E → T |
| 5 | [0, E, 1] | + id $ | Shift state 6 (push +) |
| 6 | [..., E, 1, +, 6] | id $ | Shift state 5 (push id) |
| 7-8 | [..., +, 6, T, 9] | $ | Reduce id → F → T |
| 9 | [0, E, 1, +, 6, T, 9] | $ | Reduce E → E + T |
| 10 | [0, E, 1] | $ | ACCEPT |