Shift-Reduce Parsing
A class of bottom-up parsing algorithms that constructs the parse tree by "shifting" input tokens onto a stack and "reducing" them based on production rules.
Shift-Reduce parsing is a form of bottom-up parsing. Instead of starting with the initial symbol and deriving the target string (like LL top-down parsing), a shift-reduce parser begins with the input string and attempts to progressively condense it backward to the Start Symbol.
The parser operates using a Stack to hold grammar symbols in progress and an Input Buffer for the unread tokens.
At any point during the parsing sequence, the parser decides between four possible operations:
- Shift
Moves the next unread token from the input buffer to the top of the stack. This is effectively saying, "I haven't found a complete production match yet, let me see more tokens."
- Reduce
When the sequence of symbols securely situated at the top of the stack matches the Right Hand Side (RHS) of a grammar rule, the parser removes them and substitutes the corresponding Left Hand Side (LHS) non-terminal.
- Accept
If the stack contains only the start symbol and the input sequence is completely exhausted (reaching the
$end-marker), the sentence is syntactically correct and accepted. - Error
If neither a valid Shift nor a valid Reduce applies, the parser discovers a syntax error.
Action Trace Example
Using a simplified grammar for arithmetic expressions: E → E + E | id.
| Stack | Input | Action |
|---|---|---|
| $ | id + id $ | Shift |
| $ id | + id $ | Reduce (E → id) |
| $ E | + id $ | Shift |
| $ E + | id $ | Shift |
| $ E + id | $ | Reduce (E → id) |
| $ E + E | $ | Reduce (E → E + E) |
| $ E | $ | Accept |
The inherent challenge in shift-reduce parsing occurs when multiple actions are possible, known as conflicts:
- Shift/Reduce Conflict: The parser doesn't know whether to shift a new token or reduce the current top-of-stack. This often happens in rules with varying associativity (like "if-else" dangling modifiers).
- Reduce/Reduce Conflict: The parser encounters a situation where the stack's top sequence matches two distinct rule RHSs, so it doesn't know which LHS to reduce to.
Algorithms like LR(0), SLR(1), CLR(1), and LALR(1) are sophisticated methods constructed implicitly to handle or avoid these conflicts using lookahead state machines.