LL(1) Parsing
A deterministic top-down parsing algorithm reading input from Left to right, computing a Leftmost derivation, with 1 symbol of lookahead.
LL(1) parsers maintain a stack containing grammar symbols and predictively replace the top non-terminal by looking precisely one terminal ahead in the input stream. This is why LL(1) uses a 2D Parsing Table defined by Table[NonTerminal][Terminal].
To construct this predictive routing table, the compiler computes two mathematical properties for every symbol in the grammar: FIRST and FOLLOW sets.
FIRST Sets
FIRST(α) is the set of terminals that begin strings derived from α. If α can derive the empty string, then ε is also in FIRST(α).
- If
Xis a terminal,FIRST(X) = {X} - If
X → εis a production, addεtoFIRST(X) - If
Xis a non-terminal andX → Y1 Y2 ... Yk, then placeFIRST(Y1)(excluding ε) intoFIRST(X). Cascade down the sequence if preceding symbols are nullable.
FOLLOW Sets
FOLLOW(A) is the set of terminals that can appear immediately to the right of Non-Terminal A in some sentential form.
- Place the end-of-input marker
$intoFOLLOW(S)(the start symbol). - If there is a production
A → α B β, then everything inFIRST(β)except forεis placed inFOLLOW(B). - If there is a production
A → α B, or a productionA → α B βwhereFIRST(β)containsε, then everything inFOLLOW(A)is placed inFOLLOW(B).
2Step-by-Step: Constructing the Sets
Let's calculate the FIRST and FOLLOW sets for a simple grammar: S → a S a | b S b | c
Computing FIRST(S)
- S → a S aThe first symbol is the terminal
a. Addato FIRST(S). - S → b S bThe first symbol is the terminal
b. Addbto FIRST(S). - S → cThe first symbol is the terminal
c. Addcto FIRST(S).
Computing FOLLOW(S)
- Rule 1S is the Start Symbol, so we immediately add the EOF marker
$to FOLLOW(S). - S → a S aLook at S on the RHS. What follows S? The terminal
a. Addato FOLLOW(S). - S → b S bLook at S on the RHS. What follows S? The terminal
b. Addbto FOLLOW(S).
A grammar whose parsing table has no multiply-defined entries is an LL(1) grammar. Conversely, if computing the table yields a cell containing two or more overlapping rules, the grammar cannot be deterministically evaluated top-down with 1 lookahead token.
Common obstacles preventing LL(1) compliance:
- Left Recursion: Any rule like
E → E + Tcreates an infinite loop where the parser predictively expandsEforever waiting forT. LR parsers can handle this natively! - Left Factoring: When two productions share the same prefix (e.g.
A → a B | a C), an LL(1) parser cannot know which branch to select seeing justa. - Inherent Ambiguity: A single language string possessing two entirely unique but valid derivation trees.