LL(1) Parsing

A deterministic top-down parsing algorithm reading input from Left to right, computing a Leftmost derivation, with 1 symbol of lookahead.

How LL(1) Works

LL(1) parsers maintain a stack containing grammar symbols and predictively replace the top non-terminal by looking precisely one terminal ahead in the input stream. This is why LL(1) uses a 2D Parsing Table defined by Table[NonTerminal][Terminal].

To construct this predictive routing table, the compiler computes two mathematical properties for every symbol in the grammar: FIRST and FOLLOW sets.

FIRST and FOLLOW Sets

FIRST Sets

FIRST(α) is the set of terminals that begin strings derived from α. If α can derive the empty string, then ε is also in FIRST(α).

If X is a terminal, FIRST(X) = {X}
If X → ε is a production, add ε to FIRST(X)
If X is a non-terminal and X → Y1 Y2 ... Yk, then place FIRST(Y1) (excluding ε) into FIRST(X). Cascade down the sequence if preceding symbols are nullable.

FOLLOW Sets

FOLLOW(A) is the set of terminals that can appear immediately to the right of Non-Terminal A in some sentential form.

Place the end-of-input marker $ into FOLLOW(S) (the start symbol).
If there is a production A → α B β, then everything in FIRST(β) except for ε is placed in FOLLOW(B).
If there is a production A → α B, or a production A → α B β where FIRST(β) contains ε, then everything in FOLLOW(A) is placed in FOLLOW(B).

2Step-by-Step: Constructing the Sets

Let's calculate the FIRST and FOLLOW sets for a simple grammar: S → a S a | b S b | c

Computing FIRST(S)

S → a S aThe first symbol is the terminal a. Add a to FIRST(S).
S → b S bThe first symbol is the terminal b. Add b to FIRST(S).
S → cThe first symbol is the terminal c. Add c to FIRST(S).

FIRST(S) = { a, b, c }

Computing FOLLOW(S)

Rule 1S is the Start Symbol, so we immediately add the EOF marker $ to FOLLOW(S).
S → a S aLook at S on the RHS. What follows S? The terminal a. Add a to FOLLOW(S).
S → b S bLook at S on the RHS. What follows S? The terminal b. Add b to FOLLOW(S).

FOLLOW(S) = { $, a, b }

LL(1) Ambiguity & Restrictions

A grammar whose parsing table has no multiply-defined entries is an LL(1) grammar. Conversely, if computing the table yields a cell containing two or more overlapping rules, the grammar cannot be deterministically evaluated top-down with 1 lookahead token.

Common obstacles preventing LL(1) compliance:

Left Recursion: Any rule like E → E + T creates an infinite loop where the parser predictively expands E forever waiting for T. LR parsers can handle this natively!
Left Factoring: When two productions share the same prefix (e.g. A → a B | a C), an LL(1) parser cannot know which branch to select seeing just a.
Inherent Ambiguity: A single language string possessing two entirely unique but valid derivation trees.

Interactive Demo

See LL(1) parsing in action. We've prepared a simple, LL(1)-compliant palindrome grammar. Click below to open it in the Syntax Engine and watch the FIRST/FOLLOW sets and Predictive Parsing Table generate instantly.

Comdes