Grammar Basics
An introduction to Context-Free Grammars, Terminals, and Non-Terminals.
In formal language theory, a Context-Free Grammar (CFG) is a set of recursive rewriting rules (or productions) used to generate patterns of strings. They are the standard way to mathematically describe the syntax of programming languages.
A CFG consists of four components:
- Non-Terminals (N): Variables that represent a syntactic category (e.g.,
Expr,Term). These are the symbols that can be expanded or replaced. - Terminals (Σ): The actual, literal symbols that appear in the target string or source code (e.g.,
id,+,(). These cannot be expanded further. - Productions (P): Rules describing how non-terminals can be replaced. Format:
LHS (Non-Terminal) → RHS (Combination of Terminals & Non-Terminals). - Start Symbol (S): The designated non-terminal symbol from which all derivations begin.
Our parser engine supports standard Backus-Naur Form (BNF) style syntax. You can use either the arrow -> or a colon : to delimit rules. Multiple production bodies can be separated by a pipe |.
T -> T * F | F
F -> ( E ) | id
Epsilon (Empty) Productions
When a non-terminal can be expanded into an empty string, we represent this using the Greek letter Epsilon (`ε`). Our engine natively detects this using either the literal ε or the word epsilon.
A -> ε
A derivation is the sequence of rule applications that transforms the Start Symbol into a string of pure terminals. Let's derive the string id + id * id using the standard arithmetic grammar.
Leftmost Derivation
Always expand the leftmost non-terminal first.
Rightmost Derivation
Always expand the rightmost non-terminal first (used by Bottom-Up parsers).