Intermediate Representations
Translate semantic structures into machine-independent intermediate logic using Syntax Trees and Three-Address Code (TAC).
Variants of Syntax Trees
A standard Syntax Tree represents the hierarchical structure of a program. A powerful variant is the Directed Acyclic Graph (DAG).
A DAG is a syntax tree that shares common subexpressions. If an expression like a + a * (b - c) + (b - c) * d appears, the DAG will only create one node for the subexpression b - c, pointing multiple parent nodes to it.
Three-Address Code (TAC)
Three-Address Code is a linearized representation of a syntax tree. It's called "Three-Address" because each instruction has at most three operands: two operands for the computation, and one for the result.
Used to implement if-else, while-loops, and switch-case statements.
Quadruples
A structure with four fields: op, arg1, arg2, and result. Highly readable and easy to optimize, but wastes memory if temporary variables are excessive.
Triples
omits the result field. Instead of writing a result to a temporary variable, it refers to the index of the statement that computed it. Harder to reorder during optimization.
Indirect Triples
Uses a list of pointers to triples, rather than the triples themselves. This allows optimizers to easily reorder execution just by moving pointers, retaining the memory efficiency of triples.
Advanced Translation Concepts
Backpatching
In a one-pass compiler, when compiling boolean expressions for control flow (like if-else or while), the jump destinations are often unknown at the time the jump instruction is generated.
Backpatching solves this by storing lists of instruction indices that are "waiting" for a destination. Once the destination label is determined later in parsing, the compiler iterates through the list and "patches" the blank jump fields with the correct address.
Switch-Case Statements
Switch statements are translated into a sequence of conditional jumps. If the number of cases is large and densely packed, compilers often implement them using a Jump Table—an array of addresses where the expression's offset points directly to the correct case block, executing in $O(1)$ time instead of $O(n)$ cascading if checks.
Declarations & Procedures
For procedure calls, typical TAC looks like:param x1param x2call p, 2
Declarations don't generate execution code—they compute offsets for each variable in memory relative to the activation record block.