Code Optimization & Basic Blocks
Code optimization improves the intermediate code so that it consumes fewer resources and executes faster, while preserving strict semantic equivalence with the source program.
Basic Blocks & Flow Graphs
A Basic Block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, without halting or the possibility of branching except at the end.
Partitioning into Basic Blocks:
- Determine Leader statements: The first instruction, any target of a conditional/unconditional jump, and any instruction immediately following a jump.
- For each leader, its basic block consists of the leader and all statements up to but not including the next leader or end of program.
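The leader rules above can be sketched directly. This is a minimal illustration, not a full TAC implementation: the tuple instruction format and the "goto"/"if_goto" opcodes are assumptions, with jump targets given as instruction indices.

```python
def find_leaders(instrs):
    leaders = {0}                            # rule 1: the first instruction
    for i, ins in enumerate(instrs):
        if ins[0] in ("goto", "if_goto"):
            leaders.add(ins[1])              # rule 2: the jump target
            if i + 1 < len(instrs):
                leaders.add(i + 1)           # rule 3: instruction after a jump
    return leaders

def basic_blocks(instrs):
    # each block runs from its leader up to, but not including, the next leader
    bounds = sorted(find_leaders(instrs)) + [len(instrs)]
    return [instrs[a:b] for a, b in zip(bounds, bounds[1:])]

prog = [
    ("assign", "i", "0"),        # 0  leader (rule 1)
    ("if_goto", 5, "i >= n"),    # 1  leader (target of the goto at 4)
    ("assign", "t", "a[i]"),     # 2  leader (follows a jump)
    ("assign", "i", "i + 1"),    # 3
    ("goto", 1),                 # 4
    ("halt",),                   # 5  leader (target of the if at 1)
]
blocks = basic_blocks(prog)      # four blocks: [0], [1], [2..4], [5]
```

The loop body (instructions 2–4) forms a single block because control can only enter at 2 and only leave at 4.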
Basic blocks are the nodes of a Control Flow Graph (CFG). Directed edges represent jumps between blocks. Loops in flow graphs are critical targets for optimization.
Inside a single basic block, operations can be transformed into a Directed Acyclic Graph (DAG) to easily identify and eliminate local common subexpressions and dead code.
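One way to sketch the DAG idea is local value numbering, the table-driven equivalent of DAG construction: two instructions computing the same (operator, operand, operand) triple map to the same DAG node, so the second is a local common subexpression. The quadruple format here is an assumption.

```python
def find_local_cse(block):
    vn = {}          # name -> value number (a DAG node id)
    exprs = {}       # (op, vn_left, vn_right) -> value number
    next_vn = 0
    redundant = []

    def number(name):
        nonlocal next_vn
        if name not in vn:
            vn[name] = next_vn
            next_vn += 1
        return vn[name]

    for i, (dest, op, a, b) in enumerate(block):
        key = (op, number(a), number(b))
        if key in exprs:
            redundant.append(i)              # this DAG node already exists
            vn[dest] = exprs[key]
        else:
            vn[dest] = exprs[key] = next_vn
            next_vn += 1
    return redundant

block = [
    ("t1", "+", "b", "c"),
    ("t2", "*", "t1", "d"),
    ("t3", "+", "b", "c"),   # recomputes t1's node -> reported as redundant
]
find_local_cse(block)        # -> [2]
```

Because each destination is given the value number of its expression, a later redefinition of an operand automatically produces a new value number, keeping the analysis safe within the block.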
Machine Independent Optimization
These optimizations apply to the intermediate code regardless of the target CPU architecture.
Principal Sources of Optimization
Common Subexpressions
Avoid recomputing an expression whose operands have not changed since it was last computed; reuse the previously computed value instead.
Copy Propagation
After a copy statement such as x = y, use y directly in place of x in later statements wherever it is safe, which often turns the copy itself into dead code.
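A minimal sketch of copy propagation over one basic block, under the assumption of a quadruple format where a "copy" instruction uses only its first operand: after x = y, later uses of x are rewritten to y until either x or y is redefined.

```python
def propagate_copies(block):
    copies = {}   # x -> y, meaning "x currently holds a copy of y"
    out = []
    for dest, op, a, b in block:
        a = copies.get(a, a)         # rewrite uses through known copies
        b = copies.get(b, b)
        # this definition kills any recorded copy involving dest
        copies = {x: y for x, y in copies.items() if dest not in (x, y)}
        if op == "copy":
            copies[dest] = a
        out.append((dest, op, a, b))
    return out

block = [("x", "copy", "y", None),
         ("t", "+", "x", "z")]
propagate_copies(block)   # t is now computed from y directly
```

Once no remaining instruction uses x, the copy statement itself becomes a candidate for dead-code elimination.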
Dead-Code Elimination
Removing code that computes values which are never used, and branches that can never be taken.
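Within a block, dead definitions can be found with a single backward pass, as sketched below under the assumption of side-effect-free TAC quadruples: an instruction is kept only if its target is still live, and keeping it makes its operands live in turn.

```python
def eliminate_dead(block, live_out):
    live = set(live_out)             # variables needed after the block
    kept = []
    for dest, op, a, b in reversed(block):
        if dest in live:
            kept.append((dest, op, a, b))
            live.discard(dest)
            live.update(x for x in (a, b) if x is not None)
        # otherwise dest is never used again: the instruction is dead
    return kept[::-1]

block = [("t1", "+", "a", "b"),
         ("t2", "*", "a", "b"),      # t2 never used -> removed
         ("r",  "+", "t1", "c")]
eliminate_dead(block, live_out={"r"})
```

The `live_out` set comes from a global liveness analysis; assuming everything is live at the block's end is the safe fallback.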
Constant Folding
Deducing at compile time that the value of an expression is a constant and replacing the code with the constant.
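A small sketch of the idea using Python's own expression ASTs; the handled operator set is an illustrative subset, not a complete folder. Sub-expressions whose operands are both literals are evaluated at "compile time".

```python
import ast
import operator

FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def fold(expr):
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in FOLDABLE:
            left, right = walk(node.left), walk(node.right)
            if isinstance(left, (int, float)) and isinstance(right, (int, float)):
                return FOLDABLE[type(node.op)](left, right)   # fold it now
            return (left, type(node.op).__name__, right)      # keep residual op
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return node.id
        raise NotImplementedError(ast.dump(node))
    return walk(ast.parse(expr, mode="eval").body)

fold("60 * 60 * 24")    # folds all the way down to 86400
fold("2 * 3 + x")       # folds only the constant part: (6, 'Add', 'x')
```

Combined with constant propagation, folding can cascade: once one expression folds to a constant, expressions using its result may fold too.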
Loop Optimizations
Loops are where programs spend most of their time. Key techniques include:
- Code Motion (Hoisting): Moving computations that produce the same result every iteration outside the loop.
- Induction Variable Elimination: Removing extra loop counters if they can be derived from another iterating variable.
- Reduction in Strength: Replacing expensive operations (like multiplication) with cheaper ones (like addition) inside loops.
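Strength reduction is easiest to see on the classic array-address computation. The sketch below compares the naive form, which multiplies on every iteration, with the reduced form, which maintains a running offset; the function names are illustrative.

```python
def offsets_naive(n, elem_size):
    return [i * elem_size for i in range(n)]     # multiply every iteration

def offsets_reduced(n, elem_size):
    out, addr = [], 0
    for _ in range(n):
        out.append(addr)
        addr += elem_size                        # cheaper: one addition
    return out

offsets_naive(5, 4) == offsets_reduced(5, 4)     # both give [0, 4, 8, 12, 16]
```

The accumulator `addr` is exactly the kind of derived counter that induction-variable analysis introduces, and once `i` is no longer used for anything else it can itself be eliminated.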
Data Flow Analysis
To perform global optimization across multiple basic blocks, compilers use Data Flow Analysis. It gathers information about the possible set of values calculated at various points in a program.
By solving Data Flow Equations (such as Reaching Definitions or Live-Variable Analysis) iteratively until the solution converges, the compiler can safely determine which optimizations may cross block boundaries.
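As one concrete instance, live-variable analysis can be sketched as below. Each block carries its use/def sets, and the equations

    out[B] = union of in[S] over successors S
    in[B]  = use[B] | (out[B] - def[B])

are re-evaluated until no set changes. The block names, sets, and CFG here are illustrative inputs, not a fixed interface.

```python
def liveness(use_def, succ):
    live_in = {b: set() for b in use_def}
    live_out = {b: set() for b in use_def}
    changed = True
    while changed:                   # iterate to a fixed point
        changed = False
        for b, (use, defs) in use_def.items():
            out = set().union(*[live_in[s] for s in succ.get(b, [])])
            inn = use | (out - defs)
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out

# CFG for: i = 0; while i < n: i = i + 1
use_def = {
    "B1": (set(),      {"i"}),       # i = 0
    "B2": ({"i", "n"}, set()),       # if i >= n goto B4
    "B3": ({"i"},      {"i"}),       # i = i + 1
    "B4": (set(),      set()),       # exit
}
succ = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"]}
live_in, live_out = liveness(use_def, succ)
```

The result shows `n` live into B1 but `i` not, since B1 defines `i` before any use: exactly the cross-block fact a register allocator or dead-code pass needs.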
Machine-Dependent & Target Optimization
Peephole Optimization
A simple but effective technique applied to the final target code. The compiler examines a short, sliding window (a "peephole") of target instructions and replaces them with a faster or shorter equivalent sequence.
Common targets: Redundant loads/stores, unreachable instructions, branch chaining, algebraic simplifications, and using specific machine idioms.
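One such pattern, a store immediately followed by a load of the same location into the same register, can be sketched with a two-instruction window. The instruction tuples are illustrative, not a real ISA.

```python
def peephole(code):
    out, i = [], 0
    while i < len(code):
        if (i + 1 < len(code)
                and code[i][0] == "STORE"
                and code[i + 1][0] == "LOAD"
                and code[i][1:] == code[i + 1][1:]):
            out.append(code[i])      # keep the store, drop the redundant load
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out

code = [("LOAD", "R0", "a"),
        ("STORE", "R0", "b"),
        ("LOAD", "R0", "b")]         # redundant: R0 already holds b's value
peephole(code)                       # the trailing LOAD disappears
```

Real peephole passes repeat until no pattern matches, since removing one instruction can bring a new pair into the window.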
Naive Code Generator for a VM
A simple "naive" code generator translates each TAC instruction directly into the target instruction set without tracking register or stack contents across statements. It is easy to build, especially when targeting virtual machines (like the JVM), but it produces highly inefficient, load/store-heavy code, which is precisely what motivates robust register allocation.
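A minimal sketch of such a translator, targeting a hypothetical stack VM whose PUSH/ADD/MUL/STORE instruction set is an assumption: every operand is pushed from memory and every result stored back, with no memory of what was just computed.

```python
OPS = {"+": ("ADD",), "*": ("MUL",)}   # the only TAC operators handled here

def naive_gen(tac):
    code = []
    for dest, op, a, b in tac:
        code += [("PUSH", a), ("PUSH", b), OPS[op], ("STORE", dest)]
    return code

naive_gen([("t1", "+", "a", "b"),
           ("t2", "*", "t1", "c")])
# t1 is stored and then immediately pushed back: exactly the load/store
# churn that register allocation (or even a peephole pass) would remove
```

Four target instructions per TAC statement, regardless of context, is what makes the output so load/store-heavy.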
Security Checking of VM Code
When generating code for Virtual Machines, the compiler or the VM runtime must perform strictly enforced security checks (like the JVM's Bytecode Verifier). These checks ensure that the code does not forge or maliciously manipulate pointers, access memory outside legal bounds, or bypass type-safety rules.