Every program you write must be translated into instructions a machine can execute. How should that translation be structured? The answer determines how fast the resulting code runs, how many different machines the language can target, how confident you can be that the translation is correct, and even what kinds of programming languages are practical to build. Compiler design is the history of competing answers to that single question—answers that have crystallized into durable frameworks for organizing translation.
The earliest compilers, built in the 1950s, faced severe memory constraints. A compiler that could fit its entire analysis and code generation into a single pass over the source code was a practical necessity. The Single-Pass Compiler Framework (1960–1990) embodied this constraint: it read the source once, from beginning to end, emitting machine code as it went. This approach demanded that the language designer restrict or simplify features such as forward references, complex type systems, and separate compilation, because the compiler could not revisit earlier decisions. Languages like early Pascal and C were shaped by this discipline: both generally require a name to be declared before it is used, so a compiler reading from top to bottom already knows what each name means when it encounters it.
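The single-pass idea can be sketched in a few lines. The example below is a minimal illustration, not a historical compiler: it assumes a hypothetical toy language of integer expressions with `+`, `*`, and parentheses, and a hypothetical stack machine with `PUSH`, `ADD`, and `MUL` instructions. The point to notice is that no syntax tree is ever built; each instruction is emitted the moment the parser recognizes the construct that needs it.

```python
import re

def compile_single_pass(source):
    """Translate an arithmetic expression to stack-machine code in one pass."""
    tokens = re.findall(r"\d+|[+*()]", source)
    pos = 0
    code = []  # instructions are appended in their final order, never revised

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = advance()
        if tok == "(":
            expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok.isdigit():
            code.append(("PUSH", int(tok)))  # emit immediately
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        factor()
        while peek() == "*":
            advance()
            factor()
            code.append(("MUL",))  # both operands are already on the stack

    def expr():
        term()
        while peek() == "+":
            advance()
            term()
            code.append(("ADD",))

    expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return code
```

Because the emitted code is never revisited, anything the compiler has not yet seen (a later definition, say) cannot influence instructions already written, which is the restriction described above.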
Yet even in the 1950s, an alternative existed. The Multi-Pass Compiler Model (1950–present) decomposed translation into several sequential phases: lexical analysis, parsing, semantic analysis, intermediate code generation, optimization, and final code emission. Each phase reads the output of the previous one, often stored in an intermediate representation (IR). Multi-pass compilers consumed more memory and ran more slowly per compilation, but they freed language designers from single-pass restrictions. They could handle complex type systems, separate compilation units, and sophisticated optimizations. By the 1970s, as memory costs fell and language complexity grew, the multi-pass model became the default. The single-pass framework narrowed into a niche for simple scripting languages and embedded systems, while multi-pass became the backbone of mainstream production compilers.
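As a rough sketch of phased decomposition, the example below splits translation of a hypothetical toy expression language into three separate functions, each consuming the previous one's output: `lex` produces tokens, `parse` produces a syntax tree, and `lower` produces a simple three-address IR. The function names, the tuple-based tree, and the `(dest, op, left, right)` IR format are all assumptions made for illustration; error handling is omitted.

```python
import itertools
import re

def lex(source):
    """Phase 1: characters -> list of tokens."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[+*()]", source)

def parse(tokens):
    """Phase 2: tokens -> syntax tree made of nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = take()
        if tok == "(":
            node = expr()
            take()  # consume the closing parenthesis
            return node
        return ("num", int(tok)) if tok.isdigit() else ("var", tok)

    def term():
        node = factor()
        while peek() == "*":
            take()
            node = ("mul", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            take()
            node = ("add", node, term())
        return node

    return expr()

def lower(tree):
    """Phase 3: syntax tree -> three-address IR, one operation per entry."""
    ir = []
    temps = (f"t{i}" for i in itertools.count())

    def walk(node):
        if node[0] in ("num", "var"):
            return node[1]
        left, right = walk(node[1]), walk(node[2])
        dest = next(temps)
        ir.append((dest, node[0], left, right))
        return dest

    walk(tree)
    return ir

ir = lower(parse(lex("a + 2 * 3")))
# ir == [("t0", "mul", 2, 3), ("t1", "add", "a", "t0")]
```

Each phase sees the complete output of the one before it, so a later phase is free to inspect the whole program before committing to anything. The cost is that every intermediate result must be held in memory.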
Building a multi-pass compiler by hand was labor-intensive, especially the lexical analysis and parsing phases. The Compiler-Compiler Movement (1965–1990) responded by developing tools that generated these front-end phases automatically from formal specifications. Lex and Yacc (later Flex and Bison) became the canonical examples: a developer wrote regular expressions and a context-free grammar, and the tool produced C code for a lexer and parser. This movement did not replace the multi-pass model; instead, it became absorbed as infrastructure within it. The generated parser, through semantic actions the developer attached to grammar rules, typically built an abstract syntax tree that fed directly into the later phases of a multi-pass compiler. By the 1990s, parser generation was standard practice, and the movement narrowed as its core techniques matured. The lasting legacy is that front-end construction is now a routine engineering task rather than a research frontier.
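The specification-driven style can be imitated in miniature. The sketch below is not Lex and does not reproduce its input format: it is a hypothetical `make_lexer` helper that turns a table of (token name, regular expression) rules into a tokenizer using Python's standard `re` module. One difference worth flagging is that Lex picks the longest match among all rules, whereas this sketch picks the first rule that matches, so rule order matters here.

```python
import re

def make_lexer(spec):
    """Build a tokenizer from a list of (name, regex) rules.

    Assumes the rule regexes contain no capturing groups of their own.
    """
    master = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in spec))

    def tokenize(text):
        pos = 0
        out = []
        while pos < len(text):
            m = master.match(text, pos)
            if m is None:
                raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
            if m.lastgroup != "SKIP":  # rules named SKIP produce no token
                out.append((m.lastgroup, m.group()))
            pos = m.end()
        return out

    return tokenize

# The "specification": the only part a developer would write by hand.
SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+*=()]"),
    ("SKIP", r"\s+"),
]

tokenize = make_lexer(SPEC)
tokens = tokenize("x = 40 + 2")
# tokens == [("IDENT", "x"), ("OP", "="), ("NUMBER", "40"), ("OP", "+"), ("NUMBER", "2")]
```

Changing the language's tokens means editing `SPEC`, not the scanning loop, which is the division of labor the generator tools established.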
By the 1980s, two pressures reshaped compiler design. First, the gap between high-level language semantics and hardware performance widened, making aggressive optimization essential. Second, the proliferation of processor architectures made it impractical to write a separate compiler for each machine. The Optimizing Compiler Framework (1980–present) and the Retargetable Compiler Framework (1980–present) emerged together to address these pressures, and they share a critical piece of infrastructure: the intermediate representation.
An optimizing compiler transforms the IR through a sequence of passes (constant propagation, dead code elimination, loop-invariant code motion, register allocation via graph coloring), each designed to improve execution speed or reduce code size. The IR is the pivot: it must be low-level enough to expose optimization opportunities yet high-level enough to preserve language semantics. Static single assignment (SSA) form, introduced in the late 1980s, became the dominant IR form for optimization: because every variable is assigned exactly once, each use refers to a single definition, which makes data-flow analysis simpler and more powerful.
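Two of these passes are small enough to sketch directly. The example below assumes a hypothetical straight-line IR in which each instruction is a `(dest, op, left, right)` tuple, operands are either integer literals or variable names, and every destination is assigned exactly once (the property SSA guarantees). It handles no branches or loops, so it illustrates the idea of the passes rather than a production algorithm.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}

def propagate_constants(ir):
    """Substitute known constants into operands and fold operations whose
    operands are both constant. Returns (remaining_ir, known_constants)."""
    known = {}
    out = []
    for dest, op, a, b in ir:
        a = known.get(a, a)
        b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)  # computed at compile time
        else:
            out.append((dest, op, a, b))
    return out, known

def eliminate_dead_code(ir, live_out):
    """Drop instructions whose results are never used, scanning backwards
    from the names in `live_out` that must survive."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(ir):
        if dest in live:
            kept.append((dest, op, a, b))
            live.update(x for x in (a, b) if isinstance(x, str))
    kept.reverse()
    return kept

program = [
    ("t0", "mul", 2, 3),
    ("t1", "add", "x", "t0"),
    ("t2", "mul", "x", "x"),   # result is never used
    ("t3", "add", "t1", 1),
]

folded, constants = propagate_constants(program)
optimized = eliminate_dead_code(folded, live_out={"t3"})
# optimized == [("t1", "add", "x", 6), ("t3", "add", "t1", 1)]
```

The single-assignment assumption is what lets `known` be a plain dictionary: a name can never be redefined, so a recorded constant stays valid for the rest of the program.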
The retargetable framework addresses a different concern: portability. Instead of writing a new compiler for each target machine, a retargetable compiler factors the translation into a machine-independent front end (source to IR) and a machine-dependent back end (IR to target code). The front end is written once; the back end is written for each architecture, but it reuses the same IR and many of the same optimization passes. The GNU Compiler Collection (GCC) and later LLVM exemplify this architecture. The retargetable framework did not reject the multi-pass model—it absorbed it, adding a clear interface between phases so that the front end and back end could be developed independently.
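The front-end/back-end split can be illustrated with one IR and two code emitters. In the sketch below, the IR format (`(dest, op, left, right)` tuples), both target machines, and all of their instruction mnemonics are invented for illustration; they do not correspond to GCC, LLVM, or any real architecture. The register back end also assumes an unlimited supply of registers, so it sidesteps register allocation entirely.

```python
def emit_stack(ir):
    """Back end A: a hypothetical stack machine."""
    lines = []
    for dest, op, a, b in ir:
        for operand in (a, b):
            kind = "PUSHI" if isinstance(operand, int) else "LOAD"
            lines.append(f"{kind} {operand}")
        lines.append(op.upper())
        lines.append(f"STORE {dest}")
    return lines

def emit_register(ir):
    """Back end B: a hypothetical three-operand register machine."""
    regs = {}

    def reg(name):
        # Assign the next free register the first time a name is seen.
        return regs.setdefault(name, f"r{len(regs)}")

    lines = []
    for dest, op, a, b in ir:
        srcs = [f"#{x}" if isinstance(x, int) else reg(x) for x in (a, b)]
        lines.append(f"{op} {reg(dest)}, {srcs[0]}, {srcs[1]}")
    return lines

# Adding a target means adding one entry here; nothing upstream changes.
BACKENDS = {"stack": emit_stack, "register": emit_register}

def compile_ir(ir, target):
    return BACKENDS[target](ir)

ir = [("t0", "mul", "x", 3), ("t1", "add", "t0", 1)]
stack_code = compile_ir(ir, "stack")
register_code = compile_ir(ir, "register")
# register_code == ["mul r1, r0, #3", "add r2, r1, #1"]
```

Any pass that rewrites `ir` before `compile_ir` is called benefits both targets at once, which is why the shared IR matters to optimization as much as to portability.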
Optimizing and retargetable frameworks are not rivals; they are complementary and deeply entangled. LLVM, for instance, provides a retargetable IR (LLVM IR) and a suite of optimization passes that run on that IR regardless of the target. A compiler built on LLVM is simultaneously optimizing and retargetable. The tension between them is one of emphasis: optimization work prioritizes analysis and transformation algorithms, while retargetability work prioritizes clean IR design and back-end generation. Today, nearly all production compilers are both optimizing and retargetable.
Traditional optimizing compilers translate the entire program ahead of time (AOT) before execution. The Just-in-Time Compilation Framework (1995–present) challenges this timing. A JIT compiler translates code at runtime, often interleaving interpretation with compilation. Its core argument is that runtime information—which code paths are hot, what types flow through variables, what the actual hardware looks like—enables optimizations that no static AOT compiler can match.
Java's HotSpot VM and JavaScript engines like V8 popularized this approach. A typical JIT system starts by interpreting the program or compiling it with a fast, non-optimizing compiler. As it profiles execution, it identifies hot methods or loops and recompiles them with increasingly aggressive optimizations, sometimes even speculating on invariants (e.g., that a virtual call always targets the same method) and inserting guards to deoptimize if the speculation fails. This adaptive recompilation blurs the line between compilation and runtime system. JIT does not replace the optimizing framework; it inherits its IR and many of its optimization passes, but it adds a new dimension: the compiler itself becomes a runtime component that must be fast, memory-efficient, and capable of deoptimization. Today, JIT and AOT optimizing compilers coexist, each suited to different deployment contexts—JIT for dynamic languages and long-running server applications, AOT for embedded systems and mobile apps where startup time and power matter.
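The profile, speculate, guard, and deoptimize cycle can be modeled without generating any machine code. The sketch below is a simplified model of a single call site with a monomorphic inline cache: after a hypothetical `hot_threshold` number of calls it speculates that the receiver's class will not change, checks that assumption with a guard on every fast-path call, and discards the specialization when the guard fails. The `CallSite` class and its fields are assumptions for illustration and do not reflect the internals of HotSpot or V8.

```python
class CallSite:
    """Models one call site `receiver.<method_name>()`."""

    def __init__(self, method_name, hot_threshold=3):
        self.method_name = method_name
        self.hot_threshold = hot_threshold  # assumed value; real systems tune this
        self.calls = 0
        self.cached_class = None
        self.cached_method = None
        self.deopts = 0

    def invoke(self, receiver):
        if self.cached_class is not None:
            # Guard: is the speculation (same receiver class) still true?
            if type(receiver) is self.cached_class:
                return self.cached_method(receiver)  # fast path, no lookup
            # Guard failed: deoptimize by discarding the specialized state.
            self.cached_class = self.cached_method = None
            self.calls = 0
            self.deopts += 1

        # Slow path: generic lookup plus profiling.
        self.calls += 1
        method = getattr(type(receiver), self.method_name)
        if self.calls >= self.hot_threshold:
            # The site is hot: specialize for the class seen on this call.
            self.cached_class, self.cached_method = type(receiver), method
        return method(receiver)


class Circle:
    def name(self):
        return "circle"

class Square:
    def name(self):
        return "square"

site = CallSite("name")
for _ in range(4):
    site.invoke(Circle())    # the third call crosses the threshold
specialized_for = site.cached_class   # Circle
result = site.invoke(Square())        # guard fails, falls back to the slow path
```

The guard is what makes the speculation safe: the cached method is only correct for the cached class, so a mismatch must route back to the generic path rather than produce a wrong answer.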
Optimizing compilers are enormously complex, and bugs in them can silently introduce errors into every program they compile. The Verified Compiler Framework (2000–present) addresses this by constructing a compiler whose correctness is mechanically proved. The landmark project is CompCert, a verified C compiler whose generated code provably preserves the semantics of every source program whose behavior is defined.
CompCert builds on the multi-pass model: it decomposes compilation into a sequence of intermediate languages (C → Clight → C#minor → Cminor → … → assembly), each transformation proved correct using a simulation relation between source and target semantics. The proof is checked by a theorem prover (Coq), which rules out reasoning errors in the verified passes; what remains trusted is the formal specification of the source and target semantics, along with any unverified components around the compiler. This framework does not reject optimization (CompCert includes many standard optimizations), but it imposes a discipline: every optimization must be accompanied by a correctness proof. This limits the set of optimizations that can be included, especially the most aggressive, speculative ones that JIT compilers rely on. The verified framework thus challenges the optimizing framework's assumption that any performance gain is worth the risk of a bug. Today, verified compilers are used in safety-critical domains (avionics, automotive, medical devices), while mainstream compilers remain unverified. The tension between peak performance and provable correctness is a live disagreement.
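The shape of a simulation argument can be written schematically. The statement below is the textbook form of a forward simulation, not a transcription of CompCert's actual definitions, which come in several variants and also account for observable events. The notation is assumed for illustration: $\sigma$ ranges over source states, $\tau$ over target states, $\sim$ is the matching relation chosen for the pass, and $\to_S$, $\to_T$ are single execution steps in the source and target languages.

```latex
\forall\, \sigma_1, \sigma_2, \tau_1 :\quad
\bigl(\sigma_1 \sim \tau_1 \;\wedge\; \sigma_1 \to_S \sigma_2\bigr)
\;\Longrightarrow\;
\exists\, \tau_2 .\;\; \tau_1 \to_T^{+} \tau_2 \;\wedge\; \sigma_2 \sim \tau_2
```

Read informally: whenever the source program takes a step from a state that matches the target's state, the target can take one or more steps and arrive at a state that matches again. Relations of this kind compose, so a proof for each pass in the chain of intermediate languages yields a proof for the compiler as a whole.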
The leading frameworks today—Optimizing, Retargetable, JIT, and Verified—coexist in a division of labor. They agree on the value of phased decomposition and intermediate representations: every modern compiler, whether AOT or JIT, verified or not, organizes translation as a sequence of phases connected by IR. They also agree that optimization is essential, though they disagree on how far to push it and at what cost in correctness assurance.
The deepest disagreements are about timing and trust. JIT frameworks argue that runtime information is indispensable for peak performance, while AOT frameworks counter that predictable, low-overhead compilation matters more for many use cases. Verified frameworks argue that correctness proofs should constrain optimization, while mainstream optimizing frameworks accept the risk of compiler bugs in exchange for maximum performance. LLVM sits at the center of this landscape: it is retargetable, optimizing, and increasingly used as a backend for JIT systems (via LLVM's MCJIT and ORC engines), yet it remains unverified. The field has not converged on a single framework because the trade-offs—performance, portability, correctness, compilation speed—are genuinely incommensurable across different application domains.