How can an experimenter extract the most reliable information from a limited number of observations while controlling for unwanted variation? This question has driven the development of experimental design as a distinct statistical subfield. The challenge is not merely to collect data but to plan its collection so that the resulting inferences are valid, precise, and efficient. Over the past century, four major frameworks have emerged, each offering a different answer to this core problem.
The modern era of experimental design began with Ronald Fisher in the 1920s and 1930s, working at the Rothamsted Experimental Station in England. Fisher faced a practical problem: how to draw reliable conclusions from agricultural field trials where soil fertility, weather, and other factors varied unpredictably. His solution was a set of principles that became the foundation of the field.
Fisher introduced three key ideas: randomization, replication, and blocking. Randomization meant assigning treatments to experimental units by chance, which guarded against systematic bias and provided a basis for statistical inference. Replication allowed the experimenter to estimate experimental error, and hence the precision of the estimated treatment effects. Blocking grouped similar experimental units together, so that comparisons between treatments were made within homogeneous blocks, reducing the influence of nuisance factors. Fisher also championed factorial designs, where multiple factors are varied simultaneously, allowing the detection of interactions between factors, a major advance over the older practice of varying one factor at a time.
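The interplay of blocking and randomization can be sketched in a few lines of Python. This is a minimal illustration of a randomized complete block layout, not a reconstruction of Fisher's own procedure; the function name, the treatment labels, and the seed are hypothetical choices made for the example. Every block receives each treatment exactly once, in an independently shuffled order, and the number of blocks supplies the replication.

```python
import random


def randomized_complete_block_design(treatments, n_blocks, seed=None):
    """Return one randomly ordered copy of the treatment list per block.

    Hypothetical helper for illustration: each block contains every
    treatment exactly once (blocking), and the order within a block is
    drawn at random (randomization).
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # independent randomization within each block
        layout.append(order)
    return layout


# Four hypothetical treatments replicated across three blocks.
layout = randomized_complete_block_design(["A", "B", "C", "D"], n_blocks=3, seed=1)
for index, block in enumerate(layout, start=1):
    print(f"block {index}: {block}")
```

Because treatments are compared within blocks, differences between blocks (for example, a fertility gradient across a field) do not contaminate the treatment comparisons.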
The Fisherian framework was heuristic and prescriptive. It provided a catalog of standard designs—completely randomized designs, randomized blocks, Latin squares, and factorial designs—each suited to particular experimental contexts. The goal was to ensure unbiased estimation of treatment effects and valid tests of significance. Fisher's approach dominated agricultural and biological experimentation for decades and remains the default starting point for many experimenters today. Its strength lies in its robustness: randomization and blocking protect against unknown sources of bias without requiring strong assumptions about the underlying model.
By the 1950s, statisticians began to ask whether the Fisherian catalog could be improved by formal mathematical criteria. Optimal Design Theory, developed by Jack Kiefer and others, shifted the focus from a fixed set of designs to a general optimization problem. Given a statistical model—typically a linear model with normally distributed errors—the goal is to choose the design points that minimize some function of the variance-covariance matrix of the parameter estimates.
The most common criterion is D-optimality, which minimizes the determinant of the variance-covariance matrix of the parameter estimates (equivalently, it maximizes the determinant of the information matrix), effectively minimizing the volume of the confidence ellipsoid for the parameters. Other criteria include A-optimality (minimizing the trace of that matrix, and so the average variance of the estimates) and E-optimality (minimizing its maximum eigenvalue). These criteria are model-dependent: the optimal design depends on the assumed form of the response function. For example, with a single factor restricted to an interval, a D-optimal design for a quadratic model splits its points equally among the two extremes and the center, while a D-optimal design for a straight-line model places them only at the two extremes.
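The one-factor example can be checked numerically. The sketch below assumes a polynomial model in a single factor scaled to the interval [-1, 1] and works in units where the error variance divided by the number of runs equals one; the helper names and the comparison designs are illustrative assumptions. It uses exact fractions from the standard library so that the D- and A-criteria can be compared without rounding.

```python
from fractions import Fraction


def information_matrix(points, degree):
    """Per-run information matrix (1/n) X^T X for a polynomial model in one factor."""
    n = len(points)
    size = degree + 1
    return [[sum(Fraction(x) ** (i + j) for x in points) / n
             for j in range(size)] for i in range(size)]


def det_and_inverse(matrix):
    """Exact Gauss-Jordan elimination; assumes the matrix is nonsingular."""
    n = len(matrix)
    a = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(matrix)]
    det = Fraction(1)
    for col in range(n):
        pivot_row = next(r for r in range(col, n) if a[r][col] != 0)
        if pivot_row != col:
            a[col], a[pivot_row] = a[pivot_row], a[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot = a[col][col]
        det *= pivot
        a[col] = [v / pivot for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return det, [row[n:] for row in a]


def d_and_a_criteria(points, degree):
    """Return (determinant, trace) of the scaled covariance matrix; smaller is better."""
    det_m, cov = det_and_inverse(information_matrix(points, degree))
    return 1 / det_m, sum(cov[i][i] for i in range(len(cov)))


three_point = [-1, 0, 1]  # extremes plus center
four_point = [Fraction(-1), Fraction(-1, 3), Fraction(1, 3), Fraction(1)]  # equally spaced
endpoints = [-1, 1]

d3, a3 = d_and_a_criteria(three_point, degree=2)
d4, a4 = d_and_a_criteria(four_point, degree=2)
d_end, a_end = d_and_a_criteria(endpoints, degree=1)
d_mid, a_mid = d_and_a_criteria(three_point, degree=1)

print("quadratic model:", (d3, a3), "vs", (d4, a4))
print("straight line:  ", (d_end, a_end), "vs", (d_mid, a_mid))
```

Under these assumptions the three-point design beats the equally spaced four-point design on both criteria for the quadratic model (27/4 and 9 against 729/80 and 377/40), and the endpoints-only design beats the three-point design for the straight-line model, which matches the pattern described above.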
Optimal Design Theory represented a narrowing of the Fisherian framework. It abandoned the universal catalog in favor of a mathematical approach that could tailor a design to a specific model and objective. However, this model-dependence created a vulnerability: if the assumed model was wrong, the optimal design could be inefficient or misleading. The framework also assumed that the experimenter could specify the model in advance, which is often unrealistic in exploratory or industrial settings. Despite these limitations, optimal design became a powerful tool in engineering, chemistry, and other fields where the model is well understood and the cost of experimentation is high.
At roughly the same time that optimal design theory was being formalized, a different approach emerged from industrial experimentation. Response Surface Methodology (RSM), developed by George Box and K. B. Wilson in the 1950s, addressed a practical need: finding the settings of several factors that optimize a response, such as yield or purity. RSM did not assume that the experimenter knew the true model in advance. Instead, it treated experimentation as a sequential, iterative process.
RSM begins with a screening phase to identify the most important factors, often using factorial or fractional factorial designs. The experimenter then moves toward a region of the factor space where the response is near an optimum, running a series of small experiments along the path of steepest ascent indicated by a fitted first-order model. Once near the optimum, a more elaborate design, typically a central composite design, is used to fit a quadratic model that locates the optimum more precisely.
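Two of these building blocks are easy to sketch in Python. The code below is illustrative only: the function names, the coefficient values, and the step size are hypothetical. It assumes factors in coded units, takes the steepest-ascent direction to be proportional to the fitted first-order coefficients, and uses the common rotatable choice of axial distance for the central composite design, namely the fourth root of the number of factorial points.

```python
from itertools import product


def steepest_ascent_path(coefficients, step, n_steps):
    """Points along the direction of the first-order coefficients, in coded units."""
    norm = sum(b * b for b in coefficients) ** 0.5
    direction = [b / norm for b in coefficients]
    return [[step * t * d for d in direction] for t in range(1, n_steps + 1)]


def central_composite_design(k, n_center=1, alpha=None):
    """Factorial, axial, and center points for k factors in coded units."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # rotatable axial distance (an assumed default)
    factorial = [[float(v) for v in corner] for corner in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + axial + center


# Hypothetical first-order coefficients from a screening experiment.
path = steepest_ascent_path([3.0, 4.0], step=0.5, n_steps=3)
design = central_composite_design(k=2, n_center=3)
print(path)
print(len(design), "runs:", design)
```

With two factors and three center runs, the central composite design has 4 factorial, 4 axial, and 3 center points, which is enough to estimate the six coefficients of a full quadratic model with some runs left over for estimating error.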
RSM coexisted with optimal design theory but addressed a different problem. While optimal design focused on efficient estimation of a fixed model, RSM focused on sequential optimization under model uncertainty. The two frameworks complemented each other: RSM could draw on optimal designs for its final-stage quadratic modeling, but its overall philosophy was pragmatic and adaptive rather than mathematically optimal in a single-stage sense. RSM became a dominant framework in industrial process development and quality improvement.
The most recent major framework, Bayesian Experimental Design, took shape in the 1970s, building on Dennis Lindley's earlier work on the information provided by an experiment, and has grown rapidly with advances in computing. It reframes the entire problem of experimental design from a Bayesian perspective. In this framework, parameters are treated as random variables with prior distributions, and the goal is to choose a design that maximizes the expected utility of the experiment, where utility is defined by the experimenter's objectives.
A typical Bayesian design maximizes the expected information gain: the Kullback-Leibler divergence from the prior to the posterior distribution of the parameters, averaged over the data the experiment might produce. This approach naturally incorporates prior knowledge and allows for adaptive experimentation: the design can be updated sequentially as data accumulate. Bayesian experimental design is a decision-theoretic framework that subsumes many classical criteria as special cases under particular priors and utility functions.
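Written out, the criterion looks as follows. The notation here is an assumption introduced for illustration: \xi denotes a candidate design, \theta the parameters, y the data, X_\xi the model matrix implied by the design, \Sigma_0 the prior covariance, and \sigma^2 the error variance. The first line is the general definition; the second is the closed form for a linear model with Gaussian errors and a Gaussian prior on p parameters.

```latex
\begin{align}
U(\xi) &= \mathbb{E}_{y \mid \xi}\!\left[ \mathrm{KL}\big( p(\theta \mid y, \xi) \,\big\|\, p(\theta) \big) \right]
        = \iint p(\theta)\, p(y \mid \theta, \xi)\,
          \log \frac{p(\theta \mid y, \xi)}{p(\theta)} \,\mathrm{d}\theta \,\mathrm{d}y, \\
U(\xi) &= \tfrac{1}{2} \log \det\!\left( I_p + \sigma^{-2}\, \Sigma_0\, X_\xi^{\top} X_\xi \right)
        \qquad \text{(linear model, Gaussian prior and errors).}
\end{align}
```

In the linear Gaussian case, maximizing U(\xi) is equivalent to maximizing \det(\Sigma_0^{-1} + \sigma^{-2} X_\xi^{\top} X_\xi), a Bayesian analogue of D-optimality. As the prior becomes more diffuse, the data term dominates and the criterion approaches the classical one, which is one concrete sense in which a classical criterion arises as a special case.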
The Bayesian framework represents a fundamental shift from the frequentist frameworks that preceded it. Fisherian design, optimal design, and RSM all treat parameters as fixed unknowns and evaluate designs based on their frequentist properties: bias, variance, coverage probability. Bayesian design, by contrast, treats parameters as uncertain and evaluates designs based on expected utility averaged over the prior distribution. This difference leads to different design choices: Bayesian designs tend to place more points where prior uncertainty about the response is high, while frequentist designs focus on minimizing variance under the assumed model.
Bayesian experimental design also introduced a new capability: fully adaptive experimentation. In a sequential Bayesian design, each new observation can be used to update the posterior, which then informs the choice of the next design point. This is a more radical form of sequential learning than RSM's steepest ascent, because it formally updates the entire probability distribution over parameters rather than just moving toward a local optimum.
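A small simulation makes the sequential loop concrete. Everything specific in the sketch below is an assumption chosen for illustration: a binary response following a logistic curve with an unknown threshold theta and a fixed slope, a uniform prior on a grid of threshold values, and a fixed set of candidate design points. At each step it picks the candidate with the largest expected information gain, simulates an outcome, and updates the whole posterior by Bayes' rule.

```python
import math
import random


def response_probability(x, theta, slope=10.0):
    """Hypothetical logistic model for P(y = 1 | x, theta)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - theta)))


def expected_information_gain(x, thetas, weights):
    """Mutual information (in nats) between theta and the binary outcome at x."""
    p1_given_theta = [response_probability(x, t) for t in thetas]
    p1 = sum(w * p for w, p in zip(weights, p1_given_theta))
    gain = 0.0
    for w, p in zip(weights, p1_given_theta):
        for likelihood, marginal in ((p, p1), (1.0 - p, 1.0 - p1)):
            if w > 0.0 and likelihood > 0.0:
                gain += w * likelihood * math.log(likelihood / marginal)
    return gain


def update(thetas, weights, x, y):
    """Posterior weights on the grid after observing outcome y at design point x."""
    likelihoods = [response_probability(x, t) if y == 1
                   else 1.0 - response_probability(x, t) for t in thetas]
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def run_adaptive_experiment(true_theta, n_trials, seed=0):
    """Simulate a fully sequential design against a known (simulated) truth."""
    rng = random.Random(seed)
    thetas = [i / 20 for i in range(21)]         # grid of threshold values on [0, 1]
    weights = [1.0 / len(thetas)] * len(thetas)  # uniform prior
    candidates = [i / 10 for i in range(11)]     # allowed design points
    history = []
    for _ in range(n_trials):
        # Choose the next design point using the current posterior.
        x = max(candidates, key=lambda c: expected_information_gain(c, thetas, weights))
        y = 1 if rng.random() < response_probability(x, true_theta) else 0
        weights = update(thetas, weights, x, y)
        history.append((x, y))
    return thetas, weights, history


thetas, weights, history = run_adaptive_experiment(true_theta=0.3, n_trials=15, seed=0)
posterior_mean = sum(t * w for t, w in zip(thetas, weights))
print("design points and outcomes:", history)
print("posterior mean of theta:", round(posterior_mean, 3))
```

Because the model is nonlinear in theta, the chosen design points depend on the outcomes observed so far. In a linear Gaussian model the expected information gain would not depend on the observed responses, so the same loop would reduce to a design that could have been fixed in advance.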
Today, all four frameworks remain active, but they occupy different niches. Fisherian designs are still the standard in clinical trials, agricultural experiments, and many biological settings where randomization and blocking are essential for validity. Optimal design theory is widely used in engineering, chemistry, and physics, where the model is well specified and the cost of each observation is high. Response surface methodology remains the workhorse of industrial process optimization and quality improvement.
Bayesian experimental design has become increasingly prominent in fields where prior information is available and sequential adaptation is feasible, such as in clinical trials with adaptive randomization, in computer experiments with expensive simulations, and in machine learning for active learning and Bayesian optimization. The Bayesian framework is also the most flexible, capable of handling complex models, multiple objectives, and decision-theoretic criteria that are difficult to address with frequentist methods.
The two leading formal frameworks today, optimal design theory and Bayesian experimental design, disagree on fundamental assumptions. Optimal design theory assumes that the model is known and that parameters are fixed; it evaluates designs by their frequentist properties. Bayesian experimental design treats parameters as random, and can extend the same treatment to uncertainty about the model itself; it evaluates designs by expected utility. This disagreement is not merely philosophical: it leads to different designs in practice. For example, a Bayesian design might place more points in regions of high prior uncertainty, while an optimal design might place points to minimize the variance of a specific parameter estimate.
Despite these disagreements, there is also convergence. Many practitioners use hybrid approaches: they might use a Bayesian design for the initial exploration and then switch to an optimal design for final estimation. The rise of computational methods has made Bayesian design feasible for problems that were previously intractable, and the two frameworks increasingly borrow ideas from each other. The field is now characterized by productive pluralism, with each framework offering distinct tools for different experimental contexts.