Statistical computing emerged from a practical pressure: statisticians needed to perform calculations that were too complex for hand computation or closed-form formulas. The earliest statistical methods—maximum likelihood estimation, analysis of variance, least squares—were developed in an era when computation meant mechanical calculators or pencil and paper. As statistical theory grew more ambitious, the gap between what could be derived mathematically and what could actually be computed widened. Closing that gap became the central project of statistical computing.
The first major framework to address this pressure was the Monte Carlo Method (1940–Present). Developed by Stanislaw Ulam, John von Neumann, and others during the Manhattan Project, the Monte Carlo method uses repeated random sampling to approximate quantities that are difficult to compute analytically. Instead of solving an integral directly, one simulates many random draws and averages the results. This approach was a radical departure from deterministic numerical analysis: it traded exactness for tractability and introduced randomness as a deliberate computational tool. Early applications included neutron transport and the evaluation of multidimensional integrals, but statisticians quickly recognized its potential for problems like sampling from complex distributions.
Alongside the Monte Carlo method, a separate tradition of Numerical Analysis for Statistics (1950–1970) developed. This framework focused on deterministic algorithms for statistical tasks: solving linear systems for least squares, computing eigenvalues for principal component analysis, and evaluating special functions like the cumulative distribution of the normal or chi-square. Where the Monte Carlo method embraced randomness, numerical analysis for statistics aimed for controlled approximation error through techniques such as Gaussian quadrature, matrix factorization, and iterative equation solving. These two frameworks coexisted for decades, each suited to different parts of the statistical workflow. Numerical analysis provided the backbone for classical inference, while Monte Carlo methods opened the door to problems that were otherwise intractable.
The Markov Chain Monte Carlo (MCMC) framework (1953–Present) transformed the Monte Carlo method by introducing dependence between samples. The original Monte Carlo method required independent draws, which is impossible for many distributions of interest. MCMC constructs a Markov chain whose stationary distribution is the target distribution; after a burn-in period, samples from the chain approximate draws from the target. The Metropolis algorithm, published in 1953 by Nicholas Metropolis and colleagues, was the first MCMC method, but it remained a specialized tool in physics for decades. The key difference from the earlier Monte Carlo method was that MCMC could sample from high-dimensional, unnormalized distributions—a capability that would later become essential for Bayesian inference. MCMC did not replace the Monte Carlo method; rather, it extended it to a much wider class of problems, absorbing the earlier framework's logic while adding a new layer of theory about Markov chain convergence and mixing.
The Resampling Methods framework (1970–Present) took a different approach to the same practical pressure. Instead of building a Markov chain, resampling methods reuse the observed data to approximate the sampling distribution of a statistic. The bootstrap, introduced by Bradley Efron in 1979, is the most famous example: by drawing many samples with replacement from the original data, one can estimate standard errors, confidence intervals, and bias without parametric assumptions. Resampling methods differed from both the Monte Carlo method and MCMC in that they did not require a model for the data-generating process; they worked directly from the empirical distribution. This made them attractive for complex or unknown data structures. Resampling methods coexisted with MCMC and the Monte Carlo method, each serving a different role: resampling for inference about estimators, MCMC for sampling from posterior distributions, and the Monte Carlo method for integral approximation.
The EM Algorithm (1977–Present), introduced by Arthur Dempster, Nan Laird, and Donald Rubin, addressed a specific computational bottleneck: maximum likelihood estimation when data are incomplete or contain latent variables. The algorithm alternates between an expectation step (E-step), which computes the expected log-likelihood given the observed data and current parameter estimates, and a maximization step (M-step), which updates the parameters to maximize that expectation. The EM algorithm is deterministic, not stochastic, which distinguishes it sharply from the Monte Carlo family of methods. It converges reliably to a local maximum of the likelihood surface, though often slowly. The EM algorithm did not replace MCMC or resampling methods; instead, it filled a niche where missing data or latent structure made direct optimization infeasible. It remains a standard tool for mixture models, hidden Markov models, and factor analysis.
The Bayesian Computation framework (1990–Present) emerged when Bayesian statistics, which had been theoretically well-developed for decades, finally gained the computational tools to be practically useful. Bayesian inference requires integrating over posterior distributions, which are often high-dimensional and analytically intractable. The earlier Monte Carlo method and MCMC provided the necessary machinery. In the 1990s, statisticians like Adrian Smith, Alan Gelfand, and others demonstrated that MCMC could be applied routinely to Bayesian models, sparking an explosion of applied Bayesian work. Bayesian Computation is not a single algorithm but a framework that combines MCMC, importance sampling, sequential Monte Carlo, and variational inference to approximate posterior distributions. It absorbed MCMC as its primary workhorse while also developing new methods tailored to Bayesian models. The relationship between Bayesian Computation and the earlier frameworks is one of infrastructure: MCMC and the Monte Carlo method provided the computational foundation, and Bayesian Computation built a systematic methodology on top of them.
The Probabilistic Programming framework (2000–Present) represents a further abstraction. Instead of writing custom MCMC code for each Bayesian model, probabilistic programming languages allow users to specify a model declaratively and then automatically generate the inference algorithm. Languages like BUGS (1997), Stan (2012), and PyMC (2003) implement this idea, using Hamiltonian Monte Carlo, variational inference, or other methods under the hood. Probabilistic Programming differs from earlier Bayesian Computation in its emphasis on automation and separation of concerns: the user specifies the model, and the system chooses the inference strategy. This framework narrows the earlier focus on algorithm development by packaging MCMC and related methods into reusable software. It coexists with Bayesian Computation, which continues to develop new algorithms, while Probabilistic Programming makes those algorithms accessible to a broader community of researchers and practitioners.
Today, the leading frameworks—Monte Carlo Method, MCMC, Resampling Methods, EM Algorithm, Bayesian Computation, and Probabilistic Programming—form a complementary ecosystem rather than a set of competing paradigms. They agree on the fundamental principle that computation is an integral part of statistical inference, not a separate afterthought. They also agree that simulation-based methods are essential for modern problems involving high-dimensional data, complex models, and non-standard estimators.
Where they disagree is in their assumptions about the nature of the statistical problem. The Monte Carlo Method and MCMC assume that the target distribution can be simulated, either directly or via a Markov chain. Resampling Methods assume that the empirical distribution of the data is a good proxy for the population distribution. The EM Algorithm assumes that the likelihood can be expressed in terms of complete-data sufficient statistics. Bayesian Computation assumes a fully specified prior and likelihood, while Probabilistic Programming assumes that the model can be expressed in a high-level language and that automated inference will be efficient enough.
These differences lead to a practical division of labor. MCMC and Bayesian Computation dominate Bayesian analysis, especially for complex hierarchical models. Resampling methods are preferred for frequentist inference when analytic standard errors are unavailable. The EM Algorithm is the method of choice for mixture models and missing-data problems. Probabilistic Programming is increasingly the entry point for applied users who want to build custom Bayesian models without becoming experts in MCMC diagnostics. The Monte Carlo Method remains foundational for integral approximation and simulation studies. Numerical Analysis for Statistics, while no longer a separate research frontier, persists as the infrastructure for deterministic computations like matrix operations and optimization that underpin all the other frameworks.
Statistical computing has thus evolved from a collection of ad hoc solutions to a structured field with multiple frameworks, each with its own strengths, assumptions, and domains of application. The frameworks do not replace one another; they accumulate, coexist, and occasionally merge, driven by the enduring pressure to make statistical theory computable.