The Markov property (the conditional independence of the future from the past, given the present) replaces a potentially intractable dependence on the entire history with transitions from the current state. This simple constraint has generated a family of frameworks, each designed for different state spaces, mathematical tools, and applications. The history of Markov processes is a story of successive generalizations and repurposings: from discrete combinatorial chains, to continuous analytic semigroups, to computational sampling machines. Each framework preserves the core property while extending its reach.
Andrey Markov introduced Markov chains in 1906 to show that the law of large numbers can hold for dependent trials. His framework used a finite state space and a transition matrix giving the probability of moving from one state to another. Countable state spaces were treated later, notably by Kolmogorov in the 1930s. The chain is fully described by its initial distribution and its transition probabilities, which are assumed constant in time (homogeneous). Markov proved the first limit theorems for chains, including a law of large numbers and a central limit theorem, and studied their long-run (stationary) behavior. The systematic classification of states as transient or recurrent came in the following decades. This discrete, combinatorial framework remains the most intuitive entry point to Markov processes. Today Markov chains are the basis for countless applied models, from PageRank to population genetics, and they continue to be studied for their own sake in probability theory.
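The code below is a minimal sketch of the objects just described. It assumes a small, made-up two-state transition matrix, and its function names are hypothetical. It computes a stationary distribution by power iteration. It then checks Markov's law of large numbers empirically: the fraction of time a simulated chain spends in each state should approach the stationary probabilities.

```python
import random

def stationary_distribution(P, iters=1000):
    """Power iteration: start from the uniform distribution and repeatedly
    apply pi <- pi P. Converges for irreducible, aperiodic finite chains."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def simulate_chain(P, x0, steps, seed=0):
    """Simulate a homogeneous chain with transition matrix P (rows sum to 1)
    and return the fraction of time spent in each state."""
    rng = random.Random(seed)
    states = range(len(P))
    counts = [0] * len(P)
    x = x0
    for _ in range(steps):
        counts[x] += 1
        # Draw the next state from row x of the transition matrix.
        x = rng.choices(states, weights=P[x])[0]
    return [c / steps for c in counts]
```

Take `P = [[0.9, 0.1], [0.5, 0.5]]` as an example. The balance condition $0.1\,\pi_0 = 0.5\,\pi_1$ gives $\pi = (5/6, 1/6)$. `stationary_distribution(P)` should reproduce this to machine precision. `simulate_chain(P, 0, 100000)` should land close to it.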
Gauss–Markov processes combine Gaussian distributions with the Markov property. The primary example is the Ornstein–Uhlenbeck process, introduced by Uhlenbeck and Ornstein in 1930 as a model of the velocity of a Brownian particle. It is a continuous-time process that relaxes noisily toward equilibrium. In 1942 Doob showed that, apart from degenerate cases, it is the only stationary process that is both Gaussian and Markov. Unlike discrete Markov chains, Gauss–Markov processes evolve on continuous state spaces, but they exploit the tractability of Gaussian distributions. Their finite-dimensional distributions are multivariate Gaussian, and the Markov property constrains the covariance structure. In the stationary case, the correlation between $X_s$ and $X_t$ decays exponentially as $e^{-\theta|t-s|}$ for a relaxation rate $\theta$. This framework coexists with the more general theories below and supplies explicit formulas for signal processing, physics, and finance. It does not attempt to cover all Markov processes; rather, it offers a parametric workhorse where explicit computation is possible.
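The following sketch simulates an Ornstein–Uhlenbeck process. It assumes the common parameterization $dX_t = \theta(\mu - X_t)\,dt + \sigma\,dW_t$; the text above does not fix one. Because the process is Gauss–Markov, each step can be drawn exactly from its Gaussian transition law, with no discretization error:

$X_{t+\Delta} \mid X_t \sim N\big(\mu + (X_t - \mu)e^{-\theta\Delta},\ \sigma^2(1 - e^{-2\theta\Delta})/(2\theta)\big)$

The function names are hypothetical.

```python
import math
import random

def simulate_ou(theta, mu, sigma, x0, dt, n_steps, seed=0):
    """Simulate dX = theta*(mu - X) dt + sigma dW on a uniform time grid.
    Each step samples the exact Gaussian transition density, so the path has
    the correct finite-dimensional distributions at the grid times."""
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Conditional mean relaxes toward mu; conditional sd is fixed per step.
        x = mu + (x - mu) * decay + step_sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def autocorrelation(path, lag):
    """Sample autocorrelation of a path at a lag measured in grid steps."""
    n = len(path)
    m = sum(path) / n
    var = sum((p - m) ** 2 for p in path) / n
    cov = sum((path[i] - m) * (path[i + lag] - m) for i in range(n - lag)) / n
    return cov / var
```

Consider $\theta = 1$ and $\sigma = \sqrt{2}$. The stationary variance $\sigma^2/(2\theta)$ then equals 1. A long path should have sample variance near 1 and lag-$k$ autocorrelation near $e^{-\theta k \Delta}$, which is the exponential covariance described above.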
By the 1950s, William Feller sought to bring Markov processes with continuous state spaces under a unifying analytic framework. The combinatorial approach of chains could not handle diffusion processes such as Brownian motion. Their state space is uncountable and time is continuous, so the dynamics must be described through transitions over arbitrarily short time intervals. Feller processes are defined by a semigroup of operators $(P_t)_{t \ge 0}$, with $P_t f(x) = E_x[f(X_t)]$. These operators act on continuous functions vanishing at infinity (some authors use bounded continuous functions instead). The infinitesimal generator $Lf = \lim_{t \downarrow 0} (P_t f - f)/t$ encodes the local behavior of the process and connects Markov processes to partial differential equations. For example, the generator of standard Brownian motion is one half of the Laplacian, $\tfrac{1}{2}\Delta$. Feller processes also connect to potential theory. The Feller property guarantees that the process has a càdlàg modification and is strong Markov. This framework gave analysts a powerful tool: study the process through its generator and semigroup, even when sample paths are hard to describe. It narrowed the focus to processes whose semigroups preserve continuity, leaving some interesting non-Feller processes outside its scope.
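The sketch below makes the generator concrete; its function names are hypothetical. It approximates the Brownian semigroup $P_t f(x) = E[f(x + B_t)]$ by trapezoidal quadrature against the standard normal density. It then forms the difference quotient $(P_t f(x) - f(x))/t$ for a small $t$. For a smooth test function this should approach $\tfrac{1}{2}f''(x)$, which shows that the generator of Brownian motion is one half of the Laplacian.

```python
import math

def heat_semigroup(f, x, t, n=4001, zmax=10.0):
    """Approximate (P_t f)(x) = E[f(x + B_t)] for standard Brownian motion,
    writing B_t = sqrt(t) * Z and integrating against the N(0, 1) density
    with the trapezoid rule on [-zmax, zmax]."""
    h = 2.0 * zmax / (n - 1)
    total = 0.0
    for i in range(n):
        z = -zmax + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * f(x + math.sqrt(t) * z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def generator_estimate(f, x, t=1e-4):
    """Difference quotient (P_t f(x) - f(x)) / t, which tends to Lf(x) as t -> 0."""
    return (heat_semigroup(f, x, t) - f(x)) / t
```

For $f = \sin$, $\tfrac{1}{2}f''(x) = -\tfrac{1}{2}\sin x$. So `generator_estimate(math.sin, 0.7)` should be close to `-0.5 * math.sin(0.7)`, with a remaining error of order $t$.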
In 1953, Nicholas Metropolis and colleagues introduced what is now called Markov chain Monte Carlo (MCMC). The idea is to sample from a complex distribution by designing a Markov chain whose stationary distribution is the target. This was not a generalization of earlier theory but a computational reuse of it. MCMC repurposes the ergodicity and convergence results of classical Markov chains to generate approximate samples from high-dimensional, nonstandard distributions, where direct sampling is typically infeasible. The key insight is that a chain run long enough has an empirical distribution approximating its stationary distribution. Moreover, the acceptance rule needs the target density only up to a normalizing constant. This framework transformed statistics, physics, and machine learning, enabling Bayesian inference, integration over complicated spaces, and optimization. Its convergence guarantees rest on the ergodic theory of Markov chains on general state spaces and, for diffusion-inspired samplers, on the theory of continuous-time processes. Today MCMC is an active research area, with algorithms such as Hamiltonian Monte Carlo and the Metropolis-adjusted Langevin algorithm, but its core ideas remain anchored in the classical theory.
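A minimal random-walk Metropolis sampler illustrates the idea. It is a simplified one-dimensional sketch, not the original 1953 implementation, and its names are hypothetical. The chain proposes symmetric Gaussian moves and accepts each with probability $\min(1, \pi(y)/\pi(x))$. This rule makes the target $\pi$ stationary and requires $\pi$ only up to a normalizing constant.

```python
import math
import random

def random_walk_metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D target.

    log_target is the log of the target density up to an additive constant.
    Proposals y = x + N(0, step^2) are symmetric, so the acceptance
    probability is min(1, pi(y) / pi(x)).
    Returns the list of states visited and the acceptance rate.
    """
    rng = random.Random(seed)
    x, log_px = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        y = x + rng.gauss(0.0, step)
        log_py = log_target(y)
        # min(0, .) caps the ratio at 1 and keeps exp() from overflowing.
        if rng.random() < math.exp(min(0.0, log_py - log_px)):
            x, log_px = y, log_py
            accepted += 1
        samples.append(x)  # a rejected proposal repeats the current state
    return samples, accepted / n_samples
```

Suppose the target is a standard normal, supplied through the unnormalized log-density `lambda x: -0.5 * x * x`, and the chain starts far out at `x0 = 3.0`. After discarding an initial burn-in, the remaining samples should have mean near 0 and variance near 1. The proposal scale `step` trades acceptance rate against move size.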
In the late 1960s, Daniel Stroock and S. R. S. Varadhan developed the martingale problem approach to Markov processes. This framework characterizes a Markov process by requiring that certain functionals of the process be martingales. Concretely, for a given generator $L$, the martingale problem asks for a probability measure $P$ on path space with the following property: for every $f$ in a suitable class of test functions, the process $f(X_t) - f(X_0) - \int_0^t Lf(X_s)\,ds$ is a martingale under $P$. One can also ask whether such a measure is unique. This method is more flexible than the Feller semigroup approach because it does not require first constructing a semigroup that maps continuous functions to continuous functions. Well-posedness (existence and uniqueness) itself yields the Markov property. The approach extends naturally to processes with jumps (Lévy processes, piecewise deterministic processes) and to state spaces that are not locally compact. The martingale problem also connects naturally to stochastic calculus: for diffusions, solving the martingale problem is equivalent to finding a weak solution of the corresponding stochastic differential equation. This probabilistic framework absorbed many phenomena that resisted analytic treatment, becoming the preferred tool for constructing new processes and proving uniqueness.
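The martingale characterization can be checked numerically. The sketch below uses hypothetical names and an Euler–Maruyama discretization, so its results carry a small $O(\Delta t)$ bias. It simulates a one-dimensional diffusion $dX_t = b(X_t)\,dt + \sigma\,dW_t$, whose generator is $Lf = b f' + \tfrac{1}{2}\sigma^2 f''$. It then estimates $E\big[f(X_T) - f(X_0) - \int_0^T Lf(X_s)\,ds\big]$, which should be near zero when $L$ is the correct generator.

```python
import math
import random

def mean_martingale_functional(f, Lf, drift, sigma, x0, T=1.0, n_steps=100,
                               n_paths=4000, seed=0):
    """Monte Carlo estimate of E[f(X_T) - f(X_0) - int_0^T Lf(X_s) ds] for
    dX = drift(X) dt + sigma dW, simulated with Euler-Maruyama.
    If Lf is the diffusion's generator applied to f, the result is close to 0."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = x0
        integral = 0.0
        for _ in range(n_steps):
            integral += Lf(x) * dt  # left-endpoint Riemann sum of Lf(X_s)
            x += drift(x) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        total += f(x) - f(x0) - integral
    return total / n_paths
```

Take an Ornstein–Uhlenbeck drift $b(x) = -\theta x$ and $f(x) = x^2$. The correct choice is then $Lf(x) = -2\theta x^2 + \sigma^2$. Passing the Brownian generator instead (dropping the drift term, so $Lf = \sigma^2$) gives a clearly nonzero mean. This is how the martingale condition distinguishes the correct generator.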
All five frameworks remain active today, each with distinct strengths. Markov chains are still the go-to model for discrete-state problems and underpin the analysis of randomized algorithms. Gauss–Markov processes provide closed-form solutions in filtering (the Kalman filter is the standard example) and physics. Feller processes remain central to the analytic study of diffusions, especially where PDE connections are needed. MCMC is a dominant computational tool in Bayesian statistics and machine learning. Martingale problems and stochastic calculus dominate the theory of general Markov processes and its applications, especially in finance, where semimartingale models are standard.
Today's leading frameworks agree on the centrality of the Markov property and the usefulness of generators and martingale characterizations. They disagree on what constitutes a natural state space (discrete vs. continuous vs. abstract), how to establish existence and uniqueness (analytic vs. probabilistic), and which properties are essential (sample-path regularity vs. semigroup continuity). The martingale problem has largely absorbed the analytic agenda: most new Markov processes are constructed probabilistically, even when their generators are used for analysis. But the Feller framework remains essential for linking Markov processes to potential theory and to the study of boundaries. The practical success of MCMC has also revived interest in the ergodic theory of chains, creating a feedback loop between theory and computation. The future likely holds further synthesis: as data-driven and computational demands grow, the interplay between discrete chains, continuous analysis, and probabilistic construction will only deepen.
For a comprehensive treatment of the subfield, see Dynkin's "Markov Processes" and Blumenthal and Getoor's "Markov Processes and Potential Theory". Doob's classic "Stochastic Processes" and Ethier and Kurtz's "Markov Processes: Characterization and Convergence" also provide excellent coverage of the frameworks discussed above.