How can we assign probabilities to countable outcomes and use those probabilities to reason about the structure of discrete objects? This question has driven the development of discrete probability, a subfield that builds mathematical tools for modeling randomness over finite or countably infinite spaces. Unlike continuous probability, which relies on calculus and measure theory, discrete probability works with sums, combinatorial counts, and algebraic manipulations. Its history unfolds through five major frameworks, each responding to the limitations of earlier approaches and expanding the reach of probabilistic reasoning.
The earliest systematic framework for discrete probability emerged in the mid-17th century from correspondence between Blaise Pascal and Pierre de Fermat about games of chance. Their key insight was to define the probability of an event as the ratio of favorable outcomes to the total number of equally likely outcomes. This definition, later formalized by Pierre-Simon Laplace, made probability a branch of combinatorics: to compute a probability, one had to count. The framework worked well for dice, cards, and lotteries, where symmetry guaranteed equal likelihood. Its central technique was combinatorial enumeration—permutations, combinations, and the binomial coefficient. However, the equally-likely assumption severely limited the framework's scope. It could not handle biased coins, unequal chances, or situations where outcomes were not naturally symmetric. By the early 19th century, mathematicians began to feel the need for a more flexible way to describe randomness.
The discrete distributions framework, emerging with Siméon Denis Poisson's 1837 work on the Poisson distribution, broke free from the symmetry requirement. Instead of deriving probabilities from combinatorial counts, it assigned probabilities directly to each possible outcome via a probability mass function (pmf). The binomial, geometric, hypergeometric, and Poisson distributions became canonical examples, each defined by a pmf that could accommodate unequal probabilities. This shift allowed probability to model real-world phenomena like rare events (Poisson), repeated trials with success probability (binomial), and sampling without replacement (hypergeometric). The framework also introduced joint and conditional probability mass functions, enabling the analysis of multiple random variables. Discrete distributions did not replace classical combinatorial probability; rather, they absorbed it as a special case—the uniform distribution over a finite set. The classical counting methods remained useful for computing probabilities under symmetry, but the new framework provided a general language for arbitrary discrete randomness.
By the turn of the 20th century, mathematicians realized that many calculations with discrete distributions could be streamlined by encoding the entire probability mass function into a single power series. The probability generating function (PGF) G(z) = E[z^X] and the moment generating function M(t) = E[e^{tX}] became central tools. Generating functions transformed discrete probability by turning operations on distributions—such as computing sums of independent random variables (convolutions)—into algebraic manipulations of series. For example, the PGF of a sum of independent random variables is simply the product of their individual PGFs. This framework also connected probability to analytic combinatorics, where generating functions are used to count combinatorial structures. Generating functions coexisted with discrete distributions, providing a more powerful computational engine. They did not replace the pmf but offered a higher-level perspective that simplified moment calculations, tail bounds, and asymptotic analysis. Today, generating functions remain essential for analyzing random structures, from random graphs to branching processes.
Andrey Markov introduced discrete-time Markov chains in 1906 to model sequences of dependent random variables where the future depends only on the present—the Markov property. This framework addressed a pressure that earlier frameworks could not: how to describe randomness that evolves over time with memory of only the current state. A Markov chain is defined by a transition matrix P, where P_{ij} gives the probability of moving from state i to state j in one step. The chain's behavior over many steps is governed by powers of the transition matrix, linking probability to linear algebra. Key concepts include stationary distributions (the long-run proportion of time spent in each state) and absorption probabilities for chains with absorbing states. Markov chains found immediate applications in physics (random walks), biology (population models), and later in computer science (PageRank, Markov chain Monte Carlo). This framework complemented generating functions: generating functions can be used to analyze hitting times and absorption probabilities in Markov chains, while Markov chains provided a dynamic model that generating functions alone could not capture.
The probabilistic method, pioneered by Paul Erdős in 1947, marked a radical shift in the role of probability within discrete mathematics. Instead of using probability to describe random phenomena, it used probability as a proof technique to establish the existence of combinatorial objects with desired properties. The classic example is Erdős's proof that there exist graphs with arbitrarily large girth and chromatic number: by constructing a random graph and showing that the probability of having the required properties is positive, existence is guaranteed without explicitly building the graph. This framework introduced tools like the Lovász Local Lemma (for avoiding bad events that are mostly independent) and concentration inequalities (Chernoff bounds, Azuma's inequality) to show that random variables are tightly concentrated around their means. The probabilistic method operates alongside the other frameworks: it often uses generating functions to bound probabilities, relies on discrete distributions for expectation calculations, and employs Markov chains in coupling arguments. However, its goal is not to model randomness but to exploit it for deterministic conclusions. This methodological transformation made probability an indispensable tool in extremal combinatorics, graph theory, and theoretical computer science.
Today, the five frameworks coexist with a clear division of labor. Classical combinatorial probability and discrete distributions serve as foundational vocabulary—every student learns pmfs and the uniform distribution first. Generating functions and discrete-time Markov chains are active research tools, each with specialized communities: analytic combinatorics extends generating functions to asymptotic enumeration, while Markov chain theory underpins algorithms for sampling and inference. The probabilistic method is a thriving methodology that draws on all the other frameworks; for instance, concentration inequalities often rely on generating functions (via moment generating functions), and Markov chains appear in proofs of the Lovász Local Lemma through the lopsided Lovász Local Lemma. The frameworks agree on the primacy of probability mass functions and expectation, but they disagree on emphasis: generating function practitioners favor algebraic and asymptotic approaches, while Markov chain researchers focus on dynamics and mixing times. The probabilistic method, being a proof technique rather than a model, is compatible with both. This pluralism is a strength: discrete probability today is not a single monolithic theory but a flexible toolkit whose components reinforce each other, enabling mathematicians and computer scientists to tackle problems that no single framework could handle alone.