The central challenge of causal inference is to move beyond correlation and establish whether a change in one variable produces a change in another. In data science, this question is distinct from prediction: a model that accurately forecasts outcomes may still be useless for understanding what would happen under an intervention. Over the past century, four major frameworks have emerged to address this challenge, each responding to the limitations of its predecessors while coexisting and competing with them. The history of causal inference is not a simple succession of breakthroughs but a story of deepening formalization, expanding scope, and ongoing pluralism.
The earliest systematic framework for causal inference was the Experimental Paradigm (1920–1970), rooted in the work of statisticians such as Ronald Fisher and Jerzy Neyman. Its core commitment was that the most reliable way to establish a causal effect is through randomized controlled experiments. By randomly assigning units to treatment and control groups, the experimenter makes treatment independent of every pre-treatment variable, measured or unmeasured, so that confounders are balanced across groups in expectation, though not necessarily in any single sample. Any systematic difference in outcomes beyond what chance variation would produce can then be attributed to the treatment. This paradigm gave rise to foundational methods like analysis of variance (ANOVA) and the design of experiments, which became the gold standard in agriculture, medicine, and the physical sciences.
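To make the logic of randomization concrete, here is a minimal Python sketch under assumed conditions: a hypothetical outcome that depends strongly on one pre-treatment covariate (think soil quality in a field trial), plus a treatment effect fixed at 2.0 by construction. Because assignment is a coin flip that ignores the covariate, the simple difference in group means should land near 2.0 and the covariate means should nearly coincide across groups. The function name, coefficients, and sample size are illustrative choices, not part of any standard method.

```python
import random
import statistics


def simulate_rct(n=20_000, true_effect=2.0, seed=0):
    """Simulate a hypothetical randomized experiment; return (estimated effect, covariate imbalance)."""
    rng = random.Random(seed)
    y_treated, y_control = [], []
    x_treated, x_control = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                # pre-treatment covariate that strongly affects the outcome
        t = 1 if rng.random() < 0.5 else 0     # coin-flip assignment, independent of x by design
        y = 1.0 + 3.0 * x + true_effect * t + rng.gauss(0.0, 1.0)
        if t == 1:
            y_treated.append(y)
            x_treated.append(x)
        else:
            y_control.append(y)
            x_control.append(x)
    # Difference in means: unbiased for the average effect because randomization balances x in expectation
    effect_hat = statistics.fmean(y_treated) - statistics.fmean(y_control)
    imbalance = statistics.fmean(x_treated) - statistics.fmean(x_control)
    return effect_hat, imbalance


effect_hat, imbalance = simulate_rct()
print(f"estimated effect: {effect_hat:.3f}, covariate imbalance: {imbalance:.3f}")
```

With a fixed seed the run is reproducible; rerunning with other seeds shows the estimate scattering around 2.0, which is what "balanced in expectation" means in practice.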
Yet the Experimental Paradigm had a severe limitation: it required the ability to manipulate the treatment and control the assignment process. In many fields—economics, epidemiology, sociology—randomized experiments are often impractical, unethical, or impossible. Researchers could not randomly assign people to smoke or not smoke, to attend college or not, to experience a recession or not. The pressure to extend causal reasoning to observational data, where the assignment mechanism is unknown, drove the development of a new framework.
The Potential Outcomes Framework (1970–Present), developed primarily by Donald Rubin and building on earlier ideas from Neyman, reframed causal inference as a missing-data problem. For each unit, we can imagine two potential outcomes: one under treatment and one under control. The causal effect for that unit is the difference between these two potential outcomes, but we only ever observe one of them, so unit-level effects can never be read directly from data. The challenge is instead to estimate population quantities such as the average treatment effect by modeling the assignment mechanism, the process that determines which units receive treatment. Key concepts include ignorability (conditional on covariates, treatment assignment is independent of the potential outcomes), propensity scores (the probability of treatment given covariates), and estimation methods such as matching and weighting, along with instrumental variables for settings where ignorability is implausible. This framework became the dominant language for causal inference in the social sciences, statistics, and biostatistics, precisely because it provided a rigorous way to reason about observational studies without requiring a full causal graph.
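The missing-data framing and the role of the propensity score can be illustrated with a small simulation. In this sketch (a hypothetical data-generating process, with all names and numbers chosen for illustration), each unit has both potential outcomes, Y(0) and Y(1) = Y(0) + 2, but only the one matching its treatment is recorded. Treatment is more likely when a binary covariate X equals 1, and X also raises the outcome, so the naive treated-minus-control comparison is confounded. Because ignorability given X holds by construction, inverse probability weighting with the propensity score e(x) = P(T = 1 | X = x), estimated here by the treated share within each stratum of X, should recover an average treatment effect near 2.

```python
import random
from collections import defaultdict


def simulate_observational(n=20_000, true_effect=2.0, seed=1):
    """Hypothetical observational data: X confounds both treatment and outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if x == 1 else 0.2               # assignment depends on x
        t = 1 if rng.random() < p_treat else 0
        y0 = 1.0 + 3.0 * x + rng.gauss(0.0, 1.0)       # potential outcome under control
        y1 = y0 + true_effect                          # potential outcome under treatment
        y = y1 if t == 1 else y0                       # only one potential outcome is observed
        data.append((x, t, y))
    return data


def naive_difference(data):
    """Treated mean minus control mean, ignoring X (confounded)."""
    treated = [y for _, t, y in data if t == 1]
    control = [y for _, t, y in data if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_ate(data):
    """Inverse probability weighting with propensity scores estimated per stratum of X."""
    counts = defaultdict(lambda: [0, 0])               # x -> [number treated, number of units]
    for x, t, _ in data:
        counts[x][0] += t
        counts[x][1] += 1
    e = {x: c[0] / c[1] for x, c in counts.items()}    # estimated e(x) = P(T = 1 | X = x)
    n = len(data)
    # Treated units are weighted by 1/e(x), controls by 1/(1 - e(x))
    mu1 = sum(t * y / e[x] for x, t, y in data) / n
    mu0 = sum((1 - t) * y / (1 - e[x]) for x, t, y in data) / n
    return mu1 - mu0


data = simulate_observational()
print(f"naive difference: {naive_difference(data):.3f}")
print(f"IPW estimate:     {ipw_ate(data):.3f}")
```

Under this setup the naive difference is pulled toward roughly 3.8, because treated units are disproportionately those with X = 1, while the weighted estimate sits near 2. With a single discrete covariate this weighting estimator coincides with stratifying on X; with continuous covariates one would instead fit a model, such as logistic regression, for e(x).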
Around 1990, a competing formalization emerged: Structural Causal Models (1990–Present), championed by Judea Pearl and others. SCMs represent causal relationships using directed acyclic graphs (DAGs), where nodes are variables and edges represent direct causal effects. The graph encodes qualitative assumptions about the data-generating process, and the do-calculus provides three rules for deciding whether the effect of an intervention can be expressed in terms of the observational distribution and, when it can, for deriving that expression. Unlike the Potential Outcomes Framework, which focuses on the assignment mechanism, SCMs emphasize the causal structure itself. This difference leads to a productive tension. Proponents of SCMs argue that the graphical approach makes causal assumptions explicit and, through the conditional independencies a graph implies, partially testable, and that the do-calculus can derive identification strategies that are not obvious from a potential-outcomes perspective. Conversely, advocates of the Potential Outcomes Framework contend that the assignment mechanism is the fundamental object, and that graphs are merely a convenient representation of the same underlying assumptions. In practice, many researchers now see the two frameworks as complementary: the Potential Outcomes Framework provides a clear language for defining causal estimands and designing estimators, while SCMs offer a powerful tool for reasoning about identification and for communicating assumptions visually.
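As a sketch of how a graph licenses an identification formula, consider an assumed three-variable DAG with edges X → T, T → Y, and X → Y, so that X confounds T and Y. The backdoor criterion, one of the results derivable from the do-calculus, says that adjusting for X identifies the interventional distribution: P(Y = 1 | do(T = t)) = Σ_x P(Y = 1 | T = t, X = x) P(X = x). The code below builds the observational joint distribution exactly from hypothetical structural equations (binary variables, probabilities chosen for illustration), applies the backdoor formula using only that joint, and compares the result with the ground truth obtained by intervening on the model directly, that is, by cutting the arrow into T.

```python
from itertools import product

# Hypothetical structural equations for binary X, T, Y
P_X1 = 0.5


def p_t1_given_x(x):
    return 0.8 if x == 1 else 0.2


def p_y1_given_tx(t, x):
    return 0.1 + 0.3 * t + 0.5 * x


def bern(p, v):
    """Probability that a Bernoulli(p) variable takes value v."""
    return p if v == 1 else 1 - p


def observational_joint():
    """Exact joint P(X, T, Y) implied by the structural equations."""
    return {
        (x, t, y): bern(P_X1, x) * bern(p_t1_given_x(x), t) * bern(p_y1_given_tx(t, x), y)
        for x, t, y in product((0, 1), repeat=3)
    }


def marginal(joint, pred):
    return sum(p for key, p in joint.items() if pred(*key))


def conditional_y1(joint, t):
    """Observational P(Y = 1 | T = t): mixes the effect of T with that of X."""
    return (marginal(joint, lambda x_, t_, y_: t_ == t and y_ == 1)
            / marginal(joint, lambda x_, t_, y_: t_ == t))


def backdoor_y1(joint, t):
    """P(Y = 1 | do(T = t)) via adjustment for X, using only the observational joint."""
    total = 0.0
    for x in (0, 1):
        p_x = marginal(joint, lambda x_, t_, y_: x_ == x)
        p_y_given_tx = (marginal(joint, lambda x_, t_, y_: x_ == x and t_ == t and y_ == 1)
                        / marginal(joint, lambda x_, t_, y_: x_ == x and t_ == t))
        total += p_y_given_tx * p_x
    return total


def interventional_y1(t):
    """Ground truth: delete the edge X -> T, fix T = t, and average over X."""
    return sum(bern(P_X1, x) * p_y1_given_tx(t, x) for x in (0, 1))


joint = observational_joint()
naive_rd = conditional_y1(joint, 1) - conditional_y1(joint, 0)
backdoor_rd = backdoor_y1(joint, 1) - backdoor_y1(joint, 0)
true_rd = interventional_y1(1) - interventional_y1(0)
print(f"observational: {naive_rd:.2f}  backdoor: {backdoor_rd:.2f}  intervention: {true_rd:.2f}")
```

With these numbers the backdoor formula gives a causal risk difference of 0.3 (0.65 versus 0.35), matching the intervention, while the observational contrast P(Y = 1 | T = 1) - P(Y = 1 | T = 0) is 0.6 because it also carries the influence of X.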
By the early 2000s, a fourth framework began to take shape: Causal Machine Learning (2000–Present). This framework does not replace the earlier formalisms but rather absorbs their identification logic while replacing the estimation engine. Traditional causal inference methods often relied on parametric models (e.g., linear regression) or simple nonparametric adjustments. As datasets grew larger and more complex, with many covariates and nonlinear relationships, these approaches became harder to defend: a misspecified outcome or propensity model can leave residual confounding bias even when ignorability holds. Causal Machine Learning brings flexible, data-adaptive methods, such as random forests, neural networks, and boosting, to the estimation of causal effects. Key innovations include double/debiased machine learning (which combines Neyman-orthogonal estimating equations, which are insensitive to small errors in the nuisance models, with cross-fitting, which avoids overfitting bias by predicting each observation from models trained on other folds), causal forests (which estimate heterogeneous treatment effects by recursively partitioning the covariate space), and targeted maximum likelihood estimation. The central insight is that the identification step (deciding what to estimate) can be separated from the estimation step (how to compute it from data). Causal ML inherits the counterfactual logic of the Potential Outcomes Framework and the graphical intuition of SCMs, but it adds the ability to handle high-dimensional covariates, complex nonlinear relationships, and treatment effect heterogeneity at scale.
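The separation of identification from estimation, and the role of cross-fitting, can be sketched for the partially linear model Y = θT + g(X) + ε, where ignorability given X is assumed and θ is the target. The partialling-out estimator regresses the residual Y - E[Y | X] on the residual T - E[T | X]; double/debiased ML estimates both conditional means with flexible learners and computes each unit's residuals from models fit on other folds. To keep the sketch dependency-free, a binned-mean regression on X stands in for a random forest or boosting model. The data-generating process, the learner, and all function names are hypothetical, and residuals are pooled across folds before forming the final ratio.

```python
import math
import random


def simulate_plm(n=4_000, theta=1.0, seed=2):
    """Hypothetical DGP: T = m(X) + v and Y = theta*T + g(X) + eps, with X ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        t = math.sin(2 * math.pi * x) + rng.gauss(0.0, 1.0)                         # m(x) = sin(2*pi*x)
        y = theta * t + 2 * math.sin(2 * math.pi * x) + x + rng.gauss(0.0, 1.0)   # g(x) = 2*sin(2*pi*x) + x
        rows.append((x, t, y))
    return rows


def fit_binned_mean(xs, vs, n_bins=20):
    """Stand-in nonparametric learner: predict E[v | x] by the mean of v in x's bin on [0, 1)."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, v in zip(xs, vs):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += v
        counts[b] += 1
    overall = sum(vs) / len(vs)
    means = [s / c if c else overall for s, c in zip(sums, counts)]   # empty bin falls back to overall mean
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def dml_plm(rows, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of theta (residuals pooled across folds)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    num = den = 0.0
    for k, held_out in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        xs = [rows[i][0] for i in train]
        m_hat = fit_binned_mean(xs, [rows[i][1] for i in train])   # nuisance E[T | X]
        l_hat = fit_binned_mean(xs, [rows[i][2] for i in train])   # nuisance E[Y | X]
        for i in held_out:
            x, t, y = rows[i]
            v_res = t - m_hat(x)   # residuals come from models that never saw row i
            y_res = y - l_hat(x)
            num += v_res * y_res
            den += v_res * v_res
    return num / den


def naive_slope(rows):
    """OLS slope of Y on T alone, ignoring X (confounded)."""
    ts = [t for _, t, _ in rows]
    ys = [y for _, _, y in rows]
    mt, my = sum(ts) / len(ts), sum(ys) / len(ys)
    return sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sum((t - mt) ** 2 for t in ts)


rows = simulate_plm()
print(f"naive OLS slope:           {naive_slope(rows):.3f}")
print(f"cross-fitted DML estimate: {dml_plm(rows):.3f}")
```

Here the naive slope is pulled well above 1 because g(X) and E[T | X] move together, while the cross-fitted estimate lands close to the true θ = 1. In practice one would substitute a genuine machine-learning model for fit_binned_mean and also report a standard error.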
Today, all four frameworks remain active, but the leading contenders are the Potential Outcomes Framework and Structural Causal Models, with Causal Machine Learning serving as an integrative methodological school. There is broad agreement on several points: causal inference requires counterfactual reasoning; identification of causal effects from observational data always depends on assumptions that cannot be fully tested against the data, such as ignorability (no unmeasured confounding); and the goal is to estimate quantities such as average treatment effects or conditional average treatment effects. The disagreements are more subtle. One persistent debate concerns representation: should the primary formal object be the assignment mechanism (Potential Outcomes) or the causal graph (SCMs)? A second disagreement involves the role of the do-calculus: SCM proponents argue it provides a complete theory of identification, while some Potential Outcomes researchers see it as a special case of the assignment-mechanism framework. A third tension is between flexibility and interpretability: Causal Machine Learning methods can estimate complex effect surfaces, but they often sacrifice transparency, making it harder to scrutinize whether the identification assumptions are plausible and what drives the estimated effects. In practice, researchers choose among frameworks based on the problem structure: SCMs are favored when domain knowledge can be encoded as a graph; Potential Outcomes when the assignment mechanism is well understood; and Causal Machine Learning when the goal is to estimate heterogeneous effects in high-dimensional data. The subfield is now characterized by productive pluralism, with each framework continuing to refine its tools and to borrow insights from the others.