How do we know that a treatment, policy, or exposure actually causes an outcome? The fundamental problem of causal inference is that for any individual unit, we can observe only one version of reality—the outcome under the treatment actually received. The counterfactual outcome under a different treatment remains unobserved. This missing-data structure is the shared puzzle that all frameworks in causal inference confront, but they differ sharply in how they define causal effects, what assumptions they require, and what evidence they accept as sufficient for identification.
The formal language for reasoning about counterfactuals was first introduced in 1923 by Jerzy Neyman in the context of randomized agricultural experiments. Neyman defined a potential outcome for each experimental plot under each treatment condition, making explicit that a causal effect is a contrast, typically the difference, between these potential outcomes. Only one potential outcome is ever observed per unit; the others are missing. Donald Rubin revived and generalized this framework in the 1970s, extending it from experiments to observational studies. The Rubin Causal Model, as it became known, centers on the assignment mechanism: the process that determines which units receive which treatments. If treatment assignment is unconfounded given observed covariates, and every covariate stratum contains both treated and untreated units (overlap, also called positivity), then within each stratum the treated and untreated groups are exchangeable, and the average causal effect can be estimated by comparing observed outcomes within strata and averaging over the covariate distribution. This framework made the counterfactual definition of causality precise and provided a unified language for both randomized trials and observational research. Its core strength is clarity about assumptions: every causal claim is explicitly tied to a hypothetical intervention and a set of covariates that must be sufficient to render treatment assignment ignorable.
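The logic of this paragraph can be illustrated with a small simulation. The sketch below is not taken from Neyman or Rubin; the data-generating process (a binary covariate `x`, a constant effect of 1.0, assignment probabilities of 0.8 and 0.2) and all function names are assumptions chosen for illustration. Because a simulation generates both potential outcomes, it can compare a naive contrast and a covariate-stratified contrast against the true average effect, which is never possible with real data.

```python
import random


def average(values):
    values = list(values)
    return sum(values) / len(values)


def simulate(n=20000, seed=0):
    """Return one record per unit: (x, t, y0, y1, y_obs). All parameters are assumed."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0          # observed covariate
        y0 = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)    # potential outcome without treatment
        y1 = y0 + 1.0                               # potential outcome with treatment
        p_treat = 0.8 if x == 1 else 0.2            # assignment depends on x only
        t = 1 if rng.random() < p_treat else 0
        y_obs = y1 if t == 1 else y0                # only one potential outcome is observed
        units.append((x, t, y0, y1, y_obs))
    return units


def naive_difference(units):
    """Treated mean minus untreated mean, ignoring the covariate."""
    treated = [u[4] for u in units if u[1] == 1]
    control = [u[4] for u in units if u[1] == 0]
    return average(treated) - average(control)


def stratified_difference(units):
    """Within-stratum contrasts, averaged over the distribution of x."""
    total = 0.0
    for x in (0, 1):
        stratum = [u for u in units if u[0] == x]
        treated = [u[4] for u in stratum if u[1] == 1]
        control = [u[4] for u in stratum if u[1] == 0]
        total += (len(stratum) / len(units)) * (average(treated) - average(control))
    return total


units = simulate()
true_ate = average(y1 - y0 for _, _, y0, y1, _ in units)
naive = naive_difference(units)
adjusted = stratified_difference(units)
print(f"true ATE {true_ate:.2f}, naive {naive:.2f}, stratified {adjusted:.2f}")
```

Under these assumed parameters the naive contrast is about 2.2 in expectation (the effect of 1.0 plus confounding bias of 1.2), while the stratified contrast targets 1.0. The stratified estimator works here only because `x` is the sole driver of assignment and both treatment arms appear in each stratum.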
While the potential outcomes framework provided a formal foundation, it did not by itself solve the practical problem of identifying causal effects in observational data. Econometricians and social scientists, working from the 1960s onward, developed a family of quasi-experimental designs that exploit natural or policy-induced variation in treatment assignment. These methods—difference-in-differences, instrumental variables, regression discontinuity, and interrupted time series—do not rely on modeling the outcome process in detail. Instead, they seek a source of exogenous variation that mimics randomization. The intellectual dialogue between potential outcomes and quasi-experimental designs has been productive and sometimes tense. Potential outcomes theorists argued that quasi-experimental designs are best understood as special cases of the Rubin Causal Model, each with its own explicit assumptions about the assignment mechanism. Econometricians, in turn, insisted that design-based strategies offer more credible identification than elaborate statistical modeling of selection bias. This tension between design-based and modeling-based approaches remains a defining feature of the field. Quasi-experimental methods are now routinely taught alongside the potential outcomes framework, and many researchers see them as complementary: the framework provides the formal language, while the designs provide the empirical strategies.
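As a concrete instance of one of these designs, the following sketch computes a two-group, two-period difference-in-differences estimate. The numbers are made up for illustration, the function name is hypothetical, and the estimate has a causal reading only under the parallel-trends assumption: absent treatment, the treated group would have changed by the same amount as the comparison group.

```python
def mean(values):
    return sum(values) / len(values)


def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the comparison group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up outcomes for two groups observed before and after a policy change.
treated_pre = [10.0, 12.0, 11.0]     # mean 11
treated_post = [16.0, 18.0, 17.0]    # mean 17
control_pre = [8.0, 9.0, 10.0]       # mean 9
control_post = [11.0, 12.0, 13.0]    # mean 12

effect = difference_in_differences(treated_pre, treated_post, control_pre, control_post)
print(effect)  # 3.0: treated change of 6 minus comparison change of 3
```

The comparison group's change of 3 stands in for the unobserved counterfactual change of the treated group, which is how the design supplies the missing potential outcome without modeling the outcome process in detail.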
At roughly the same time that Rubin was formalizing potential outcomes, epidemiologists were developing a different causal ontology. The sufficient-component cause model, articulated by Kenneth Rothman in the 1970s, views a cause as any component of a minimal set of conditions that together inevitably produce the outcome. A single disease can have multiple sufficient causes, each composed of several component causes. This model is fundamentally deterministic and multifactorial: it emphasizes that most outcomes result from the joint action of several factors, which are rarely sufficient on their own and are necessary only when they appear in every sufficient cause. The sufficient-component cause model differs from the potential outcomes framework in several important ways. It focuses on individual-level causation rather than average effects, and it is not built around explicit counterfactual contrasts or assignment mechanisms. Its strength is in capturing biological interactions and the idea that a cause may operate only in the presence of other factors. However, it has largely been absorbed into the broader potential outcomes tradition for quantitative causal inference, because the sufficient-component model does not easily scale to the estimation of average causal effects from observational data. Epidemiologists today often use the potential outcomes framework for effect estimation while retaining the sufficient-component model as a conceptual tool for thinking about mechanisms and interactions.
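The deterministic structure of the model can be sketched with sets. In the illustration below, the component labels A through E and the three sufficient causes are hypothetical, and the function names are not drawn from any library. A sufficient cause is represented as a set of components, and the outcome occurs once every component of at least one sufficient cause is present.

```python
# Each sufficient cause is a minimal set of component causes (hypothetical labels).
SUFFICIENT_CAUSES = [
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"D", "E"}),
]


def outcome_occurs(present, sufficient_causes=SUFFICIENT_CAUSES):
    """True if all components of at least one sufficient cause are present."""
    present = set(present)
    return any(cause <= present for cause in sufficient_causes)


def necessary_components(sufficient_causes=SUFFICIENT_CAUSES):
    """Components that appear in every sufficient cause."""
    return set(frozenset.intersection(*sufficient_causes))


print(outcome_occurs({"A"}))                          # False: A alone completes no cause
print(outcome_occurs({"A", "B"}))                     # True: A acts jointly with B
print(outcome_occurs({"B", "C"}))                     # False: B and C share no cause
print(necessary_components())                         # set(): nothing is in all three causes
print(necessary_components(SUFFICIENT_CAUSES[:2]))    # {'A'}: A is in both remaining causes
```

The example shows the interaction the model is designed to capture: component A has an effect only when B or C is also present, and whether A counts as necessary depends on which sufficient causes exist in the population.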
A major limitation of the standard potential outcomes framework is its handling of time-varying treatments and time-varying confounders. In many longitudinal settings, a treatment administered at one time affects both the outcome and later covariates, which in turn influence later treatments. Standard regression adjustment for these covariates can induce bias by conditioning on a collider or by blocking a causal pathway, yet omitting them leaves later treatment confounded. James Robins and colleagues developed the G-methods in the 1980s and 1990s to address exactly this problem. G-methods (G-computation, inverse probability weighting of marginal structural models, and G-estimation of structural nested models) extend the potential outcomes framework to settings with time-varying treatments. Inverse probability weighting and G-estimation model the assignment mechanism over time; weighting uses that model to construct a pseudo-population in which treatment at each time point is unconfounded by the measured history. G-computation instead models the outcome and the time-varying covariates given past treatment, then standardizes over the covariates to impute outcomes under each treatment strategy of interest. The G-methods did not replace the potential outcomes framework; they preserved its core logic while solving a technical limitation that had made it difficult to apply in longitudinal studies. Today, G-methods are a standard tool in epidemiology and biostatistics, and they have been integrated with machine learning methods for estimating the nuisance functions required by the weighting and imputation procedures.
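The problem and two of the G-methods can be sketched in a two-period simulation. Everything in the block below is an assumption made for illustration: the variable names, the probabilities, and the structure in which an unmeasured factor `u` affects both the intermediate covariate `l1` and the outcome. By construction, being treated at both times rather than neither raises the outcome by 2.0. The sketch compares an unadjusted contrast, a contrast stratified on `l1`, the nonparametric g-formula, and inverse probability weighting with estimated weights.

```python
import random


def bernoulli(rng, p):
    return 1 if rng.random() < p else 0


def average(values):
    values = list(values)
    return sum(values) / len(values)


def simulate(n=60000, seed=1):
    """Rows are (a0, l1, a1, y); u is generated but deliberately not recorded."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = bernoulli(rng, 0.5)                          # unmeasured cause of l1 and y
        a0 = bernoulli(rng, 0.5)                         # first treatment, randomized
        l1 = bernoulli(rng, 0.2 + 0.3 * a0 + 0.4 * u)    # covariate affected by a0 and u
        a1 = bernoulli(rng, 0.2 + 0.6 * l1)              # second treatment depends on l1
        y = 1.0 * a0 + 1.0 * a1 + 2.0 * u + rng.gauss(0.0, 1.0)
        rows.append((a0, l1, a1, y))
    return rows


def cell_mean(rows, a0, l1, a1):
    return average(y for (b0, m1, b1, y) in rows if (b0, m1, b1) == (a0, l1, a1))


def unadjusted_contrast(rows):
    always = average(y for (a0, _, a1, y) in rows if a0 == 1 and a1 == 1)
    never = average(y for (a0, _, a1, y) in rows if a0 == 0 and a1 == 0)
    return always - never


def l1_adjusted_contrast(rows):
    """Stratify on l1 as if it were an ordinary baseline confounder."""
    total = 0.0
    for l1 in (0, 1):
        share = average(1 if row[1] == l1 else 0 for row in rows)
        total += share * (cell_mean(rows, 1, l1, 1) - cell_mean(rows, 0, l1, 0))
    return total


def g_formula_mean(rows, a0, a1):
    """Mean of y given (a0, l1, a1), standardized to the distribution of l1 given a0."""
    given_a0 = [row for row in rows if row[0] == a0]
    total = 0.0
    for l1 in (0, 1):
        share = average(1 if row[1] == l1 else 0 for row in given_a0)
        total += share * cell_mean(rows, a0, l1, a1)
    return total


def ipw_mean(rows, a0, a1):
    """Normalized weighted mean of y among units that followed (a0, a1)."""
    p_a0 = average(1 if row[0] == a0 else 0 for row in rows)
    p_a1 = {}
    for l1 in (0, 1):
        history = [row for row in rows if row[0] == a0 and row[1] == l1]
        p_a1[l1] = average(1 if row[2] == a1 else 0 for row in history)
    numerator = denominator = 0.0
    for (b0, l1, b1, y) in rows:
        if (b0, b1) == (a0, a1):
            weight = 1.0 / (p_a0 * p_a1[l1])    # inverse probability of observed treatment
            numerator += weight * y
            denominator += weight
    return numerator / denominator


rows = simulate()
unadjusted = unadjusted_contrast(rows)
l1_adjusted = l1_adjusted_contrast(rows)
g_formula = g_formula_mean(rows, 1, 1) - g_formula_mean(rows, 0, 0)
ipw = ipw_mean(rows, 1, 1) - ipw_mean(rows, 0, 0)
print(f"unadjusted {unadjusted:.2f}, l1-adjusted {l1_adjusted:.2f}, "
      f"g-formula {g_formula:.2f}, IPW {ipw:.2f}")
```

Under these assumed parameters the unadjusted contrast is roughly 2.4 in expectation and the `l1`-stratified contrast roughly 1.7, on opposite sides of the true value of 2.0. The first is confounded through `l1`, and the second conditions on a variable affected by both earlier treatment and `u`. The g-formula and weighting estimates agree with each other here because both use saturated, fully nonparametric models; with parametric models they would generally differ.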
In the 1990s, Judea Pearl introduced structural causal models (SCMs), which combine causal directed acyclic graphs (DAGs) with nonparametric structural equations. An SCM represents each variable as a function of its direct causes and an exogenous error term. The graph encodes the causal structure, and the equations encode the functional relationships. Pearl developed the do-calculus, a set of rules for deriving the effect of an intervention from the graph and the observed data. This framework initially generated significant tension with the potential outcomes tradition. Pearl argued that SCMs provide a richer representation of causal knowledge, including the ability to reason about interventions that have not been observed and to derive identification conditions graphically. Proponents of the potential outcomes framework countered that SCMs are essentially a graphical language for the same counterfactual logic, and that the do-calculus recovers results already available through the potential outcomes framework. Over time, a synthesis has emerged. Researchers now recognize that SCMs and potential outcomes are complementary: the graphical approach excels at clarifying identification assumptions and revealing testable implications, while the potential outcomes framework provides a direct route to estimation and inference. The two frameworks have largely converged on a shared understanding of counterfactuals, and many modern methods draw on both traditions.
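A toy structural causal model makes the distinction between conditioning and intervening concrete. The sketch below assumes a three-variable graph (Z causes X and Y, and X causes Y) with made-up structural equations; the names `EQUATIONS`, `draw_noise`, and `solve` are hypothetical and do not correspond to any library. An intervention do(X = x) is implemented by replacing the equation for X with a constant while leaving every other equation and the exogenous noise untouched.

```python
import random

# Structural equations in causal order: each maps (values so far, noise) to a value.
EQUATIONS = {
    "Z": lambda v, u: 1 if u["u_z"] < 0.5 else 0,
    "X": lambda v, u: 1 if u["u_x"] < (0.8 if v["Z"] == 1 else 0.2) else 0,
    "Y": lambda v, u: 1.0 * v["X"] + 2.0 * v["Z"] + u["u_y"],
}


def draw_noise(rng):
    """Exogenous terms, one per variable."""
    return {"u_z": rng.random(), "u_x": rng.random(), "u_y": rng.gauss(0.0, 1.0)}


def solve(noise, do=None):
    """Evaluate the equations in order; variables named in `do` are set by fiat."""
    do = do or {}
    values = {}
    for name, equation in EQUATIONS.items():
        values[name] = do[name] if name in do else equation(values, noise)
    return values


def average(values):
    values = list(values)
    return sum(values) / len(values)


rng = random.Random(2)
noises = [draw_noise(rng) for _ in range(20000)]

# Observational regime: condition on units that happened to have X = 1.
observed = [solve(u) for u in noises]
conditional_mean = average(v["Y"] for v in observed if v["X"] == 1)

# Interventional regime: set X for every unit, keeping each unit's noise fixed.
do_one = average(solve(u, do={"X": 1})["Y"] for u in noises)
do_zero = average(solve(u, do={"X": 0})["Y"] for u in noises)

# Unit-level counterfactual contrast: same noise, two different interventions.
unit_effects = [solve(u, do={"X": 1})["Y"] - solve(u, do={"X": 0})["Y"] for u in noises]

print(f"E[Y | X=1] ~ {conditional_mean:.2f}, E[Y | do(X=1)] ~ {do_one:.2f}, "
      f"E[Y | do(X=0)] ~ {do_zero:.2f}")
```

With these assumed equations the conditional mean is about 2.6 in expectation, while the interventional mean is about 2.0, because conditioning on X = 1 also selects units with Z = 1. Solving the model twice with the same noise yields both potential outcomes for each unit, which is one way to see why the two frameworks describe the same counterfactual quantities.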
Today, all five frameworks remain active, but they have settled into a productive division of labor. The potential outcomes framework provides the foundational language for defining causal effects and the core assumptions for identification. Quasi-experimental designs offer credible empirical strategies for settings where randomized experiments are infeasible. The sufficient-component cause model continues to inform thinking about interactions and mechanisms in epidemiology. G-methods handle the complexities of time-varying treatments that simpler approaches cannot address. Structural causal models provide a graphical toolkit for reasoning about identification, mediation, and transportability.
What do the leading frameworks agree on? There is broad consensus that causal inference requires a counterfactual definition of causality, that identification depends on explicit assumptions about treatment assignment, and that no amount of data can substitute for those assumptions. There is also agreement that modern statistical learning methods, such as double machine learning, targeted maximum likelihood estimation, and Bayesian additive regression trees, can be combined with the estimation strategies these frameworks support: they fit nuisance functions such as propensity scores and outcome regressions flexibly while preserving valid inference for the causal parameter of interest.
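To make the role of nuisance functions concrete, the sketch below implements a cross-fitted augmented inverse probability weighting (AIPW) estimator, the doubly robust score that double machine learning uses for the average treatment effect. It is a minimal illustration under stated assumptions, not a reference implementation: the simulated data, the true effect of 1.0, and the function names are all made up, and simple stratum means stand in for the flexible learners that would be used in practice.

```python
import math
import random


def average(values):
    values = list(values)
    return sum(values) / len(values)


def simulate(n=6000, seed=3):
    """Rows are (x, t, y) with a three-level covariate and an assumed effect of 1.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.choice((0, 1, 2))
        t = 1 if rng.random() < (0.3, 0.5, 0.7)[x] else 0    # propensity depends on x
        y = 1.0 * t + 1.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows


def fit_nuisances(train):
    """Estimate the nuisance functions; stratum means stand in for ML learners."""
    fitted = {}
    for x in (0, 1, 2):
        stratum = [row for row in train if row[0] == x]
        fitted[x] = {
            "e": average(t for _, t, _ in stratum),               # propensity score
            "m1": average(y for _, t, y in stratum if t == 1),    # treated outcome mean
            "m0": average(y for _, t, y in stratum if t == 0),    # untreated outcome mean
        }
    return fitted


def aipw_scores(rows, fitted):
    """Outcome-model contrast plus inverse-probability-weighted residual corrections."""
    scores = []
    for x, t, y in rows:
        e, m1, m0 = fitted[x]["e"], fitted[x]["m1"], fitted[x]["m0"]
        score = m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1.0 - e)
        scores.append(score)
    return scores


def cross_fitted_aipw(rows):
    """Two-fold cross-fitting: nuisances are fit on one fold and applied to the other."""
    folds = [rows[0::2], rows[1::2]]
    scores = []
    for k in (0, 1):
        fitted = fit_nuisances(folds[1 - k])
        scores.extend(aipw_scores(folds[k], fitted))
    estimate = average(scores)
    variance = average((s - estimate) ** 2 for s in scores)
    return estimate, math.sqrt(variance / len(scores))


estimate, std_error = cross_fitted_aipw(simulate())
print(f"ATE estimate {estimate:.2f} (standard error {std_error:.3f})")
```

The causal parameter is the mean of the scores, and the standard error comes from their sample variance. Fitting the nuisance functions on one fold and evaluating them on the other is what lets flexible learners be substituted for the stratum means without invalidating that standard error, provided the identification assumptions hold.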
Where do they disagree? The most persistent disagreement concerns the primacy of design versus modeling. Some researchers maintain that credible causal inference comes primarily from the design of the study—the source of variation in treatment—and that statistical modeling should play a minimal role. Others argue that careful modeling of the assignment mechanism and the outcome process is essential, especially in complex longitudinal or high-dimensional settings. This disagreement is not a weakness; it reflects the diversity of empirical problems that causal inference must address. A randomized trial in a controlled setting and an observational study using administrative data require different tools, and the field has developed a rich ecosystem of frameworks to match the demands of each.