How can we know whether a development program actually caused the outcomes we observe? This question, the problem of causal attribution, is the central problem of impact evaluation. A child who receives a school meal and then scores higher on a test might have improved anyway, perhaps because of a different teacher or because her family moved to a better neighborhood. The core challenge is constructing a credible counterfactual: an estimate of what would have happened to the same people in the same circumstances if the program had not existed. Since the 1990s, four distinct methodological schools have emerged to answer this challenge, each with a different philosophy of evidence, a different strategy for constructing the counterfactual, and a different claim about what kind of knowledge policy-makers should trust. These schools (the Experimental Approach, Quasi-Experimental Design, Theory-Based Impact Evaluation, and the Structural Econometric Approach) are not a sequence of replacements. They are concurrent rivals and uneasy complements, and their ongoing disagreements shape how development interventions are designed, funded, and judged.
The Experimental Approach, built around randomized controlled trials (RCTs), rose to prominence in development economics in the 1990s and early 2000s. Its core logic is simple: if you randomly assign some villages or individuals to receive a program and others to a control group, the two groups should be statistically identical on average before the intervention, in both observed and unobserved characteristics. Any systematic difference in outcomes afterward can then be attributed to the program, not to pre-existing differences or external trends. This claim to internal validity, the ability to identify a causal effect within the study sample, is the Experimental Approach's great strength. Landmark studies on deworming, school inputs, and microcredit demonstrated that even well-intentioned policies could fail when tested rigorously, and the approach quickly became the gold standard for many funders and journals. Yet internal validity holds only for the sample and setting studied, which points to the approach's most persistent limitation: uncertain external validity. A program that works in one setting may fail in another because of differences in institutions, culture, or implementation quality. Critics within the field argue that the Experimental Approach often treats the intervention as a black box, measuring average effects without explaining why or how the program worked, and that its dominance has narrowed the range of questions development economists ask.
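The logic of random assignment can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a description of any actual study: the function name `simulate_assignment`, the unobserved "trait", and every number in it are hypothetical. It compares a coin-flip assignment with a self-selected comparison in which only high-trait units enroll.

```python
import random
from statistics import mean


def simulate_assignment(n=20000, true_effect=2.0, seed=0):
    """Compare a randomized estimate with a naive self-selected comparison.

    All numbers are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    rct_treated, rct_control = [], []
    self_treated, self_control = [], []

    for _ in range(n):
        # Unobserved trait (say, household motivation) that raises outcomes.
        trait = rng.gauss(0, 1)
        y_without = 10 + 3 * trait + rng.gauss(0, 1)  # outcome with no program
        y_with = y_without + true_effect              # outcome with the program

        # Randomized assignment: a coin flip, unrelated to the trait.
        if rng.random() < 0.5:
            rct_treated.append(y_with)
        else:
            rct_control.append(y_without)

        # Self-selection: only high-trait units enroll.
        if trait > 0:
            self_treated.append(y_with)
        else:
            self_control.append(y_without)

    rct_estimate = mean(rct_treated) - mean(rct_control)
    naive_estimate = mean(self_treated) - mean(self_control)
    return rct_estimate, naive_estimate


rct_estimate, naive_estimate = simulate_assignment()
print("true effect:        2.00")
print(f"randomized diff:    {rct_estimate:.2f}")
print(f"self-selected diff: {naive_estimate:.2f}")
```

Under these assumptions the randomized difference in means should land close to the true effect, while the self-selected comparison is inflated because it mixes the program's effect with the pre-existing gap in the unobserved trait.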
Quasi-Experimental Design developed in parallel with the Experimental Approach, sharing the same "credibility revolution" roots but expanding the range of policies that could be evaluated. Where randomization is impossible (for a national policy change, a financial crisis, or a program already rolled out), quasi-experimental methods use observational data to approximate a counterfactual. Difference-in-differences compares the change in a treated group's outcomes before and after a policy with the change in a comparison group over the same period. Regression discontinuity exploits arbitrary cutoff rules, such as a poverty score threshold for program eligibility, to compare households just above and just below the line. Instrumental variables use an external source of variation (a rainfall shock, a policy change in one region) that shifts exposure to a program but plausibly affects outcomes only through that exposure, and use it to isolate the program's causal effect. These methods share the Experimental Approach's commitment to clean identification of a single average treatment effect, but they rely on stronger assumptions that must be defended case by case: parallel trends for difference-in-differences, no manipulation of the score around the cutoff for regression discontinuity, and a valid exclusion restriction for instrumental variables. The relationship between the two schools is one of coexistence and rivalry: quasi-experimentalists argue that their methods can evaluate a wider range of real-world policies, while experimentalists counter that the assumptions required for quasi-experimental identification are often implausible. Both schools, however, focus on reduced-form estimation, which measures the effect of a treatment without modeling the underlying behavioral process that generates it.
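The difference-in-differences comparison described above reduces, in its simplest two-group, two-period form, to arithmetic on four group means. The sketch below uses invented test scores and a hypothetical helper named `diff_in_diff`. It assumes that, without the policy, the treated group would have followed the same trend as the comparison group.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical test scores, invented for illustration only.
treated_pre = [50.0, 52.0, 48.0]
treated_post = [58.0, 60.0, 56.0]
control_pre = [45.0, 47.0, 43.0]
control_post = [48.0, 50.0, 46.0]

# A simple before/after comparison attributes the whole change to the policy.
before_after = mean(treated_post) - mean(treated_pre)

# Difference-in-differences subtracts the trend seen in the comparison group.
did_estimate = diff_in_diff(treated_pre, treated_post, control_pre, control_post)

print(before_after)  # 8.0
print(did_estimate)  # 5.0
```

In this toy data the treated group gains 8 points, but the comparison group gains 3 over the same period, so only 5 points are attributed to the policy. The estimate is only as credible as the assumption that the two groups would otherwise have moved in parallel.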
Theory-Based Impact Evaluation emerged as a direct critique of the black-box character of both experimental and quasi-experimental approaches. Its central claim is that knowing whether a program caused an outcome is not enough; we also need to know why and how the causal chain operated. Did a cash transfer improve child nutrition because households bought more food, because mothers gained bargaining power, or because the transfer reduced stress? Each mechanism implies a different policy lesson. Theory-based evaluation draws on program theory, logic models, and mixed methods—combining quantitative surveys with qualitative interviews, process tracing, and ethnographic observation—to trace the causal mechanisms from intervention to outcome. This school argues that understanding mechanisms is essential for generalizability: if you know why a program worked, you can predict whether it will work in a different context. The tension with the Experimental Approach is sharp: experimentalists prioritize internal validity and see mechanism testing as a secondary question, while theory-based evaluators worry that a well-identified average effect with no mechanism explanation is a fragile basis for policy. The Structural Econometric Approach shares this concern with mechanisms, but the two schools differ in their tools: theory-based evaluation relies on qualitative and mixed methods, while structural econometricians build formal mathematical models of behavior.
The Structural Econometric Approach takes a fundamentally different stance on what impact evaluation should deliver. Rather than estimating a single average treatment effect, structural economists specify a full model of individual or household behavior—utility functions, budget constraints, production technologies—and estimate the deep parameters that govern choices. Once the model is estimated, the researcher can simulate counterfactual policies: what would happen if the subsidy were doubled, if the eligibility rule were changed, or if the program were implemented in a different market environment. This approach prioritizes external validity and policy simulation over the clean identification of a single effect. Its practitioners argue that reduced-form estimates from experiments or quasi-experiments are local and time-bound, while a well-specified structural model can predict outcomes under entirely new conditions. The cost is that the model's assumptions—functional forms, error distributions, equilibrium conditions—are strong and often difficult to test. The Structural Econometric Approach has coexisted with the Experimental Approach since the 1990s, but it has remained less dominant in funding and policy influence, partly because its methods are technically demanding and partly because its results depend on assumptions that are harder to communicate to policy-makers. The relationship with Theory-Based Impact Evaluation is one of partial overlap: both care about mechanisms and generalization, but structural econometricians pursue those goals through formal modeling rather than qualitative fieldwork.
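The estimate-then-simulate workflow can be sketched with a deliberately tiny model. Everything here is an assumption made for illustration: households are taken to have Cobb-Douglas preferences over food and other goods, the single deep parameter `alpha` is the food budget share, the data are synthetic, and the names `food_demand`, `estimate_alpha`, and `make_data` are hypothetical. Real structural models are far richer and are usually estimated by maximum likelihood or method of moments.

```python
import random
from statistics import mean


def food_demand(alpha, income, price, subsidy):
    """Utility-maximizing food quantity under Cobb-Douglas preferences.

    The household pays price * (1 - subsidy) per unit and spends a
    fraction alpha of its income on food.
    """
    return alpha * income / (price * (1.0 - subsidy))


def estimate_alpha(observations):
    """Estimate alpha as the average observed budget share spent on food."""
    return mean(
        price * (1.0 - subsidy) * food / income
        for income, price, subsidy, food in observations
    )


def make_data(n=500, alpha=0.4, seed=1):
    """Synthetic households observed under a 10% subsidy (hypothetical)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        income = rng.uniform(50.0, 150.0)
        price, subsidy = 2.0, 0.1
        noise = rng.uniform(0.95, 1.05)  # multiplicative measurement noise
        food = food_demand(alpha, income, price, subsidy) * noise
        data.append((income, price, subsidy, food))
    return data


observations = make_data()
alpha_hat = estimate_alpha(observations)

# Counterfactual: what if the subsidy were doubled from 10% to 20%?
baseline = food_demand(alpha_hat, income=100.0, price=2.0, subsidy=0.1)
doubled = food_demand(alpha_hat, income=100.0, price=2.0, subsidy=0.2)

print(f"estimated alpha:          {alpha_hat:.3f}")
print(f"food at 10% subsidy:      {baseline:.2f}")
print(f"food at 20% subsidy:      {doubled:.2f}")
```

The doubled subsidy was never observed in the data; the prediction comes entirely from the assumed functional form. That is both the appeal of the approach and the source of the concern about untestable assumptions raised above.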
Today, no single school has won the debate. The Experimental Approach remains the most influential in terms of funding, publication in top journals, and policy credibility, especially among organizations like the World Bank and major foundations. Yet its dominance has provoked a productive backlash. Quasi-experimental methods have become more sophisticated, with new techniques for handling multiple treatments, dynamic treatment effects, and spillovers. Theory-Based Impact Evaluation has gained traction in evaluation practice, particularly in agencies that require explicit theories of change. The Structural Econometric Approach has seen a revival in applied microeconomics, with researchers combining experimental or quasi-experimental identification with structural modeling to get the best of both worlds.
What all four schools agree on is that constructing the counterfactual is the central problem and that transparency about assumptions is essential. They disagree on what kind of evidence should be privileged. Experimentalists and quasi-experimentalists prioritize internal validity and see the average treatment effect as the primary output. Theory-based and structural evaluators prioritize understanding mechanisms and external validity, and they argue that a well-identified effect without an account of its mechanism is incomplete. The deepest disagreement is about the role of models: structural econometricians build explicit behavioral models; experimentalists and quasi-experimentalists prefer to minimize modeling assumptions; theory-based evaluators use qualitative models of causal chains rather than formal mathematical ones.
In practice, the field is moving toward combination strategies. Researchers increasingly embed qualitative mechanism testing within randomized trials, estimate structural models using experimental variation, and use quasi-experimental methods to test the external validity of experimental results. The four schools remain in living disagreement, but that disagreement has become the engine of methodological progress rather than a barrier to it. Impact evaluation today is defined not by a single orthodoxy but by a pluralist landscape in which the choice of method depends on the question, the context, and the kind of policy inference that is needed.