How can we know whether a new school funding formula, a teacher training program, or a voucher system actually improves student outcomes? This question has driven education policy evaluation since the 1960s, but the answers have changed dramatically as economists have debated what counts as evidence, what education is for, and whose interests policies serve. The subfield's history is a story of frameworks that competed, absorbed each other's insights, and eventually settled into a pluralism that is productive but still contested.
Education policy evaluation began within the broad framework of Human Capital Theory, which emerged in the early 1960s. Gary Becker and Theodore Schultz argued that schooling is an investment: individuals and societies spend resources on education today to reap higher earnings and productivity tomorrow. For policy evaluation, this framework provided a clear metric: the rate of return to educational investments. If a policy raised test scores or graduation rates, its value could be measured by the future earnings those gains would generate. The framework treated education as a production process where inputs (years of schooling, teacher quality) produced outputs (skills, earnings).
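The rate-of-return metric can be made concrete with a small calculation. The Python sketch below treats an extra year of schooling as an up-front cost followed by a stream of higher earnings, then solves for the discount rate at which the investment just breaks even (the internal rate of return). The cost and earnings figures, the 40-year horizon, and the function names are hypothetical and chosen only for illustration; real estimates must also deal with taxes, differences in ability, and the signaling concerns that later critics raised.

```python
# A minimal sketch of the human-capital rate-of-return calculation.
# All cash-flow figures are hypothetical and chosen only for illustration.


def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value, where cash_flows[t] is received at the end of year t."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def internal_rate_of_return(
    cash_flows: list[float], lo: float = 0.0, hi: float = 1.0, tol: float = 1e-10
) -> float:
    """Discount rate at which NPV equals zero, found by bisection.

    Assumes costs come first and benefits later, so NPV falls as the rate rises
    and there is a single root between lo and hi.
    """
    if npv(lo, cash_flows) < 0 or npv(hi, cash_flows) > 0:
        raise ValueError("the internal rate of return is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid  # investment still pays off at this rate, so the root is higher
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical extra year of schooling: 30,000 up front (tuition plus forgone
# earnings), then 3,000 more per year over a 40-year working life.
flows = [-30_000.0] + [3_000.0] * 40
irr = internal_rate_of_return(flows)
print(f"rate of return: {irr:.1%}")
```

Under this framework, a return above the interest rate at which families or governments can borrow would justify the investment; a policy that raises schooling would be valued by the same logic.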
Almost simultaneously, Public Economics of Education (1960–present) added a government perspective. Even if individuals invest rationally, markets may underprovide education because of positive externalities—an educated population benefits everyone through higher tax revenues, lower crime, and better civic participation. Public economists argued that governments should fund and regulate schooling to correct these market failures. For evaluation, this meant that policies should be judged not only by private returns but by social returns, including effects on inequality and economic growth. The two frameworks coexisted comfortably: Human Capital Theory supplied the microeconomic logic of individual choice, while Public Economics justified the state's role in financing and evaluating education.
By the mid-1960s, researchers wanted to move beyond theoretical rates of return to empirical estimates of what actually works in classrooms. The Educational Production Functions framework (1966–present) treated schools as firms: inputs such as class size, teacher experience, and spending per pupil entered a statistical function that predicted outputs such as test scores. The landmark Coleman Report (1966) applied this approach to U.S. schools and concluded that family background mattered more than school resources, a finding that sparked decades of debate. Production function studies proliferated, but they faced a fundamental problem: correlation is not causation. Schools with smaller classes might also have wealthier parents, better principals, or more motivated students, so an association between class size and test scores could reflect any of these differences rather than class size itself. The framework could describe patterns but could not reliably identify which policies caused better outcomes. This limitation would eventually set the stage for a methodological revolution; in the meantime, Human Capital Theory itself faced a deeper theoretical challenge.
Screening and Signaling Theory (1973–present), developed by Michael Spence and others, directly challenged Human Capital Theory's core assumption. Perhaps education does not build productive skills at all; instead, it signals pre-existing abilities to employers. A diploma certifies that a worker is smart, diligent, and conformist, traits that employers value but that schooling itself may not create. If signaling is the main function of education, then policies that expand access to credentials may not raise productivity; they may simply inflate credential requirements, forcing everyone to get more schooling for the same jobs. This argument had sharp implications for evaluation: a policy that raised graduation rates might look successful under a human capital lens but wasteful under a signaling lens. The Credentialist Critique, a related strand from the sociology of education, reinforced this skepticism by arguing that educational credentials serve primarily as sorting mechanisms that reproduce social hierarchies. Screening theory did not replace Human Capital Theory, but it forced evaluators to ask whether observed earnings gains from education reflect genuine skill creation or mere credential inflation. The two frameworks remain in active disagreement today and support different policy recommendations.
By the 1970s, some economists grew dissatisfied with frameworks that treated policy as a technical problem of efficient investment. The Political Economy of Education (1970–present) shifted attention to the interests, institutions, and power relations that shape policy design and implementation. Even a well-designed policy can fail if teachers' unions oppose it, if local elites capture its benefits, or if bureaucratic incentives reward compliance over learning. Political economists study how electoral systems, interest groups, and historical legacies determine which policies get adopted and how they work in practice. This framework coexists with others by adding a layer of explanation: production functions may estimate the effect of smaller classes, but political economy explains why some districts actually reduce class sizes while others do not. Political economists would later turn the same lens on the causal inference revolution described below, criticizing its focus on narrow treatment effects and its neglect of the political conditions that determine whether those effects can be reproduced or scaled.
The Market Liberalism and School Choice framework (1980–present) emerged from a different critique: public education systems, shielded from competition, become inefficient and unresponsive to families' needs. Milton Friedman had argued for vouchers as early as 1955, but the framework gained policy traction in the 1980s and 1990s with charter schools, voucher programs, and tax-credit scholarships. For evaluation, Market Liberalism created a new demand for causal evidence: do school choice programs improve student achievement? Opponents worried that such programs would increase segregation or drain resources from public schools. These empirical questions could not be settled by theory alone. Nor could they be settled by simply comparing students who used choice programs with those who did not, because families who seek out choice differ in motivation and resources; credible answers required designs such as comparing winners and losers of admission lotteries at oversubscribed schools. The framework thus became a major driver of the methodological revolution that followed.
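Admission lotteries illustrate how such comparisons can be made credible. When an oversubscribed charter school admits applicants at random, winners and losers are alike on average, so comparing them estimates the effect of being offered a seat (the intention-to-treat effect); dividing by the difference in attendance rates rescales it into an effect of attending (a Wald, or instrumental-variables, estimate). The Python simulation below is hypothetical: the true 5-point effect, the unobserved "motivation" variable, and the attendance rates are all invented to show how a naive attendee-versus-non-attendee comparison can overstate the effect.

```python
# Hypothetical charter-school lottery. Assumptions (all invented): attending adds
# 5 points; unobserved motivation raises scores and, among winners, attendance;
# lottery losers cannot enroll.
import random
import statistics


def simulate_lottery(n_applicants: int = 20_000, true_effect: float = 5.0, seed: int = 1):
    rng = random.Random(seed)
    won, attended, score = [], [], []
    for _ in range(n_applicants):
        motivation = rng.gauss(0.0, 1.0)  # never observed by the evaluator
        w = rng.random() < 0.5            # the lottery itself: a fair coin flip
        a = w and rng.random() < (0.9 if motivation > 0 else 0.6)
        s = 50.0 + 8.0 * motivation + (true_effect if a else 0.0) + rng.gauss(0.0, 5.0)
        won.append(w)
        attended.append(a)
        score.append(s)
    return won, attended, score


def mean_where(values, mask) -> float:
    """Mean of the values whose mask entry is True."""
    return statistics.fmean(v for v, keep in zip(values, mask) if keep)


won, attended, score = simulate_lottery()
lost = [not w for w in won]
skipped = [not a for a in attended]

# Naive: compare attendees with everyone else (contaminated by motivation).
naive = mean_where(score, attended) - mean_where(score, skipped)
# Intention-to-treat: compare lottery winners with losers (balanced by chance).
itt = mean_where(score, won) - mean_where(score, lost)
# First stage: how much winning raised the probability of attending.
first_stage = mean_where(attended, won) - mean_where(attended, lost)
# Wald / instrumental-variables estimate of the effect of attending.
wald = itt / first_stage

print(f"naive {naive:.2f}  ITT {itt:.2f}  first stage {first_stage:.2f}  Wald {wald:.2f}")
```

Because the lottery balances motivation between winners and losers, the Wald estimate lands near the true 5 points, while the naive comparison is pushed upward by the motivated students who chose to attend.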
The Causal Inference and Program Evaluation methodological school (1990–present) transformed education policy evaluation by insisting on a clear hierarchy of evidence. Randomized controlled trials (RCTs) sit at the top because random assignment eliminates selection bias: if treatment and control groups differ only by chance, any systematic difference in their outcomes can be attributed to the policy, and the remaining chance variation can be quantified with standard errors. When RCTs are impossible, researchers use quasi-experimental methods, each resting on its own assumptions: regression discontinuity compares units just above and just below an eligibility cutoff, difference-in-differences assumes that treated and comparison groups would have followed parallel trends without the policy, and instrumental variables require a source of variation that affects outcomes only through the policy. This framework did not reject earlier approaches; it absorbed and disciplined them. Educational Production Functions, for example, continue to be estimated, but now researchers must justify why their estimates can be interpreted causally. The framework's dominance reshaped the entire subfield: journals prioritize studies with credible identification strategies, funding agencies require rigorous evaluation designs, and policy debates increasingly turn on what the best-identified studies show. However, the framework has also been criticized for narrowing the questions that get asked. RCTs work best for well-defined, short-term interventions with easily measurable outcomes; they struggle with complex, system-wide reforms or with outcomes like citizenship and creativity that resist standardized testing. Political economists point out that even a perfect RCT tells us little about whether a policy will work in a different political context.
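Of the quasi-experimental methods named above, difference-in-differences is the simplest to show. The hypothetical Python sketch below compares the change in mean scores in districts that adopted a policy with the change in districts that did not; subtracting the comparison group's change removes any trend common to both. The district scores and the function name are invented for illustration, and the estimate is only credible under the parallel-trends assumption noted in the docstring.

```python
# Minimal difference-in-differences sketch on hypothetical district mean scores.
from statistics import fmean


def diff_in_diff(records: list[tuple[bool, bool, float]]) -> float:
    """Difference-in-differences from (treated, post, outcome) records.

    Identifying assumption (parallel trends): without the policy, treated and
    comparison units would have changed by the same amount on average.
    """
    cells: dict[tuple[bool, bool], list[float]] = {}
    for treated, post, outcome in records:
        cells.setdefault((treated, post), []).append(outcome)
    m = {cell: fmean(values) for cell, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


districts = [
    (True, False, 59.0), (True, False, 61.0),    # adopting districts, before
    (True, True, 65.0), (True, True, 67.0),      # adopting districts, after
    (False, False, 57.0), (False, False, 59.0),  # comparison districts, before
    (False, True, 60.0), (False, True, 62.0),    # comparison districts, after
]
print(diff_in_diff(districts))  # 3.0: a 6-point gain minus a 3-point common trend
```

A simple before-and-after comparison in the adopting districts would have credited the policy with all 6 points.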
The most recent major framework, Behavioral Education Economics (2000–present), challenges the rational-actor assumptions that underpin Human Capital Theory, Market Liberalism, and much of the causal inference tradition. Students and parents do not always make optimal decisions about schooling: they procrastinate on college applications, underestimate future returns, are swayed by default options and social norms, and exhibit present bias, overweighting immediate costs relative to future benefits. Behavioral economists use insights from psychology to redesign policies, for example by sending text-message reminders to complete financial aid forms or by simplifying the college choice process. Human Capital Theory still describes the long-run logic of educational investment, but behavioral economics explains why people often fail to act on that logic. Market Liberalism assumes that informed consumers will choose the best schools; behavioral economics shows that choice can be overwhelming and that framing effects matter. The behavioral turn thus coexists with and modifies the older frameworks rather than overturning them.
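Present bias has a standard formalization, quasi-hyperbolic (beta-delta) discounting, in which every future payoff receives an extra discount factor beta below 1. The hypothetical Python sketch below applies it to a task like completing a financial aid form: the effort cost is paid when the form is done, and the benefit arrives one period later. With the illustrative numbers used here, an exponential discounter (beta = 1) acts today, while a present-biased student prefers to act tomorrow. The payoffs, the one-period lag, and the function names are assumptions, not estimates.

```python
# Present bias via quasi-hyperbolic (beta-delta) discounting. Hypothetical task:
# completing a financial aid form costs effort when done; the benefit arrives one
# period later. All payoff numbers are illustrative assumptions.


def value_of_acting(delay: int, cost: float, benefit: float, beta: float, delta: float) -> float:
    """Today's valuation of doing the task `delay` periods from now (0 = today)."""

    def weight(t: int) -> float:
        # Payoffs today get full weight; every future period gets an extra factor beta.
        return 1.0 if t == 0 else beta * delta**t

    return -cost * weight(delay) + benefit * weight(delay + 1)


def acts_today(cost: float, benefit: float, beta: float, delta: float) -> bool:
    """Does today's self prefer acting now to acting tomorrow?"""
    return value_of_acting(0, cost, benefit, beta, delta) >= value_of_acting(
        1, cost, benefit, beta, delta
    )


for beta in (1.0, 0.6):
    print(f"beta={beta}: acts today? {acts_today(cost=10.0, benefit=15.0, beta=beta, delta=0.99)}")
```

Because each day's self faces the same choice, a student who naively expects to be more disciplined tomorrow keeps postponing even though acting tomorrow looks worthwhile today. Reminders and simplified forms are aimed at breaking exactly this cycle.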
Today, education policy evaluation is a field of productive pluralism. The leading frameworks (Causal Inference and Program Evaluation, Human Capital Theory, Political Economy of Education, and Behavioral Education Economics) each have distinct strengths. Causal inference methods dominate empirical research because they provide the most credible answers to the question "Did this policy cause that outcome?" Human Capital Theory remains the default framework for interpreting those outcomes in terms of economic returns. Political Economy of Education explains why policies succeed or fail in practice, and Behavioral Education Economics suggests low-cost design changes that can improve policy take-up.
Yet the frameworks disagree on fundamental points. Causal inference methods, especially RCTs, prioritize internal validity (confidence that the estimated effect was caused by the policy in the population studied) over external validity (whether the effect generalizes to other settings). Political economists argue that this trade-off is too costly: a policy that works in one school district may fail in another because of different political dynamics, and RCTs rarely capture those dynamics. Human Capital Theory and Screening Theory remain in unresolved tension: to the extent that education's earnings payoff reflects signaling, policies that expand access to credentials may be less valuable than policies that improve actual learning. Behavioral economics challenges the rational-choice assumptions of both Human Capital Theory and Market Liberalism, but it has not yet produced a unified alternative theory of educational decision-making.
What the leading frameworks agree on is that education policy evaluation must be empirical, transparent, and attentive to context. The days of relying on theory alone or on simple correlations are over. The field now expects researchers to state their assumptions clearly, justify their identification strategies, and discuss the limitations of their evidence. This methodological consensus, forged through decades of debate, is the subfield's most durable achievement—even as the frameworks continue to argue about what the evidence means and whose questions deserve to be asked.