How much does a teacher matter for student learning, and how can that contribution be measured? These questions have driven a half-century of economic research, yet the answers remain contested. The difficulty is that the most easily observed teacher characteristics (degrees, certifications, years of experience) turn out to be weak predictors of student outcomes. This puzzle forced economists to move from measuring teacher inputs to measuring teacher outputs, and then to confront the institutional and behavioral realities that shape what teachers actually do in classrooms.
The earliest economic framework for thinking about teacher effectiveness was Human Capital Theory, which emerged in the 1960s. Human capital theorists argued that education builds productive skills, and that teachers, as skilled workers, should be rewarded for their own human capital—their education, training, and experience. The policy implication was straightforward: pay teachers more for advanced degrees and seniority, and student outcomes would improve.
Educational Production Functions formalized this logic. Researchers treated schools as factories: inputs (teacher credentials, class size, spending) went in, and outputs (test scores, graduation rates) came out. The goal was to estimate the marginal effect of each input. The most famous application was the 1966 Coleman Report, which found that school inputs, including teacher qualifications, explained surprisingly little of the variation in student achievement once family background was accounted for. This was a crisis for the input paradigm. If observable teacher credentials did not predict student learning, then either teachers did not matter, or the field was measuring the wrong things.
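The input-output logic described above can be written as a regression equation. The following is a minimal sketch, assuming a linear form; the symbols are illustrative choices, not notation taken from the Coleman Report or any particular study.

```latex
% Illustrative linear education production function (assumed form).
% A_{is}: achievement of student i in school s
% T_s: teacher credentials, C_s: class size, E_s: spending per pupil
% F_i: family background, \varepsilon_{is}: unexplained variation
A_{is} = \beta_0 + \beta_1 T_s + \beta_2 C_s + \beta_3 E_s + \gamma F_i + \varepsilon_{is}
```

In this notation, the finding described above corresponds to the school-input coefficients (the \beta terms) accounting for little of the variation in A once family background F is included.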
Screening and Signaling Theory, developed in the 1970s, deepened the crisis. Where human capital theory assumed that education builds skills, screening theory argued that education primarily certifies pre-existing ability. A teacher with a master's degree might be no more effective at raising student achievement than one without; the degree simply signals persistence or intelligence. This critique did not deny that teachers matter, but it undermined the rationale for paying for credentials. If signals, not skills, drive the teacher labor market, then the entire input-based policy apparatus (salary schedules, certification requirements, experience bonuses) rested on shaky ground.
Screening theory remains an active framework today, and its tension with output-based measurement is unresolved. If credentials are weak signals of effectiveness, then any evaluation system must look past qualifications to actual classroom results. That is precisely what the next framework attempted to do.
Value-Added Models (VAM) emerged in the 1990s as a direct response to the failure of input-based measures. Instead of asking what teachers bring to the classroom, VAM asks what teachers add to student learning over a school year. By tracking individual students' test-score growth from year to year, VAM attempts to separate the teacher's contribution from the effects of prior achievement and family background.
VAM represented a technical narrowing of the Educational Production Function approach. Production functions had struggled with unobserved variables, such as student motivation, parental support, and peer effects, that confounded the estimate of teacher impact. VAM addressed this by using students as their own controls: each student's prior score serves as a proxy for unobserved factors that are stable over time. The result was a measure of teacher effectiveness that correlated with important long-run outcomes, such as college attendance and future earnings.
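The idea of using each student's prior score as a control can be illustrated with a deliberately simple gain-score calculation. This is a sketch under stated assumptions, not the specification used in any published value-added model: the data are simulated, the data-generating process (current score equals prior score plus a teacher effect plus noise) is assumed, and the function names `simulate_classrooms` and `gain_score_value_added` are hypothetical.

```python
import random
import statistics


def simulate_classrooms(true_effects, students_per_class=25, seed=0):
    """Simulate (teacher_id, prior_score, current_score) records.

    Assumed data-generating process (illustrative only):
    current = prior + teacher_effect + noise.
    """
    rng = random.Random(seed)
    records = []
    for teacher_id, effect in enumerate(true_effects):
        for _ in range(students_per_class):
            prior = rng.gauss(50.0, 10.0)   # last year's score
            noise = rng.gauss(0.0, 5.0)     # student-level shock
            records.append((teacher_id, prior, prior + effect + noise))
    return records


def gain_score_value_added(records):
    """Estimate value-added as mean class gain minus the overall mean gain."""
    gains_by_teacher = {}
    for teacher_id, prior, current in records:
        # Subtracting the prior score removes each student's starting level.
        gains_by_teacher.setdefault(teacher_id, []).append(current - prior)
    overall = statistics.fmean(
        g for gains in gains_by_teacher.values() for g in gains
    )
    return {t: statistics.fmean(g) - overall for t, g in gains_by_teacher.items()}


if __name__ == "__main__":
    records = simulate_classrooms([-3.0, 0.0, 3.0], students_per_class=200)
    for teacher_id, estimate in sorted(gain_score_value_added(records).items()):
        print(teacher_id, round(estimate, 2))
```

Because students here are assigned to teachers at random by construction, the estimates should track the assumed true effects. Real value-added models typically use regression adjustment with more controls, and they cannot take random assignment for granted.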
Yet VAM faced immediate criticism. Teacher rankings were unstable from year to year and across different tests. The models could not distinguish a teacher's contribution from classroom-level shocks. And they measured only the test-score dimensions of learning, ignoring other valued outcomes. These limitations set the stage for a methodological revolution.
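One commonly discussed source of year-to-year instability is sampling noise: a single class is a small sample, so a teacher's estimate moves even when true effectiveness does not. The sketch below illustrates that mechanism only. It assumes stable true effects and independent student-level noise, and all parameter values and function names are hypothetical.

```python
import random
import statistics


def noisy_estimates(true_effects, class_size, noise_sd, rng):
    """One year's estimates: true effect plus the class average of student noise."""
    return [
        effect + statistics.fmean(rng.gauss(0.0, noise_sd) for _ in range(class_size))
        for effect in true_effects
    ]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def year_to_year_correlation(class_size, n_teachers=500, effect_sd=1.0,
                             noise_sd=5.0, seed=0):
    """Correlate two years of estimates for teachers whose true effects are fixed."""
    rng = random.Random(seed)
    true_effects = [rng.gauss(0.0, effect_sd) for _ in range(n_teachers)]
    year1 = noisy_estimates(true_effects, class_size, noise_sd, rng)
    year2 = noisy_estimates(true_effects, class_size, noise_sd, rng)
    return pearson(year1, year2)


if __name__ == "__main__":
    for size in (25, 100, 400):
        print(size, round(year_to_year_correlation(size), 2))
```

Under these assumptions the correlation between years rises with class size, which is one reason pooling several years of data is often suggested as a way to steady the rankings.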
The Causal Inference and Program Evaluation framework, which gained prominence around 2000, did not replace VAM but transformed it. Causal inference brought a new standard: to claim that a teacher caused student learning, researchers needed a credible counterfactual—what would those students have achieved with a different teacher? VAM alone could not guarantee this, because students were not randomly assigned to teachers.
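The counterfactual problem can be made concrete with a small simulation. In this sketch, two teachers are equally effective by construction, so any estimated gap between them is spurious. The unobserved trait ("motivation"), the sorting rule, and the function name `estimated_gap` are all hypothetical choices made for illustration.

```python
import random
import statistics


def estimated_gap(sorted_assignment, n_students=4000, seed=0):
    """Difference in mean gains between two equally effective teachers.

    The true gap is zero by construction; a nonzero estimate reflects
    how students were assigned, not what the teachers did.
    """
    rng = random.Random(seed)
    # Trait that raises test-score gains but is unobserved by the analyst.
    motivation = [rng.gauss(0.0, 1.0) for _ in range(n_students)]

    if sorted_assignment:
        # Hypothetical sorting: the more motivated half goes to teacher A.
        order = sorted(range(n_students), key=lambda i: motivation[i], reverse=True)
    else:
        # Random assignment: class membership is unrelated to motivation.
        order = list(range(n_students))
        rng.shuffle(order)

    half = n_students // 2
    class_a, class_b = order[:half], order[half:]

    def mean_gain(student_ids):
        # Gains depend on motivation and noise only; no teacher effect.
        return statistics.fmean(
            2.0 * motivation[i] + rng.gauss(0.0, 5.0) for i in student_ids
        )

    return mean_gain(class_a) - mean_gain(class_b)


if __name__ == "__main__":
    print("sorted:", round(estimated_gap(True), 2))
    print("random:", round(estimated_gap(False), 2))
```

With sorting, teacher A appears better despite identical effectiveness; with random assignment, the estimated gap should be close to zero. This is the sense in which random assignment supplies the counterfactual that observational data cannot guarantee.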
Causal inference methods, such as random assignment experiments and regression discontinuity designs, provided the tools to test whether VAM-based rankings reflected true causal effects, typically by validating value-added estimates against experimental or quasi-experimental benchmarks. Landmark studies using these methods found that teachers with high value-added estimates did indeed produce lasting gains in their students' earnings and college attendance. This validation gave VAM a new legitimacy, but it also revealed the limits of the approach: causal estimates required strong assumptions, and even validated VAM could not capture every dimension of teacher quality.
Today, causal inference remains the dominant methodological framework for studying teacher effectiveness. It coexists with VAM, providing the rigorous identification that VAM alone cannot supply. But the focus on causal effects also narrowed the field's attention to what is easily measurable, leaving institutional and behavioral questions for other frameworks.
By the 2000s, evidence from VAM and causal inference was being used to justify high-stakes teacher evaluation reforms, including test-score-based accountability, dismissal of low-ranked teachers, and performance pay. Yet these reforms often stalled or backfired. The Political Economy of Education framework emerged to explain why.
Political economy shifted attention from the measurement of effectiveness to the governance of teacher labor markets. It examined how teacher unions, school board politics, collective bargaining agreements, and state regulations shaped the adoption and implementation of effectiveness reforms. Where earlier frameworks assumed that evidence would drive policy, political economy showed that institutional interests and power dynamics could block or distort even well-validated measures. For example, union opposition to VAM-based evaluations reflected not just resistance to accountability but also a rational defense of seniority-based protections that the input paradigm had long supported.
This framework did not reject the insights of VAM or causal inference. Instead, it contextualized them: even the best measure of teacher effectiveness is useless if political constraints prevent its use. Political economy thus complemented the measurement-focused frameworks by explaining the gap between research and reform.
The most recent framework, Behavioral Education Economics, emerged around 2010 and added another layer of context. Where political economy focused on institutions, behavioral economics focused on the psychology of teachers and school leaders. Standard economic models assume that teachers respond rationally to incentives: pay them for performance, and they will work harder. Behavioral economics showed that this assumption is often wrong.
Teachers may be loss-averse, weighing a loss more heavily than an equivalent gain, or present-biased, discounting future rewards relative to the immediate cost of effort. They may be influenced by social norms, peer comparisons, or intrinsic motivation that monetary incentives can crowd out. Behavioral interventions, such as framing bonuses as losses to be avoided or providing non-monetary recognition, have sometimes outperformed traditional performance pay. This framework does not replace VAM or causal inference; it enriches them by explaining why the same incentive scheme works in one context and fails in another.
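The logic behind loss-framed bonuses can be stated with a simple piecewise value function. This is a stylized sketch, assuming a linear form and a single loss-aversion parameter; the symbols are illustrative and not drawn from any specific study of teacher incentives.

```latex
% Stylized loss-averse value function (assumed linear form).
% x: change in pay relative to a reference point
% \lambda: loss-aversion parameter, assumed greater than 1
v(x) =
\begin{cases}
x, & x \ge 0 \\
\lambda x, & x < 0
\end{cases}
\qquad \lambda > 1
```

Under this assumed form, gaining a bonus of size b is valued at b, while losing a bonus already in hand is felt as a loss of \lambda b. That asymmetry is one way to rationalize why paying a bonus up front and reclaiming it on failure might elicit more effort than promising the same amount afterward.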
Today, teacher effectiveness research is a field of coexisting frameworks, each with a distinct role. Human Capital Theory and Screening and Signaling Theory remain active in debates over teacher compensation and certification, with screening theory continuing to challenge the assumption that credentials reflect acquired skill rather than pre-existing ability. Value-Added Models and Causal Inference provide the empirical backbone for measuring teacher impact, though they are increasingly used in combination rather than in isolation. Political Economy of Education and Behavioral Education Economics offer complementary explanations for why measurement-based reforms succeed or fail in practice.
What do these frameworks agree on? Nearly all accept that teachers vary substantially in their effectiveness, that this variation matters for student outcomes, and that observable credentials are poor proxies for that variation. The major disagreements are about the primary levers for policy. Measurement-focused frameworks (VAM, causal inference) prioritize identifying and rewarding effective teachers. Context-focused frameworks (political economy, behavioral economics) argue that institutional and psychological constraints must be addressed first, or the measurement will not translate into improvement. This tension—between knowing what works and making it work in real schools—defines the current frontier of the field.