Does a public health program actually improve health? For whom, under what conditions, and at what cost? These questions have driven a half-century of methodological debate in program evaluation. The field has never settled on a single answer. Instead, it has produced a sequence of frameworks, each emphasizing different dimensions of what it means to judge success: causal rigor, practical use, explanatory mechanisms, unintended effects, community power, or adaptive learning. The frameworks coexist today, not as a linear succession but as a layered pluralism, with each approach finding its niche.
In the 1960s, evaluation borrowed heavily from the randomized controlled trial (RCT) model of clinical research. The experimental paradigm held that the highest standard of evidence came from comparing a treated group to a control group, ideally through random assignment. Its core commitment was internal validity: the confidence that the program—and nothing else—caused any observed change. Quasi-experimental designs (e.g., difference-in-differences, interrupted time series) extended the logic to settings where randomization was impractical. This paradigm became the methodological infrastructure for large-scale federal evaluations in the United States and elsewhere, and it remains the gold standard for causal claims in many funding agencies and systematic reviews. Yet its very strength—the insistence on isolating a single cause—made it ill-suited for programs that operated in complex social systems where randomization was impossible or where the treatment itself changed over time. The paradigm treated programs as fixed treatments, not as dynamic processes. That assumption would provoke a series of 1970s alternatives.
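The quasi-experimental logic mentioned above can be made concrete with a small arithmetic sketch of difference-in-differences. The function name and all four outcome values below are hypothetical, chosen only for illustration; they are not drawn from any real evaluation. The estimate is credible only under the usual parallel-trends assumption, namely that the comparison group's change stands in for what would have happened to the program group without the program.

```python
def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Return a difference-in-differences estimate from four group means.

    Illustrative helper (hypothetical name). It subtracts the comparison
    group's change from the program group's change.
    """
    treated_change = treated_post - treated_pre   # change where the program ran
    control_change = control_post - control_pre   # change where it did not
    return treated_change - control_change


# Hypothetical, made-up outcome means (e.g., percent of adults screened).
estimate = difference_in_differences(
    treated_pre=40.0, treated_post=55.0,
    control_pre=42.0, control_post=47.0,
)
print(estimate)  # 10.0
```

In this made-up case the program group improved by 15 points and the comparison group by 5, so the sketch attributes 10 points to the program. A real analysis would also need standard errors and a check that the two groups were trending similarly before the program began.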
Three frameworks emerged in the 1970s, each challenging a different blind spot of the experimental paradigm. They did not reject the need for rigor, but they redefined what rigor meant.
Utilization-Focused Evaluation shifted the criterion of success from causal purity to practical relevance. Michael Quinn Patton argued that evaluations should be organized around the intended users and their information needs. If a report sits on a shelf, it has failed—no matter how scientifically sound. This framework turned stakeholder involvement from a side activity into the central organizing principle. It coexists with the experimental paradigm in a division of labor: experimental designs answer "does it work?" when a program is stable and evaluable, while utilization-focused approaches ensure that findings actually influence decisions.
Theory-Driven Evaluation (articulated in the 1970s by Carol Weiss and others) argued that the experimental paradigm was a "black box" approach: it measured inputs and outputs without asking how the program was supposed to produce its effects. Theory-driven evaluation demanded an explicit program theory, that is, a set of causal mechanisms linking activities to outcomes. Evaluators then tested not just whether the program worked, but why. This framework shifted the focus from global impact to the logic of the intervention, making it possible to improve programs rather than merely certify them. It stands in a living disagreement with both the experimental paradigm (which sees theory as secondary to experimental control) and goal-free evaluation (which sets stated goals aside altogether).
Goal-Free Evaluation, devised by Michael Scriven, took the opposite stance. Instead of testing a program theory, the goal-free evaluator deliberately ignores stated goals and looks for all effects, intended or unintended. The assumption is that stated goals bias what the evaluator sees, so that important side effects (both positive and negative) can be missed. Goal-free evaluation thus acts as a corrective to the narrowing inherent in theory-driven and experimental approaches. It has remained a minority voice but a persistent one, often used in meta-evaluation or as a check on goal-driven studies.
These three frameworks—utilization-focused, theory-driven, and goal-free—do not replace one another. They address different evaluation needs: actionable findings, mechanistic understanding, and unbiased discovery of outcomes.
If utilization-focused evaluation gave stakeholders a seat at the table, Empowerment Evaluation (David Fetterman, 1990s) gave them the table itself. This framework positioned the evaluator as a coach or facilitator who helps program participants conduct their own evaluation. The goal was not just to produce useful information but to build the community's capacity to assess and improve its own programs. Empowerment evaluation shares with utilization-focused evaluation a commitment to stakeholders, but it deepens that commitment by redistributing evaluative authority. It is more radical: it sees evaluation as a tool for social justice and self-determination, not just for better decision-making. In practice, the two frameworks coexist along a spectrum of participation: utilization-focused evaluation is often evaluator-led with stakeholder input; empowerment evaluation is community-led with evaluator support.
By the 2000s, evaluators increasingly encountered programs that were not stable enough for the experimental paradigm or even for theory-driven designs. These were innovations in complex, adaptive systems (social enterprises, community coalitions, rapid-response initiatives) where the program itself evolved in response to feedback. Developmental Evaluation, also articulated by Michael Quinn Patton, offered a framework for such contexts. Rather than testing a fixed model, the developmental evaluator embeds with the program team and provides real-time data to inform ongoing adaptation. The role is not to judge a stable intervention but to support its evolution. Developmental evaluation narrows the gap between evaluation and program design; it shares with theory-driven evaluation an interest in mechanisms, but it assumes the mechanisms are emergent rather than predetermined. It has become a leading framework for innovation and systems change initiatives, particularly in global health and community development.
Public health program evaluation today is a pluralist field. The experimental paradigm remains the default for causal inference, especially in international development and clinical interventions. Theory-driven and utilization-focused frameworks are widely taught and used by funders who demand both accountability and learning. Empowerment evaluation is prominent in community-based participatory research and health equity work. Developmental evaluation is increasingly adopted by foundations and nonprofits supporting innovation. Goal-free evaluation persists as a specialized tool for uncovering unintended consequences.
Two of the most influential frameworks, theory-driven and developmental evaluation, converge on the importance of program theory and context. Both agree that evaluation must go beyond black-box outcomes to understand how change happens. Their disagreement is about stability: theory-driven evaluation assumes a program theory that can be specified in advance and tested; developmental evaluation assumes that in complex systems, the theory must be developed iteratively. That disagreement mirrors the field's enduring tension between the need for credible causal inference and the need for adaptive, context-sensitive practice. Evaluators now choose frameworks based on the program's maturity, the stakeholders' needs, and the nature of the problem. This deliberate pluralism is itself the core lesson: no single framework fits all public health programs.