How can an economist say that one policy outcome is better than another? The question is inescapable for anyone who advises governments, designs taxes, or evaluates public projects. Yet the answer has shifted dramatically over the past two centuries, driven by a single recurring tension: should welfare judgments rely on comparisons of well-being across people, or should they avoid such comparisons as scientifically indefensible? The history of welfare economics is the story of how different frameworks have taken sides on this question, and of the lasting disagreements that remain.
The first systematic framework for welfare economics was Classical Utilitarianism, rooted in the work of Jeremy Bentham and John Stuart Mill. It treated welfare as the sum of individual utilities, where utility was understood as a measurable mental state—pleasure or pain—that could be compared across persons. A policy was good if it produced the greatest happiness for the greatest number. This gave economists a clear, if demanding, normative standard: measure each person's utility, add them up, and choose the policy with the highest total.
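The utilitarian rule described above can be sketched in a few lines. This is only an illustration: the policy names and utility numbers are hypothetical, and the code simply assumes what the framework assumes, namely that each person's utility is a cardinal number that can be added to anyone else's.

```python
# Hypothetical cardinal utilities for three people under two policies.
# The numbers are invented for illustration only.
utilities = {
    "policy_A": [5.0, 3.0, 2.0],
    "policy_B": [4.0, 4.0, 4.0],
}


def utilitarian_welfare(individual_utilities):
    """Classical utilitarian welfare: the unweighted sum of utilities."""
    return sum(individual_utilities)


# Add up utilities under each policy, then pick the highest total.
totals = {name: utilitarian_welfare(u) for name, u in utilities.items()}
best_policy = max(totals, key=totals.get)

print(totals)
print("Chosen policy:", best_policy)
```

The rule looks only at the total, so how utility is distributed across the three people plays no role in the ranking.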
Classical Utilitarianism was ambitious, but its foundations proved fragile. By the early twentieth century, economists grew uneasy with the idea that utility could be measured cardinally—like temperature on a thermometer—and then compared across individuals. How could one person's pleasure be weighed against another's? The framework's reliance on interpersonal comparisons of utility became its most vulnerable point, setting the stage for a crisis that would reshape the field.
Arthur Pigou extended the utilitarian logic into a practical program for government intervention. In his 1920 book The Economics of Welfare, he argued that market failures—especially externalities like pollution—create a gap between private and social costs. The remedy was corrective taxes or subsidies that would align private incentives with social welfare. Pigou's framework retained the utilitarian commitment to interpersonal comparison: the policymaker needed to know how much a polluter gained and how much the victims lost, and to weigh them against each other.
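A minimal numerical sketch of the corrective-tax logic follows. The linear marginal benefit curve, the constant marginal costs, and all parameter values are assumptions chosen for illustration, not anything taken from Pigou. Under those assumptions, a per-unit tax equal to the marginal external damage moves the privately chosen output to the socially optimal level.

```python
def private_quantity(a, b, marginal_private_cost, tax=0.0):
    """Output where marginal benefit (a - b*q) equals private cost plus tax."""
    return (a - marginal_private_cost - tax) / b


# Hypothetical parameters: marginal benefit is 100 - 2q.
a, b = 100.0, 2.0
marginal_private_cost = 20.0      # cost borne by the producer per unit
marginal_external_damage = 10.0   # cost imposed on third parties per unit

# Without intervention the producer ignores the external damage.
q_market = private_quantity(a, b, marginal_private_cost)

# The social optimum counts private cost plus external damage.
q_social = (a - marginal_private_cost - marginal_external_damage) / b

# A tax equal to the marginal external damage closes the gap.
pigouvian_tax = marginal_external_damage
q_taxed = private_quantity(a, b, marginal_private_cost, tax=pigouvian_tax)

print(q_market, q_social, q_taxed)
```

Setting the tax this way presupposes that the damage to third parties can be measured and set against the producer's gain, which is exactly the interpersonal weighing the paragraph describes.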
The Pigouvian Tradition was enormously influential in applied policy, especially environmental regulation and public goods provision. But it inherited the same foundational problem as Classical Utilitarianism: its welfare judgments depended on cardinal, comparable utility. As the ordinal revolution in economics gained momentum in the 1930s—the view that utility could only be ranked, not measured—Pigou's approach came under increasing pressure. The tradition did not disappear; its policy insights were later absorbed into cost-benefit analysis under a different justification. But within welfare economics, the ground was shifting.
The ordinal revolution created a crisis for welfare economics. If utility could only be ranked within each individual, and if interpersonal comparisons were ruled out, how could any policy that made some people worse off be justified? Two responses emerged almost simultaneously in 1938–1939, and they have coexisted in tension ever since.
Abram Bergson (writing as Burk) and Paul Samuelson proposed the Social Welfare Function (SWF)—a mathematical representation of society's ethical preferences over different distributions of goods. The Bergson-Samuelson SWF did not require cardinal utility; it could be built from ordinal preferences. Crucially, it did require interpersonal comparisons, but it treated them as explicit ethical judgments rather than scientific measurements. The SWF allowed the economist to say: given a particular ethical weighting of individuals' well-being, policy A is better than policy B. This preserved the possibility of distributional analysis, but at the cost of making the ethical premises transparent and contestable.
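The role of the ethical premise can be illustrated with a small sketch. The two welfare functions below (a simple sum and a maximin rule) are just two of many possible forms a Bergson-Samuelson SWF could take, and the well-being numbers are hypothetical indices that the analyst has chosen to treat as comparable across people. The point is that the ranking of the two policies depends on which ethical rule is plugged in.

```python
# Hypothetical well-being indices for three people under two policies.
# Treating them as comparable across people is itself an ethical assumption.
policy_a = [10.0, 10.0, 10.0]
policy_b = [4.0, 12.0, 20.0]


def sum_swf(well_being):
    """An SWF that cares only about the total."""
    return sum(well_being)


def maximin_swf(well_being):
    """An SWF that cares only about the worst-off person."""
    return min(well_being)


def preferred(swf, a, b):
    """Return which policy the given SWF ranks higher (or 'tie')."""
    wa, wb = swf(a), swf(b)
    if wa == wb:
        return "tie"
    return "A" if wa > wb else "B"


choice_under_sum = preferred(sum_swf, policy_a, policy_b)
choice_under_maximin = preferred(maximin_swf, policy_a, policy_b)

print("Sum SWF prefers:", choice_under_sum)
print("Maximin SWF prefers:", choice_under_maximin)
```

Nothing in the framework says which function is correct; it only makes the consequences of each choice explicit.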
At almost the same moment, a different group of economists—led by John Hicks, Nicholas Kaldor, and Tibor Scitovsky—developed what became known as the New Welfare Economics. Their strategy was to avoid interpersonal comparisons altogether. They anchored welfare judgments in the Pareto principle: a change is good if it makes at least one person better off and no one worse off. For policies that create winners and losers, they proposed compensation tests: if the winners could hypothetically compensate the losers and still be better off, the policy was deemed an improvement. The Kaldor-Hicks criterion became the foundation of modern cost-benefit analysis.
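The difference between the Pareto principle and the compensation test can be sketched as follows. The sketch assumes, as applied cost-benefit analysis typically does, that each person's gain or loss is expressed in money terms, so that "winners could compensate losers" reduces to the net gains summing to a positive number. The figures are hypothetical.

```python
def is_pareto_improvement(net_gains):
    """At least one person gains and nobody loses."""
    return all(g >= 0 for g in net_gains) and any(g > 0 for g in net_gains)


def passes_kaldor_hicks(net_gains):
    """Winners could in principle compensate losers and still come out ahead.

    Simplifying assumption: gains and losses are money amounts that can
    be transferred one for one between people.
    """
    return sum(net_gains) > 0


# Hypothetical money-valued net gains for three people from a project.
project = [50.0, 30.0, -60.0]

pareto = is_pareto_improvement(project)
kaldor_hicks = passes_kaldor_hicks(project)

print("Pareto improvement:", pareto)
print("Passes Kaldor-Hicks:", kaldor_hicks)
```

The compensation is hypothetical: the project passes the test even though the third person actually ends up worse off.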
The two frameworks were not sequential; they were rival responses to the same problem. The Bergson-Samuelson SWF embraced explicit ethical choice; New Welfare Economics tried to make welfare judgments without it. This divergence has never been resolved. In practice, the SWF tradition dominates in optimal tax theory and inequality measurement, where distributional weights are unavoidable. The New Welfare Economics tradition dominates in applied cost-benefit analysis, where the Kaldor-Hicks criterion is the default standard.
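In 1951, Kenneth Arrow published Social Choice and Individual Values, which introduced a framework that would permanently alter the landscape. Arrow asked a deceptively simple question: can a society aggregate individual preferences into a collective ranking that satisfies a few reasonable conditions? The conditions are that the rule works for every possible profile of individual preferences (unrestricted domain), that it respects unanimity, that the collective ranking is transitive, that the ranking of any two options depends only on how individuals rank those two (independence of irrelevant alternatives), and that no single person dictates the outcome. His answer, the impossibility theorem, showed that with three or more alternatives no such aggregation rule exists. A non-dictatorial procedure that meets the other conditions will, for some configurations of individual preferences, fail to produce a consistent collective ranking.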
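A classic illustration of the difficulty, though not a proof of Arrow's theorem, is the cycle that pairwise majority voting can produce. The three ballots below are a hypothetical example: each voter has a perfectly transitive ranking, yet the majority ranking is not transitive.

```python
# Each ballot lists alternatives from most to least preferred.
# This three-voter profile is a hypothetical textbook-style example.
ballots = [
    ("A", "B", "C"),
    ("B", "C", "A"),
    ("C", "A", "B"),
]


def majority_prefers(x, y, ballots):
    """True if a strict majority of ballots rank x above y."""
    votes_for_x = sum(1 for b in ballots if b.index(x) < b.index(y))
    return votes_for_x > len(ballots) / 2


a_over_b = majority_prefers("A", "B", ballots)
b_over_c = majority_prefers("B", "C", ballots)
c_over_a = majority_prefers("C", "A", ballots)

# If all three hold, the collective ranking cycles: A > B > C > A.
has_cycle = a_over_b and b_over_c and c_over_a

print("A beats B:", a_over_b)
print("B beats C:", b_over_c)
print("C beats A:", c_over_a)
print("Majority ranking cycles:", has_cycle)
```

Majority rule is only one aggregation procedure. Arrow's result is far more general: it says that every procedure meeting his conditions runs into some failure of this kind.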
Arrow's theorem was a direct challenge to the Bergson-Samuelson SWF tradition. Bergson and Samuelson had assumed that a social welfare function could be constructed from individual preferences plus an ethical judgment. Arrow showed that if the ethical judgment itself must be derived from individual preferences through a voting-like procedure, the task is impossible. The theorem did not, however, destroy the SWF approach. Bergson-Samuelson SWFs are not required to satisfy Arrow's conditions because they incorporate an external ethical standard—a philosopher-king's values, not a voting rule. The two frameworks operate in different domains: Social Choice Theory studies the logical limits of democratic aggregation, while the SWF tradition provides a tool for normative analysis under a given ethical commitment.
Social Choice Theory expanded rapidly after Arrow, spawning research on voting rules, strategy-proofness, and mechanism design. It remains a vibrant field, but it has not replaced the SWF tradition. Instead, it has constrained and clarified it: any attempt to derive welfare judgments purely from individual preferences without an external ethical standard runs into Arrow's impossibility.
The most recent framework, Behavioral Welfare Economics, emerged from the behavioral revolution in economics. It questions a foundational assumption shared by the Bergson-Samuelson SWF, New Welfare Economics, and Social Choice Theory: that individuals have stable, well-defined preferences that are revealed by their choices. Behavioral economists have documented systematic deviations from rational choice—present bias, loss aversion, framing effects—that make it unclear whether observed choices actually reflect what people truly want.
Behavioral Welfare Economics proposes alternative welfare criteria. One influential approach, developed by Richard Thaler and Cass Sunstein, is libertarian paternalism: policies should be designed to help people make choices that better serve their own long-run interests, while preserving freedom of choice. Another approach, associated with Daniel Kahneman, focuses on experienced utility (the pleasure or pain actually felt during an experience) rather than decision utility inferred from choices. Both approaches require the analyst to judge well-being by something other than observed choice, and the experienced-utility approach in particular reintroduces a measurable, potentially comparable notion of utility, though in a more cautious, empirical way than Classical Utilitarianism.
Behavioral Welfare Economics does not reject the ordinal tradition outright; it shares with New Welfare Economics the commitment to individual welfare as the ultimate standard. But it argues that the revealed-preference shortcut is unreliable, and that welfare economists must sometimes look beyond choices to assess well-being. This has created a living disagreement with the New Welfare Economics tradition, which insists that choices are the only observable evidence of welfare.
Today, no single framework dominates welfare economics. Instead, the frameworks coexist in a division of labor shaped by the kind of question being asked.
The Bergson-Samuelson SWF is the workhorse of normative public economics. Optimal tax models, welfare-based inequality measurement (most directly the Atkinson index, which is derived from an explicit SWF; the Gini coefficient can also be given a welfare interpretation), and distributional cost-benefit analysis all rely on an SWF that weights different individuals' well-being. The framework's strength is its flexibility: the economist can choose the ethical weights and then trace their implications. Its weakness is that the weights are contestable, and the framework provides no guidance on how to choose them.
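To make the role of the ethical weights concrete, here is a minimal sketch of the Atkinson index using its standard textbook formula. The income figures are hypothetical, and the inequality-aversion parameter epsilon is exactly the kind of contestable ethical choice the framework leaves to the analyst. The index is one minus the ratio of the "equally distributed equivalent" income to mean income.

```python
import math


def atkinson_index(incomes, epsilon):
    """Atkinson inequality index for strictly positive incomes.

    epsilon >= 0 is the inequality-aversion parameter; epsilon = 0 means
    the analyst is indifferent to how income is distributed.
    """
    n = len(incomes)
    mean_income = sum(incomes) / n
    if epsilon == 1:
        # Limiting case: equally distributed equivalent is the geometric mean.
        ede = math.exp(sum(math.log(y) for y in incomes) / n)
    else:
        power = 1 - epsilon
        ede = (sum(y ** power for y in incomes) / n) ** (1 / power)
    return 1 - ede / mean_income


# Hypothetical income distribution.
incomes = [10.0, 20.0, 30.0]

for eps in (0.0, 0.5, 1.0, 2.0):
    print(f"epsilon={eps}: Atkinson index = {atkinson_index(incomes, eps):.4f}")
```

The same distribution registers as more unequal the higher the chosen epsilon, which is the sense in which the ethical weights drive the result.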
New Welfare Economics, in the form of Kaldor-Hicks cost-benefit analysis, is the default standard for project evaluation in government agencies worldwide. Its appeal is that it avoids explicit distributional judgments, but this is also its limitation: a policy that passes the Kaldor-Hicks test may worsen inequality, and the framework has no way to address that.
Social Choice Theory continues to inform mechanism design, voting theory, and institutional design. It has largely moved away from the foundational debates of the 1950s and toward practical questions about how to design rules that align incentives with welfare.
Behavioral Welfare Economics is reshaping policy design in areas like retirement savings, health insurance, and energy conservation. Its influence is growing, but it has not yet produced a unified alternative to the older frameworks.
What do the leading frameworks agree on? All accept that individual well-being is the ultimate standard of evaluation, even if they define well-being differently. All reject the Classical Utilitarian assumption of cardinal, comparable utility as a scientific foundation, though Behavioral Welfare Economics has revived interest in experienced utility as a measurable concept. What they disagree on is whether interpersonal comparisons are necessary (Bergson-Samuelson says yes, New Welfare Economics says no), whether choices reliably reveal welfare (Behavioral says no, New Welfare says yes), and whether welfare judgments can be made without an explicit ethical commitment (New Welfare tries, Bergson-Samuelson insists they cannot).
These disagreements are not signs of failure. They reflect the inescapable fact that welfare economics sits at the intersection of positive science and normative ethics. The frameworks that survive do so because they offer different tools for different tasks, and the field's history is the story of how economists have learned to live with that tension.