How can a handful of observations stand in for an entire population? The question is as old as statistical thinking itself, but a rigorous answer emerged only in the twentieth century. Survey sampling is the branch of statistics that designs and justifies the process of selecting a subset of individuals from a target population and using that subset to make inferences about the whole. The subfield's history is organized around a deep disagreement over the very source of randomness that makes inference possible. Two frameworks—design-based inference and model-based inference—offer competing answers, and their ongoing debate has shaped every major method used in official statistics, opinion polling, and survey research today.
Before the 1930s, sampling was largely haphazard or purposive: researchers selected units they believed were representative, but there was no way to quantify the uncertainty of their estimates. The breakthrough came in 1934, when Jerzy Neyman published a paper that laid the foundation for what is now called design-based inference. Neyman's central insight was that the randomness needed for statistical inference should come from the sampling procedure itself, not from any assumptions about the population.
In the design-based framework, the population values are treated as fixed but unknown constants. The only source of randomness is the random mechanism by which units are selected into the sample. Because the sampler controls the selection probabilities, those probabilities can be used to construct unbiased estimators and to compute standard errors that reflect the actual sampling process. The most famous of these estimators is the Horvitz-Thompson estimator, which weights each sampled unit by the inverse of its inclusion probability. If a unit had a 1 in 100 chance of being selected, its observation is multiplied by 100, so that it stands in for itself and the 99 similar units that were not sampled. This logic works without any model of the population's structure: it relies entirely on the known probabilities built into the design, each of which must be greater than zero.
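The inverse-probability weighting described above can be sketched in a few lines of Python. This is a minimal illustration rather than production survey code: the function name `horvitz_thompson_total`, the sample values, and the toy population are all hypothetical. The second half checks design-unbiasedness by brute force, averaging the estimator over every possible simple random sample of two units from a fixed population of four.

```python
from itertools import combinations


def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total as the sum of y_i / pi_i over sampled units."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    if any(p <= 0 or p > 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))


# Hypothetical sample: three units drawn with unequal, known probabilities.
sample_values = [12.0, 30.0, 7.0]
sample_probs = [0.01, 0.05, 0.02]
ht_total = horvitz_thompson_total(sample_values, sample_probs)

# Design-unbiasedness check on a toy population of four fixed values.
# Under simple random sampling of n = 2 from N = 4, every unit has
# inclusion probability n / N, and all 6 possible samples are equally likely.
population = [3.0, 8.0, 5.0, 10.0]
n = 2
pi = n / len(population)
estimates = [
    horvitz_thompson_total(list(s), [pi] * n)
    for s in combinations(population, n)
]
mean_estimate = sum(estimates) / len(estimates)

print(ht_total)
print(mean_estimate, sum(population))
```

Individual estimates in the toy check range well above and below the true total of 26, yet their average over all six equally likely samples equals it. No assumption about how the population values were generated is needed for that result; only the inclusion probabilities enter the calculation.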
Design-based inference became the dominant paradigm for official statistics because it offered a guarantee that no other approach could match: the estimates were design-unbiased, and their sampling variance could be estimated from the same design. National statistical agencies adopted probability sampling as the gold standard for labor force surveys, agricultural surveys, and other large-scale sample surveys. The framework's strength is its robustness: even if the population is wildly heterogeneous, a properly executed probability sample yields valid inferences. Its vulnerability is that it requires a complete sampling frame, strict adherence to the random selection protocol, and high response rates. When those conditions break down, as they often do in practice, the design-based guarantees weaken.
By the 1970s, statisticians working with small populations, nonresponse, or analytic questions began to chafe against the limitations of design-based inference. Richard Royall and others proposed a fundamentally different starting point: treat the population values themselves as random draws from a superpopulation, a hypothetical infinite distribution that generated the finite population. In this model-based framework, the randomness does not come from the sampling design; it comes from the probability model assumed for the population. The sampling design can be ignored entirely, provided the model is correct and selection into the sample does not depend on the survey variable once the model's auxiliary variables are taken into account.
Model-based inference shifts the burden from design to modeling. If the analyst can specify a plausible model relating the survey variable to auxiliary information (such as age, region, or prior census data), then predictions from that model can be used to estimate population totals. The standard error of the estimate reflects the model's residual variance, not the sampling probabilities. This approach is especially powerful when the sampling frame is incomplete, when nonresponse is severe, or when the goal is to estimate parameters of a causal or structural model rather than a simple population mean.
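A small Python sketch may make the prediction logic concrete. It assumes one particular working model chosen purely for illustration: the survey variable is proportional to a single auxiliary variable, with error variance proportional to that variable, so the best linear unbiased estimate of the slope is the ratio of sample sums. The function name `prediction_total` and all data values are hypothetical.

```python
def prediction_total(sample_y, sample_x, nonsample_x):
    """Model-based (prediction) estimate of a population total.

    Working model (an assumption of this sketch): y_i = beta * x_i + e_i,
    with mean-zero errors whose variance is proportional to x_i.
    """
    if not sample_y or len(sample_y) != len(sample_x):
        raise ValueError("sample_y and sample_x must be non-empty and equal in length")
    if sum(sample_x) <= 0:
        raise ValueError("sample_x must have a positive sum")
    # Slope estimate under the assumed model: ratio of sample sums.
    beta_hat = sum(sample_y) / sum(sample_x)
    # Sampled units contribute their observed values ...
    observed_part = sum(sample_y)
    # ... and unsampled units contribute model predictions.
    predicted_part = beta_hat * sum(nonsample_x)
    return observed_part + predicted_part


# Hypothetical data: y observed for three units; x known for every unit.
sample_y = [21.0, 39.0, 63.0]
sample_x = [10.0, 20.0, 30.0]
nonsample_x = [15.0, 25.0]
model_total = prediction_total(sample_y, sample_x, nonsample_x)
print(model_total)
```

No inclusion probabilities appear anywhere in the function. The estimate is the observed sum plus predicted values for the unsampled units, so its quality rests on how well the assumed model describes the population.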
The tension between the two frameworks is not merely technical; it reflects different commitments about what makes an inference trustworthy. Design-based inference prioritizes objectivity: the estimator's properties depend only on the known randomization, not on the analyst's modeling choices. Model-based inference prioritizes efficiency and flexibility: by using a model, the analyst can borrow strength across domains, smooth over sparse data, and produce estimates for small geographic areas where a design-based approach would yield unacceptably large standard errors. The two frameworks disagree most sharply on the role of the sampling design. For a design-based purist, the design is the source of validity; for a model-based purist, the design is at best irrelevant and at worst a nuisance that complicates the model.
For roughly two decades, the two camps operated in a state of productive disagreement. Design-based theorists refined complex sampling strategies—stratification, clustering, multistage selection—that minimized variance without relying on models. Model-based theorists developed prediction-driven estimators that could handle nonresponse and small samples. Neither side fully displaced the other, and by the 1990s a synthesis began to emerge.
The key bridging method is the generalized regression estimator (GREG). The GREG works by fitting a regression model of the survey variable on auxiliary variables whose population totals are known, then using the model predictions to adjust the design-based estimate. If the model fits well, the GREG is more efficient than the plain Horvitz-Thompson estimator. If the model is wrong, the GREG remains design-consistent: it converges to the true population total as the sample size grows, because the design-based weights still anchor the estimate. In other words, the GREG uses the model to improve precision but retains the design as a safety net. This hybrid logic is now standard in many national statistical offices, where it goes by the name model-assisted estimation and is closely related to calibration weighting.
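The adjustment can be sketched for the simplest case: one auxiliary variable and a regression line through the origin, fitted by design-weighted least squares. These modeling choices, the function name `greg_total`, and the data are assumptions made for illustration; real applications typically use several auxiliary variables and dedicated survey software.

```python
def greg_total(sample_y, sample_x, inclusion_probs, x_population_total):
    """GREG estimate of the total of y using one auxiliary variable x.

    Working model (an assumption of this sketch): y_i = beta * x_i + e_i
    with constant error variance, fitted with design weights 1 / pi_i.
    """
    if not (len(sample_y) == len(sample_x) == len(inclusion_probs)):
        raise ValueError("sample inputs must have the same length")
    if any(p <= 0 or p > 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in inclusion_probs]
    # Horvitz-Thompson estimates of the totals of y and x.
    ht_y = sum(w * y for w, y in zip(weights, sample_y))
    ht_x = sum(w * x for w, x in zip(weights, sample_x))
    # Design-weighted least-squares slope through the origin.
    sxx = sum(w * x * x for w, x in zip(weights, sample_x))
    if sxx == 0:
        raise ValueError("auxiliary variable is zero for every sampled unit")
    slope = sum(w * x * y for w, x, y in zip(weights, sample_x, sample_y)) / sxx
    # Shift the HT estimate by the slope times the known error in the x total.
    # Algebraically this equals: (sum of predictions over the population)
    # plus (design-weighted sum of sample residuals).
    return ht_y + slope * (x_population_total - ht_x)


# Hypothetical sample with unequal inclusion probabilities; the population
# total of x is assumed known from an external source such as a register.
sample_y = [22.0, 38.0, 63.0]
sample_x = [10.0, 20.0, 30.0]
inclusion_probs = [0.1, 0.2, 0.2]
greg = greg_total(sample_y, sample_x, inclusion_probs, x_population_total=400.0)
print(greg)
```

In this example the Horvitz-Thompson estimate of the x total is 350 against a known total of 400, so the sample under-represents x, and the GREG shifts the Horvitz-Thompson estimate of y (725) upward by the fitted slope times the shortfall, to about 828. If the sample happened to reproduce the known x total exactly, the adjustment would vanish and the plain Horvitz-Thompson estimate would be returned.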
The synthesis did not end the debate, but it transformed it. Today, most practitioners accept that both frameworks have a role. Design-based inference remains the default for descriptive surveys where the primary goal is to estimate a population total, mean, or proportion with a known margin of error. Model-based inference is preferred for analytic tasks—estimating regression coefficients, testing hypotheses, or predicting outcomes for small subpopulations. The two frameworks agree that auxiliary information is valuable and that no estimator is useful without a credible measure of uncertainty. They disagree on whether the sampling design must be the ultimate arbiter of that uncertainty.
In contemporary practice, design-based inference leads in official statistics and large-scale government surveys, where the mandate is to produce unbiased estimates for a well-defined population. Model-based inference leads in academic research, business analytics, and small-area estimation, where the questions are more complex and the data are messier. The two frameworks coexist in a state of living disagreement, with each side continuing to refine its methods. Recent work on Bayesian survey sampling, for example, extends the model-based tradition by incorporating prior information, while nonparametric model-assisted methods relax the linear working model behind estimators such as the GREG. The central lesson for a student of survey sampling is that the choice between frameworks is not a matter of right versus wrong; it is a matter of which source of randomness, design or model, best supports the inference you need to make.