Biostatistics is the science of drawing reliable conclusions from health data, but its history is a story of persistent disagreement about what counts as a reliable conclusion. Should evidence come from controlled experiments or from observational studies? Should inference be based on the probability of the data given a hypothesis, or on the probability of a hypothesis given the data? Should the goal be to estimate causal effects, to predict future outcomes, or simply to describe associations? These questions have generated a sequence of frameworks—each a response to the limitations of its predecessors, each still active in some form today.
The deepest divide in biostatistics is between two definitions of probability itself. Bayesian Inference, dating to Thomas Bayes's posthumous 1763 essay, treats probability as a degree of belief that can be updated as new data arrive. A Bayesian analyst starts with a prior distribution—a formal statement of what is known before the study—and combines it with the observed data to produce a posterior distribution. This approach is intuitive for public health questions: if you already have strong evidence that a vaccine is safe, a new trial should shift your belief only modestly. But for most of the twentieth century, Bayesian methods were computationally impractical for all but the simplest problems, and critics argued that priors introduced unacceptable subjectivity.
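The updating step described above can be illustrated with a conjugate Beta-Binomial model. This is a minimal sketch rather than a method taken from the text: the prior parameters and trial counts are hypothetical numbers, and the helper names `update_beta` and `beta_mean` are assumptions made for illustration.

```python
from fractions import Fraction

def update_beta(prior_a, prior_b, events, non_events):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + events, prior_b + non_events

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)

# Hypothetical strong prior: about 2 adverse events per 1000 earlier recipients.
prior = (2, 998)
# Hypothetical new trial: 1 adverse event among 100 participants.
posterior = update_beta(*prior, events=1, non_events=99)

print(float(beta_mean(*prior)))      # prior mean adverse-event rate
print(float(beta_mean(*posterior)))  # posterior mean, shifted only slightly
```

Although the hypothetical trial's raw rate is 1 in 100, the posterior mean stays close to the prior mean because the prior carries the weight of 1000 earlier observations.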
Frequentist Inference, developed by Ronald Fisher, Jerzy Neyman, and Egon Pearson from the 1920s onward, defined probability as the long-run frequency of events in repeated samples. A frequentist does not assign a probability to a hypothesis; instead, she calculates the probability of observing data at least as extreme as what was actually seen, assuming the null hypothesis is true (the p-value). This approach avoided subjective priors and became the default for most of the twentieth century, especially after the development of standard tests and confidence intervals. The rivalry between Bayesians and frequentists was not merely philosophical: it shaped which questions could be asked. Frequentist methods dominated because they were computationally tractable and seemed more objective, but they could not directly answer the question a public health official most wants to ask: "Given these data, what is the probability that the intervention works?"
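To make the p-value concrete, the following sketch computes a one-sided exact binomial p-value. The data (9 recoveries among 10 patients) and the null recovery rate of 0.5 are hypothetical, and the function name `binomial_p_value_upper` is an assumption for illustration.

```python
from math import comb

def binomial_p_value_upper(k, n, p0):
    """One-sided exact p-value: P(X >= k) when X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical data: 9 recoveries among 10 patients; null recovery rate 0.5.
p = binomial_p_value_upper(9, 10, 0.5)
print(round(p, 4))  # 0.0107
```

The result is the probability of the data (or more extreme data) under the null hypothesis. It is not the probability that the null hypothesis is true, which is the quantity the frequentist framework declines to define.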
Alongside this foundational debate, the Associational Paradigm emerged around 1900 as the default mode of analysis for observational data. Its core commitment is to measure associations (risk ratios, odds ratios, regression coefficients) without making strong causal claims. Reasoning of this kind underlay the nineteenth-century link between sanitation and cholera and the mid-twentieth-century link between smoking and lung cancer, and investigators were often careful to describe their findings as associations rather than causes. The Associational Paradigm coexisted with both Bayesian and Frequentist Inference, providing a practical toolkit for describing patterns in populations while leaving causal interpretation to other frameworks.
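The two most common associational measures can be computed directly from a 2x2 table. The sketch below assumes a hypothetical cohort; the counts and the function names are illustrative only.

```python
def risk_ratio(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of the outcome in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop the outcome.
rr = risk_ratio(30, 70, 10, 90)
or_ = odds_ratio(30, 70, 10, 90)
print(round(rr, 2), round(or_, 2))  # 3.0 3.86
```

Both numbers describe how strongly exposure and outcome travel together in this table. Neither says, on its own, whether the exposure causes the outcome.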
The Randomized Controlled Trial (RCT) Paradigm, inaugurated by the 1948 streptomycin trial for tuberculosis, transformed public health evidence. By randomly assigning participants to treatment or control groups, the RCT aimed to eliminate confounding, the possibility that an observed association is due to a third factor that causes both the exposure and the outcome. Because random assignment balances measured and unmeasured confounders on average, it supported causal inference without the untestable assumptions about confounding that other designs require. The RCT quickly became the gold standard for evaluating drugs, vaccines, and clinical interventions, and regulatory agencies such as the FDA built their approval processes around it.
Yet the RCT Paradigm had sharp limits for public health. Many exposures of interest—air pollution, poverty, smoking—cannot be randomized for ethical or practical reasons. The Observational Study Paradigm, which took shape in the 1950s, developed methods to work within these limits. Cohort studies, case-control studies, and cross-sectional surveys became the workhorses of epidemiology. The Observational Study Paradigm did not reject the RCT's logic; rather, it accepted that randomization was often impossible and sought to approximate its benefits through careful design (matching, restriction, stratification) and statistical adjustment. The two paradigms coexisted in a complementary relationship: RCTs provided the strongest evidence for interventions that could be randomized, while observational studies addressed the broader range of exposures that shape population health.
Both experimental and observational studies often follow participants over time, and the outcome of interest is not just whether an event occurs but when it occurs. Survival Analysis, developed from the 1958 publication of the Kaplan-Meier estimator onward, provided methods for analyzing time-to-event data while handling censoring—the fact that some participants drop out or do not experience the event by the study's end. The Cox proportional hazards model (1972) became one of the most widely used statistical tools in public health, allowing researchers to estimate the effect of multiple covariates on the hazard of an event without specifying the shape of the baseline hazard.
Survival Analysis did not replace earlier frameworks; it functioned as infrastructure within them. RCTs used Kaplan-Meier curves to compare treatment arms; observational studies used Cox models to adjust for confounders. The framework's distinctive contribution was to recognize that time itself carries information and that ignoring it—by treating a death at one month the same as a death at ten years—wastes evidence and can bias conclusions.
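The product-limit idea behind the Kaplan-Meier estimator can be sketched in a few lines of plain Python. This is an illustrative implementation under simple assumptions (event indicators coded 0 or 1, and participants censored at a given time treated as still at risk at that time); the follow-up data are hypothetical, and a real analysis would normally use an established statistical package.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  : follow-up time for each participant
    events : 1 if the event occurred at that time, 0 if the participant was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group everyone who leaves the risk set at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up in months; 0 marks a censored participant.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored participants never trigger a drop in the curve, but they do count in the denominator for as long as they are under observation, which is how the estimator uses their partial information instead of discarding it.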
By the 1970s, a growing number of statisticians and epidemiologists had become dissatisfied with the Associational Paradigm's reluctance to make causal claims. The Causal Inference Framework, which emerged in two main branches, sought to formalize what it means to ask a causal question and to provide methods for answering it from observational data.
Donald Rubin's potential outcomes framework (often called the Rubin Causal Model) defined the causal effect for an individual as the difference between the outcome under treatment and the outcome under control, only one of which can ever be observed. This "fundamental problem of causal inference" made explicit the counterfactual reasoning that underlies all causal claims. Rubin's framework provided a rigorous language for discussing confounding, selection bias, and the conditions under which observational studies could approximate randomized experiments. Propensity score matching grew directly from this framework, and instrumental variables, an older technique from econometrics, were later reformulated within it.
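A toy example can make the potential outcomes idea tangible. The sketch below assumes a hypothetical population in which both potential outcomes are known for everyone, something that is only possible in a constructed example; the numbers and function names are illustrative.

```python
from statistics import mean

# Hypothetical population: y0 and y1 are the outcomes under control and treatment.
# In real data only one of the two is ever observed for each person.
population = (
    [{"y0": 10, "y1": 12} for _ in range(4)]   # healthier at baseline
    + [{"y0": 4, "y1": 6} for _ in range(4)]   # sicker at baseline
)

def true_ate(people):
    """Average of the individual effects y1 - y0 (unknowable in practice)."""
    return mean(p["y1"] - p["y0"] for p in people)

def observed_difference(people, treated):
    """Naive contrast of observed group means for a given treatment assignment."""
    y_treated = [p["y1"] for p, t in zip(people, treated) if t]
    y_control = [p["y0"] for p, t in zip(people, treated) if not t]
    return mean(y_treated) - mean(y_control)

# Confounded assignment: only the sicker people receive treatment.
confounded = [0, 0, 0, 0, 1, 1, 1, 1]
# Balanced assignment, the kind randomization produces on average.
balanced = [1, 1, 0, 0, 1, 1, 0, 0]

print(true_ate(population))                         # 2
print(observed_difference(population, confounded))  # -4
print(observed_difference(population, balanced))    # 2
```

Every individual benefits by 2 units, yet the confounded comparison suggests harm because baseline health differs between the groups. When the groups are balanced on baseline health, the observed difference matches the true average effect.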
Judea Pearl's directed acyclic graphs (DAGs) offered a complementary approach. DAGs made causal assumptions visible as diagrams, showing which variables were causes, which were effects, and which were common causes. Pearl developed a calculus for determining which variables needed to be controlled for and which should not be controlled for (because they would introduce bias). The DAG approach gave epidemiologists a practical tool for designing analyses and communicating assumptions.
The Causal Inference Framework directly challenged the Associational Paradigm's methods. A simple regression adjustment, the causal framework showed, could either reduce bias or introduce it, depending on the underlying causal structure. The framework did not reject observational studies; it argued that they could support causal conclusions, but only if the analyst made explicit assumptions about the data-generating process and used methods appropriate to those assumptions.
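The claim that adjustment can introduce bias is easiest to see with a collider, a variable caused by both the exposure and the outcome. The following sketch uses a hypothetical population in which exposure and outcome are independent by construction, then restricts to people selected on a collider; the scenario and names are assumptions for illustration.

```python
from itertools import product

# Hypothetical population: exposure x and outcome y are independent by construction,
# with 25 people in each of the four (x, y) cells.
people = [(x, y) for x, y in product([0, 1], repeat=2) for _ in range(25)]

def risk_difference(rows):
    """P(y = 1 | x = 1) - P(y = 1 | x = 0) among the given rows."""
    def risk(level):
        ys = [y for x, y in rows if x == level]
        return sum(ys) / len(ys)
    return risk(1) - risk(0)

# Collider: selection (for example, hospital admission) happens if x or y is present.
selected = [(x, y) for x, y in people if x == 1 or y == 1]

print(risk_difference(people))    # 0.0
print(risk_difference(selected))  # -0.5
```

In the full population there is no association. Among the selected, a spurious negative association appears, because everyone without the exposure must have the outcome in order to have been selected. Conditioning on such a variable, whether by restriction, stratification, or regression adjustment, creates bias rather than removing it.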
Around 2000, a new set of methods began entering biostatistics from computer science: Machine Learning and Predictive Modeling. Random forests, support vector machines, neural networks, and gradient boosting offered powerful tools for prediction—forecasting outcomes for new individuals based on patterns in large datasets. This goal differed fundamentally from the inferential goals of the earlier frameworks. Frequentist and Bayesian inference ask about parameters: Does the treatment work? What is the size of the effect? Machine learning asks about predictions: Given this patient's characteristics, what is the probability she will develop diabetes in the next five years?
The tension between prediction and inference is not merely philosophical. Machine learning methods often sacrifice interpretability for accuracy: a neural network may predict better than a logistic regression, but it cannot produce a simple odds ratio for a public health report. Overfitting—finding patterns that do not generalize—is a constant danger. Yet the predictive turn has also enriched the older frameworks. Machine learning can identify complex interactions that traditional regression would miss, and it can improve propensity score estimation and imputation of missing data. The Causal Inference Framework has begun incorporating machine learning methods for estimating treatment effects in high-dimensional settings.
Today, no single framework dominates biostatistics. The RCT Paradigm remains the gold standard for regulatory approval of drugs and devices, but its ethical and practical limits are widely acknowledged. The Observational Study Paradigm continues to be the workhorse for most public health research, but it has been transformed by the Causal Inference Framework: few epidemiologists today would publish an observational analysis without discussing confounding, selection bias, and the assumptions underlying their causal claims. Survival Analysis remains essential infrastructure, used in both experimental and observational studies. Bayesian Inference has experienced a revival since the 1990s, driven by Markov chain Monte Carlo methods that made computation feasible; it is now common in drug development, health technology assessment, and any setting where prior information is valuable. Frequentist Inference remains the default for most hypothesis testing, though its limitations—especially the misinterpretation of p-values—are increasingly debated. Machine Learning and Predictive Modeling is the fastest-growing area, especially in precision medicine, risk prediction, and analysis of large-scale health data.
The leading frameworks today agree on several points: that confounding must be addressed, that assumptions should be made explicit, and that no single method is appropriate for all questions. They disagree on deeper issues: whether probability is subjective or objective, whether causal claims require randomization or can be supported by observational data with strong assumptions, and whether the goal of analysis should be inference or prediction. This pluralism is not a sign of weakness. It reflects the complexity of the questions biostatistics is asked to answer—questions about causes, effects, risks, and predictions in a world where controlled experiments are often impossible and data are always imperfect.