Epidemiology has always faced a fundamental tension: how to move from observing patterns of disease in populations to making reliable claims about what causes those patterns, and ultimately to predicting future health outcomes. The methods epidemiologists use to navigate this tension have evolved dramatically over the past century, driven by the limits of earlier approaches and the emergence of new data sources and computational tools. Five major methodological frameworks have shaped the field: classical study designs, computational modeling, causal inference frameworks, molecular and genetic epidemiology methods, and machine learning and high-dimensional data methods. Each framework introduced distinctive commitments about what counts as evidence, how to handle confounding, and whether the goal is estimation, prediction, or mechanistic understanding.
The earliest systematic epidemiological methods emerged in the late 19th and early 20th centuries as investigators sought to compare disease occurrence across groups. Classical study designs—cohort, case-control, and cross-sectional studies—provided a structured way to estimate associations between exposures and outcomes. In a cohort study, investigators follow a defined population forward in time, comparing disease incidence between exposed and unexposed groups. The Framingham Heart Study, launched in 1948, exemplified this approach by tracking thousands of residents over decades to identify risk factors for cardiovascular disease. Case-control studies, by contrast, start with cases of disease and select controls without the disease, then look backward to compare exposure histories. This design proved especially useful for rare diseases, as in the landmark 1950 study by Doll and Hill linking smoking to lung cancer. Cross-sectional surveys measure exposure and disease simultaneously, providing prevalence estimates but limited temporal information.
These designs share a common logic: they compare groups to estimate measures of association such as the odds ratio or relative risk. Their strength lies in their transparency and relative simplicity. But they also exposed a persistent problem: observed associations may reflect confounding rather than causation. A cohort study might find that coffee drinkers have lower heart disease risk, but coffee drinking could be a marker for other health behaviors. Classical designs addressed confounding through stratification, matching, and multivariable regression, but these adjustments depended on knowing and measuring all relevant confounders—a condition rarely met in practice. By the mid-20th century, the limits of association-based reasoning were becoming clear, creating pressure for methods that could support stronger causal claims.
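The two measures of association named above can be computed directly from a 2x2 table of exposure by disease status. The following is a minimal sketch; the `TwoByTwo` class, the function names, and the counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 table of exposure by disease status (illustrative)."""
    exposed_cases: int
    exposed_noncases: int
    unexposed_cases: int
    unexposed_noncases: int


def relative_risk(t: TwoByTwo) -> float:
    """Risk of disease in the exposed divided by risk in the unexposed."""
    risk_exposed = t.exposed_cases / (t.exposed_cases + t.exposed_noncases)
    risk_unexposed = t.unexposed_cases / (t.unexposed_cases + t.unexposed_noncases)
    return risk_exposed / risk_unexposed


def odds_ratio(t: TwoByTwo) -> float:
    """Odds of disease in the exposed divided by odds in the unexposed."""
    return (t.exposed_cases * t.unexposed_noncases) / (
        t.exposed_noncases * t.unexposed_cases
    )


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
table = TwoByTwo(exposed_cases=30, exposed_noncases=70,
                 unexposed_cases=10, unexposed_noncases=90)
print(round(relative_risk(table), 2))
print(round(odds_ratio(table), 2))
```

With these made-up counts the relative risk is about 3.0 and the odds ratio about 3.86. Both are crude measures: neither says anything about confounding, which is why stratified or adjusted versions are needed before any causal reading.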
While classical designs focused on estimating associations from empirical data, a parallel tradition developed around mathematical models that simulate disease transmission in populations. An early and influential model, published by Kermack and McKendrick in 1927, divided a population into three compartments (susceptible, infected, and recovered, or SIR) and used differential equations to describe how individuals move between compartments over time. This compartmental approach allowed epidemiologists to predict epidemic curves, estimate the basic reproduction number (R₀), and evaluate the potential impact of interventions like vaccination or quarantine.
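The compartmental logic can be sketched in a few lines. The version below assumes the standard textbook form of the SIR equations with a constant transmission rate `beta` and recovery rate `gamma`, no births or deaths, and simple Euler integration; the function names and parameter values are illustrative assumptions, not taken from any particular study.

```python
def basic_reproduction_number(beta: float, gamma: float) -> float:
    """R0 for this simple SIR model: transmission rate over recovery rate."""
    return beta / gamma


def simulate_sir(beta, gamma, susceptible, infected, recovered, days, dt=0.1):
    """Integrate the SIR equations with Euler steps of size dt (in days).

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns a list of (time, S, I, R) tuples.
    """
    n = susceptible + infected + recovered
    s, i, r = float(susceptible), float(infected), float(recovered)
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory


# Hypothetical outbreak: 10 infected people in a population of 1,000.
run = simulate_sir(beta=0.3, gamma=0.1, susceptible=990, infected=10,
                   recovered=0, days=160)
peak_time, _, peak_infected, _ = max(run, key=lambda row: row[2])
print(f"R0 = {basic_reproduction_number(0.3, 0.1):.1f}")
print(f"peak of {peak_infected:.0f} infected around day {peak_time:.0f}")
```

Lowering `beta` (for example, to mimic reduced contact) or moving people directly from susceptible to recovered (to mimic vaccination) is how a model of this kind is typically used to compare interventions. Euler integration is chosen here only for brevity; a production model would normally use a proper ODE solver.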
Computational modeling differs from classical study designs in a fundamental way: it is not primarily about estimating associations from observed data but about representing the mechanistic processes that generate disease patterns. Models make explicit assumptions about transmission rates, contact patterns, and immunity duration. When those assumptions are wrong, predictions can be misleading. Yet the framework proved invaluable for infectious disease control, from smallpox eradication campaigns to pandemic influenza planning. Over time, computational modeling expanded beyond compartmental models to include agent-based simulations, where individual-level behaviors and interactions are modeled explicitly, and stochastic models that incorporate randomness. These methods coexist with classical designs rather than replacing them: models often rely on parameter estimates from cohort or case-control studies, while classical studies use models to test the plausibility of causal mechanisms.
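To illustrate what incorporating randomness means in practice, here is a minimal sketch of a discrete-time stochastic SIR model in which each person's infection or recovery on a given day is a random draw. The daily probabilities, function name, and parameter values are assumptions made for illustration; real stochastic and agent-based models add contact structure, age groups, and much more.

```python
import math
import random


def stochastic_sir(beta, gamma, susceptible, infected, days, seed=0):
    """Simulate a daily-step stochastic SIR epidemic.

    Each susceptible person is infected with probability 1 - exp(-beta * I / N)
    and each infected person recovers with probability 1 - exp(-gamma).
    Returns a list of (S, I, R) counts, one entry per day including day 0.
    """
    rng = random.Random(seed)  # seeded so a run can be reproduced
    n = susceptible + infected
    s, i, r = susceptible, infected, 0
    history = [(s, i, r)]
    for _ in range(days):
        p_infect = 1.0 - math.exp(-beta * i / n)
        p_recover = 1.0 - math.exp(-gamma)
        # One Bernoulli draw per person at risk of each transition.
        new_infections = sum(rng.random() < p_infect for _ in range(s))
        new_recoveries = sum(rng.random() < p_recover for _ in range(i))
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Different seeds give different epidemics from identical parameters.
for seed in (1, 2, 3):
    final_s, final_i, final_r = stochastic_sir(0.3, 0.1, 200, 5, 120, seed)[-1]
    print(f"seed {seed}: {final_r} people ever recovered")
```

Running many seeds and summarizing the spread of outcomes is what distinguishes this style of model from the deterministic version: it yields a distribution of possible epidemics, including the chance that an outbreak dies out early, rather than a single curve.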
The causal inference revolution in epidemiology began in earnest in the 1960s and 1970s, driven by the recognition that classical study designs and regression adjustments could not reliably distinguish causation from confounding. The Bradford Hill viewpoints, proposed in 1965, offered a set of considerations (strength of association, consistency, temporality, dose-response, and others) for judging whether an association is causal. But these were guidelines, not formal methods. A more rigorous approach emerged from the counterfactual framework, which defines the causal effect of an exposure as the difference between the outcome under exposure and the outcome under no exposure for the same individual. Because only one of these two outcomes can ever be observed for any individual, individual-level effects cannot be measured directly; causal inference methods instead aim to construct valid counterfactual comparisons, usually of average outcomes across groups, from observational data.
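The counterfactual definition is easier to see with a toy dataset in which both potential outcomes are written down for every person, something that is possible only in a simulation. The eight records below are entirely hypothetical, and exposure is deliberately concentrated among people who would tend to fall ill anyway.

```python
from typing import NamedTuple


class Person(NamedTuple):
    exposed: bool          # exposure actually received
    y_if_unexposed: int    # potential outcome without exposure (1 = disease)
    y_if_exposed: int      # potential outcome with exposure (1 = disease)


# Hypothetical people; in real data only one potential outcome is ever seen.
PEOPLE = [
    Person(True, 1, 1),
    Person(True, 1, 1),
    Person(True, 0, 1),
    Person(False, 1, 1),
    Person(True, 0, 0),
    Person(False, 0, 0),
    Person(False, 0, 1),
    Person(False, 0, 0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def average_causal_effect(people):
    """Mean outcome if everyone were exposed minus mean if no one were."""
    return (mean(p.y_if_exposed for p in people)
            - mean(p.y_if_unexposed for p in people))


def naive_risk_difference(people):
    """Difference in observed risk between the exposed and the unexposed."""
    exposed = [p.y_if_exposed for p in people if p.exposed]
    unexposed = [p.y_if_unexposed for p in people if not p.exposed]
    return mean(exposed) - mean(unexposed)


print(average_causal_effect(PEOPLE))   # 0.25
print(naive_risk_difference(PEOPLE))   # 0.5
```

The observed comparison (0.5) is twice the true average effect (0.25) because the exposed and unexposed groups differ in their baseline risk. Making the two groups comparable, so that the observed contrast matches the counterfactual one, is the problem the methods in this tradition are designed to address.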
Directed acyclic graphs (DAGs), introduced into epidemiology in the 1990s, provided a graphical language for representing causal assumptions and identifying sources of bias. DAGs made explicit which variables must be controlled for to block confounding and which should not be controlled for to avoid collider bias. Building on this foundation, g-methods—including inverse probability weighting, g-computation, and g-estimation—offered ways to estimate causal effects in the presence of time-varying exposures and confounders. These methods absorbed and formalized intuitions already present in classical designs: stratification and matching were reinterpreted as attempts to create exchangeable comparison groups. But causal inference frameworks went further by requiring researchers to state their assumptions transparently and by providing tools to test the sensitivity of conclusions to violations of those assumptions.
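Two of the g-methods named above can be sketched for the simplest possible case: a single exposure measured once, with one binary confounder. This is a deliberately reduced illustration using hypothetical counts and nonparametric stratum-specific risks; it does not cover the time-varying setting that motivated these methods, and it assumes the single confounder is sufficient to make the groups exchangeable.

```python
# Hypothetical counts: (confounder level, exposure) -> (people, cases).
COUNTS = {
    (0, 1): (20, 4),
    (0, 0): (80, 8),
    (1, 1): (80, 40),
    (1, 0): (20, 8),
}
LEVELS = (0, 1)
TOTAL = sum(people for people, _ in COUNTS.values())


def stratum_size(level):
    """Number of people at a confounder level, exposed and unexposed combined."""
    return COUNTS[(level, 0)][0] + COUNTS[(level, 1)][0]


def crude_risk(exposure):
    """Observed risk among people with the given exposure, ignoring the confounder."""
    people = sum(COUNTS[(level, exposure)][0] for level in LEVELS)
    cases = sum(COUNTS[(level, exposure)][1] for level in LEVELS)
    return cases / people


def standardized_risk(exposure):
    """G-computation: average stratum-specific risks over the confounder distribution."""
    risk = 0.0
    for level in LEVELS:
        people, cases = COUNTS[(level, exposure)]
        risk += (stratum_size(level) / TOTAL) * (cases / people)
    return risk


def ipw_risk(exposure):
    """Inverse probability weighting: weight each person by 1 / P(exposure | confounder)."""
    weighted_cases = 0.0
    weighted_people = 0.0
    for level in LEVELS:
        people, cases = COUNTS[(level, exposure)]
        weight = stratum_size(level) / people
        weighted_cases += weight * cases
        weighted_people += weight * people
    return weighted_cases / weighted_people


print(round(crude_risk(1) - crude_risk(0), 2))                 # 0.28
print(round(standardized_risk(1) - standardized_risk(0), 2))   # 0.1
print(round(ipw_risk(1) - ipw_risk(0), 2))                     # 0.1
```

In this saturated example the two adjusted estimates agree exactly, and both are far smaller than the crude difference because exposure is much more common in the high-risk stratum. With continuous or many confounders the two approaches rely on different models (one for the outcome, one for the exposure) and can diverge.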
Causal inference frameworks did not replace classical study designs; rather, they transformed how epidemiologists think about design and analysis. A well-conducted cohort study is now often analyzed using causal methods, and the design itself is evaluated for its ability to support counterfactual reasoning. The framework remains a site of active debate, particularly around the interpretation of effect estimates and the role of mechanistic knowledge.
The 1980s brought new tools for measuring biological markers—DNA sequences, proteins, metabolites—that could be linked to disease outcomes. Molecular and genetic epidemiology methods emerged to handle these high-dimensional, biologically detailed data. Early work focused on candidate gene studies, testing whether specific genetic variants were associated with disease. But these studies suffered from low replication rates and confounding by population stratification. The field shifted toward genome-wide association studies (GWAS) in the 2000s, scanning millions of genetic variants across the genome for associations with traits or diseases.
What makes genetic methods distinctive is their relationship to causal inference. Genetic variants are assigned at conception, largely independent of many environmental confounders, making them candidate instrumental variables. Mendelian randomization uses genetic variants as instruments to estimate the causal effect of a modifiable exposure on an outcome, under three assumptions: the variant is associated with the exposure, affects the outcome only through the exposure, and is independent of confounders of the exposure-outcome relationship. This approach both depends on and challenges causal inference frameworks: it relies on the same counterfactual logic and DAG-based reasoning, but it also tests the limits of instrumental variable methods by exposing violations such as horizontal pleiotropy (where a genetic variant affects the outcome through pathways other than the exposure of interest).
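For a single genetic variant, the simplest Mendelian randomization estimator is the Wald ratio: the variant's association with the outcome divided by its association with the exposure. The sketch below uses hypothetical summary statistics and a first-order standard error that ignores uncertainty in the variant-exposure estimate; the function name and inputs are illustrative assumptions.

```python
def wald_ratio(beta_variant_exposure, beta_variant_outcome, se_variant_outcome):
    """Single-instrument Mendelian randomization estimate.

    Returns (estimate, standard_error), where the standard error is a
    first-order approximation that treats the variant-exposure
    association as known without error.
    """
    if beta_variant_exposure == 0:
        raise ValueError("variant must be associated with the exposure")
    estimate = beta_variant_outcome / beta_variant_exposure
    standard_error = se_variant_outcome / abs(beta_variant_exposure)
    return estimate, standard_error


# Hypothetical summary statistics for one variant.
estimate, se = wald_ratio(beta_variant_exposure=0.5,
                          beta_variant_outcome=0.1,
                          se_variant_outcome=0.02)
low, high = estimate - 1.96 * se, estimate + 1.96 * se
print(f"estimated effect {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The ratio is interpretable as a causal effect only if the three instrumental variable assumptions hold. A weak variant-exposure association makes the denominator small and the estimate unstable, which is one reason analyses usually combine many variants and check for pleiotropy.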
Molecular and genetic methods coexist with classical designs and causal frameworks, often nested within cohort studies that collect biospecimens. They have expanded the scope of epidemiological inquiry to include biological mechanisms, but they have also raised new questions about measurement error, multiple testing, and the interpretation of small effect sizes.
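The multiple testing concern can be made concrete with the Bonferroni correction, which divides the desired overall error rate by the number of tests. This is a minimal sketch with hypothetical function names; Bonferroni is only one of several correction strategies and is used here because it is the simplest.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Flag which p-values pass the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# One million tests at an overall 5% error rate.
print(bonferroni_threshold(0.05, 1_000_000))

# Three hypothetical p-values: the corrected threshold is 0.05 / 3.
print(significant([0.01, 0.04, 0.001]))   # [True, False, True]
```

The first result, 5 × 10⁻⁸, matches the conventional genome-wide significance threshold, which corresponds to roughly one million independent tests. Thresholds this strict are part of why detecting small genetic effects requires very large samples.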
The explosion of electronic health records, wearable sensors, and high-throughput omics data in the 2000s created datasets with thousands or millions of variables—far more than traditional regression models could handle. Machine learning methods, including random forests, support vector machines, and neural networks, offered ways to discover patterns and make predictions in these high-dimensional settings. Unlike classical regression, which assumes a parametric form for the relationship between predictors and outcome, machine learning algorithms learn flexible, often nonlinear functions from the data.
This flexibility comes with a trade-off. Machine learning methods excel at prediction (forecasting who will develop a disease or which patients will respond to a treatment), but they are not designed for causal inference. The same algorithm that accurately predicts disease risk may produce biased estimates of a causal effect if confounding is not properly addressed. This has created a productive tension with causal inference frameworks. Some researchers argue that machine learning should be used primarily for prediction and risk stratification, while causal questions require the formal tools of counterfactual reasoning. Others are developing hybrid methods, such as targeted maximum likelihood estimation (TMLE) and causal forests, that combine machine learning with causal inference to estimate treatment effects while reducing the risk of bias from model misspecification.
Machine learning methods represent an extension of computational modeling's predictive tradition rather than a clean break. Both frameworks are used heavily for forecasting, although they reach forecasts differently: mechanistic models encode explicit assumptions about how disease spreads, whereas machine learning algorithms learn patterns directly from data. Machine learning has also introduced new challenges around interpretability, overfitting, and reproducibility that the field is still grappling with.
Today, no single framework dominates epidemiological methods. Classical study designs remain the backbone of most observational research, providing the data infrastructure on which other methods depend. Computational modeling is essential for infectious disease epidemiology and health policy evaluation. Causal inference frameworks have become the standard reference point for observational studies that aim to estimate the effects of interventions, and their tools (DAGs, g-methods, sensitivity analyses) are now widely taught in epidemiology courses. Molecular and genetic methods are increasingly integrated into large cohort studies, and Mendelian randomization has become a widely used approach for causal assessment when randomized trials are infeasible. Machine learning methods are transforming prediction and risk stratification, especially in clinical epidemiology and precision medicine.
The leading frameworks agree on several points: confounding is the central threat to validity; transparency about assumptions is essential; and no single method guarantees correct inference. They disagree most sharply on the role of prediction versus causal estimation. Causal inference purists argue that machine learning's black-box predictions are of limited use for understanding etiology or guiding policy, while machine learning advocates counter that flexible prediction models can improve causal estimates when properly integrated. The most exciting developments are occurring at the boundaries: hybrid methods that use machine learning to estimate nuisance parameters within causal frameworks, or that combine mechanistic models with data-driven discovery. These approaches are slowly closing the gap between prediction and causation, but the tension remains a defining feature of the field.
For students entering epidemiological methods today, the key skill is not mastery of a single framework but the ability to choose among them based on the question at hand. A study of vaccine effectiveness during an outbreak might draw on classical cohort designs, compartmental modeling, and causal inference simultaneously. Understanding the strengths and limits of each framework—and how they relate to one another—is the foundation of rigorous epidemiological practice.