In the early 1960s, a wave of severe birth defects linked to thalidomide, a drug prescribed for morning sickness, revealed a devastating gap in medical knowledge. Clinical trials, no matter how rigorous, could not guarantee a drug's safety once it entered widespread use. Rare harms, delayed effects, and harms in vulnerable populations could remain invisible until thousands of people had been exposed. Pharmacoepidemiology emerged to close that gap. Its central problem has been how to detect and prove drug effects—both intended and unintended—after a drug is already on the market, using the messy, incomplete data of real-world patient populations.
The first systematic response to the post-market surveillance gap was the spontaneous reporting system. In these systems, clinicians and manufacturers voluntarily submit reports of suspected adverse drug reactions to a central registry, such as the UK Yellow Card Scheme (1964) or the US FDA Adverse Event Reporting System. The framework's distinctive commitment was to cast a wide net: it did not require a comparison group or a pre-specified hypothesis. Any clinician who suspected a drug caused harm could file a report, and regulators could then scan the accumulating reports for unexpected clusters of events.
Spontaneous reporting systems were the first infrastructure for post-market drug safety, and they remain active today. Their strength is their ability to detect signals for rare, idiosyncratic, or previously unsuspected harms. But their weakness is equally fundamental: they lack denominators. Without knowing how many people took the drug, analysts cannot calculate rates, and without a comparison group, they cannot distinguish a true drug effect from a coincidence. A cluster of reports might reflect a real risk, a media-driven reporting spike, or simply a widely prescribed drug. This limitation drove the next wave of frameworks.
In the late 1960s and 1970s, pharmacoepidemiologists began importing the standard tools of epidemiology, cohort studies and case-control studies, to impose structure on the post-market problem. Instead of waiting for spontaneous reports, researchers could define an exposed group (patients taking a drug) and an unexposed group (patients not taking it) and follow both forward in time, as in a cohort study. Alternatively, they could start from patients who had experienced an adverse event and comparable patients who had not, then reconstruct each group's prior drug exposure from medical records, as in a case-control study. This shift from passive signal detection to active hypothesis testing allowed researchers to calculate incidence rates, relative risks (or odds ratios, in case-control studies), and confidence intervals. The Boston Collaborative Drug Surveillance Program, launched in 1966, exemplified this approach by conducting hospital-based cohort studies to quantify adverse event rates.
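The relative risk and confidence interval mentioned above can be illustrated with a small calculation. The following is a minimal sketch, assuming a simple cohort with cumulative incidence data and a Wald-type interval on the log scale. The function name and the counts are hypothetical, chosen only for illustration.

```python
import math

def relative_risk(exposed_events, exposed_total,
                  unexposed_events, unexposed_total, z=1.96):
    """Relative risk with a Wald-type confidence interval on the log scale.

    z=1.96 corresponds to an approximate 95% interval.
    """
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for cumulative incidence (risk) data
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical cohort: 30 events among 10,000 exposed patients,
# 15 events among 10,000 unexposed patients.
rr, lower, upper = relative_risk(30, 10_000, 15, 10_000)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Both the numerator and the denominator of each risk are needed here, which is exactly the information spontaneous reports cannot supply.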
In the early 1980s, a parallel framework emerged in the UK: Prescription Event Monitoring (PEM). PEM was designed to address a specific limitation of cohort and case-control studies: their reliance on researchers actively recruiting and following patients, which was expensive and slow. Instead, PEM used centrally held prescription records to identify patients who had been prescribed a new drug, then sent follow-up questionnaires to their general practitioners asking about any clinical events that occurred after the prescription. This method preserved the cohort structure (a defined exposed group with follow-up) while dramatically reducing cost and enabling the study of much larger groups of patients.
The two frameworks coexisted, with PEM serving as a narrower, more specialized implementation of the same cohort logic. Both imposed formal epidemiological design on the post-market problem, but they differed in their data-collection strategy. Traditional cohort and case-control studies relied on primary data collection: researchers went to the patients or their charts. PEM relied on a hybrid: prescription records for exposure, questionnaires for outcomes. Both frameworks improved on spontaneous reporting by providing denominators and comparison groups, but both still faced a practical ceiling: they could only study a handful of drugs at a time, and they were too slow to detect rapidly emerging safety signals.
The 1990s brought a transformation in scale. The widespread computerization of healthcare—insurance claims, electronic medical records, pharmacy dispensing logs—created vast, pre-existing datasets that could be repurposed for drug safety research. Observational database studies used these secondary data sources to assemble cohorts of hundreds of thousands or even millions of patients, comparing outcomes across dozens of drugs simultaneously. This framework did not replace earlier approaches; it absorbed them by providing the data infrastructure for cohort and case-control designs at an unprecedented scale.
The key shift was from data collection to data linkage. Researchers no longer needed to recruit patients or mail questionnaires; they could define exposure and outcome using administrative codes (e.g., ICD-9 codes for diagnoses, NDC codes for prescriptions). This made studies faster, cheaper, and larger. But it also introduced a new problem: confounding by indication. Patients who receive a particular drug are systematically different from those who do not—they have the disease being treated, and that disease itself may be associated with the outcome of interest. A database study might find that patients taking a painkiller had higher rates of heart attacks, but that could be because the painkiller was prescribed for arthritis, and arthritis patients are less active and more prone to heart disease. The scale of database studies exposed the limits of simple adjustment methods and created a pressing need for more rigorous causal reasoning.
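The painkiller example can be made concrete with a toy calculation. This is a minimal sketch using invented counts, not data from any real study. It assumes the drug has no effect on heart attacks within either stratum, while arthritis patients are both more likely to receive the drug and at higher baseline risk. The Mantel-Haenszel summary is used as one simple stratified adjustment; it is an illustrative choice, not a method named in the text.

```python
# Hypothetical counts, invented for illustration; not from any real study.
# stratum -> (events_exposed, n_exposed, events_unexposed, n_unexposed)
strata = {
    "arthritis": (160, 8_000, 40, 2_000),      # 2% risk in both groups
    "no_arthritis": (20, 2_000, 80, 8_000),    # 1% risk in both groups
}

def crude_risk_ratio(strata):
    """Risk ratio after collapsing all strata into a single 2x2 table."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)

def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel summary risk ratio, adjusting for the stratum variable."""
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata.values():
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator

print(f"crude RR:    {crude_risk_ratio(strata):.2f}")
print(f"adjusted RR: {mantel_haenszel_risk_ratio(strata):.2f}")
```

With these counts the crude risk ratio is 1.5 even though the stratum-specific ratios are both 1.0, so the apparent harm is produced entirely by who receives the drug. Stratification works here only because the confounder is recorded, which administrative data often cannot guarantee.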
Observational database studies were typically conducted as one-off research projects: a question arose, a dataset was assembled, and an answer was produced months or years later. The next framework, active surveillance, operationalized the database approach into a continuous monitoring infrastructure. Instead of waiting for a signal from spontaneous reports and then launching a study, active surveillance systems proactively scan electronic healthcare data in near real-time, looking for changes in adverse event rates as new drugs enter the market.
The US Food and Drug Administration's Sentinel Initiative, launched in 2008, is the paradigmatic example. Its Sentinel System uses a distributed data network to monitor the safety of approved medical products: multiple healthcare organizations maintain their own data in a common format, and queries are run locally without sharing patient-level data. This framework transformed pharmacoepidemiology from an episodic, question-driven discipline into a standing surveillance capability. It did not replace observational database studies; rather, it provided the operational infrastructure to make them routine and rapid. The same methods (cohort studies, case-control studies) are used, but they are now embedded in a system designed for repeated, pre-planned analyses.
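The distributed pattern described above can be sketched in a few lines. This is a toy illustration of the general idea only, not Sentinel's actual software or data model. The record layout, the function names, and the codes `DRUG_X` and `OUTCOME_Y` are all hypothetical.

```python
from collections import Counter

def run_local_query(patients, drug, outcome):
    """Runs inside one site and returns aggregate counts only, never patient rows."""
    counts = {"exposed_n": 0, "exposed_events": 0,
              "unexposed_n": 0, "unexposed_events": 0}
    for patient in patients:
        group = "exposed" if drug in patient["drugs"] else "unexposed"
        counts[f"{group}_n"] += 1
        if outcome in patient["outcomes"]:
            counts[f"{group}_events"] += 1
    return counts

def pool(site_results):
    """Coordinator step: sum the aggregates returned by each site."""
    total = Counter()
    for result in site_results:
        total.update(result)  # Counter.update adds counts key by key
    return dict(total)

# Each site keeps its records locally in the same (hypothetical) common format.
site_a = [
    {"drugs": {"DRUG_X"}, "outcomes": {"OUTCOME_Y"}},
    {"drugs": {"DRUG_X"}, "outcomes": set()},
    {"drugs": set(), "outcomes": set()},
]
site_b = [
    {"drugs": {"DRUG_X"}, "outcomes": set()},
    {"drugs": set(), "outcomes": {"OUTCOME_Y"}},
]

pooled = pool(run_local_query(site, "DRUG_X", "OUTCOME_Y")
              for site in (site_a, site_b))
print(pooled)
```

The common format is what makes this work: because every site stores data the same way, one query definition can run unchanged everywhere, and only the summary counts need to leave each organization.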
At the same time that active surveillance was scaling up database studies, a separate methodological revolution was underway. The confounding problems exposed by database studies—especially confounding by indication—could not be solved by larger sample sizes alone. Causal inference frameworks, drawing on work by James Robins, Miguel Hernán, and others, provided a formal language for defining and estimating causal effects from observational data. The central innovation was the concept of the target trial: researchers should specify the hypothetical randomized trial they would like to conduct, then emulate it as closely as possible using observational data.
This framework introduced new analytic tools—inverse probability weighting, g-methods, instrumental variables, and sensitivity analyses for unmeasured confounding—that allowed researchers to address confounding more transparently than traditional regression adjustment. Causal inference frameworks did not replace earlier frameworks; they provided a methodological upgrade that could be applied within cohort studies, database studies, and active surveillance systems alike. The relationship was one of transformation: the same data that had been analyzed with standard regression could now be analyzed with methods that made the assumptions explicit and the estimates more defensible.
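Of the tools listed above, inverse probability weighting is the simplest to show in miniature. The following is a minimal sketch, assuming a single measured binary confounder and invented counts. With one binary confounder the propensity score is just the proportion treated in each stratum; a real analysis would typically estimate it with a model over many covariates. All names and numbers are hypothetical.

```python
# Hypothetical counts: (confounder level, exposed?, n patients, n events)
cells = [
    ("arthritis", True, 8_000, 160),
    ("arthritis", False, 2_000, 40),
    ("no_arthritis", True, 2_000, 20),
    ("no_arthritis", False, 8_000, 80),
]

def propensity_by_stratum(cells):
    """Probability of exposure within each level of the confounder."""
    treated, total = {}, {}
    for level, exposed, n, _events in cells:
        total[level] = total.get(level, 0) + n
        if exposed:
            treated[level] = treated.get(level, 0) + n
    return {level: treated.get(level, 0) / total[level] for level in total}

def ipw_risk_ratio(cells):
    """Risk ratio in the weighted pseudo-population."""
    ps = propensity_by_stratum(cells)
    weighted_n = {True: 0.0, False: 0.0}
    weighted_events = {True: 0.0, False: 0.0}
    for level, exposed, n, events in cells:
        # Exposed patients get weight 1/ps, unexposed get 1/(1 - ps)
        weight = 1 / ps[level] if exposed else 1 / (1 - ps[level])
        weighted_n[exposed] += weight * n
        weighted_events[exposed] += weight * events
    risk_exposed = weighted_events[True] / weighted_n[True]
    risk_unexposed = weighted_events[False] / weighted_n[False]
    return risk_exposed / risk_unexposed

print(f"IPW risk ratio: {ipw_risk_ratio(cells):.2f}")
```

The weights build a pseudo-population in which exposure no longer depends on the confounder, so the weighted comparison resembles the randomized trial the analysis is trying to emulate. The estimate is only as good as the assumption that all relevant confounders are measured, which is why the sensitivity analyses mentioned above matter.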
Today, pharmacoepidemiology operates as a layered toolkit. Spontaneous reporting systems remain the first line of defense for detecting truly unexpected harms, especially rare events that no study would be powered to find. Observational cohort and case-control studies, now often embedded in database or active surveillance infrastructure, remain the workhorses for quantifying risk. Prescription Event Monitoring continues in specialized contexts where primary care records are available and detailed clinical outcomes are needed. Observational database studies provide the scale for studying heterogeneous effects across subgroups. Active surveillance systems provide the speed for detecting signals early in a drug's market life. And causal inference frameworks provide the methodological rigor for turning associations into credible causal claims.
What the leading frameworks agree on is that pre-market trials are insufficient, that post-market evidence must be systematic rather than anecdotal, and that observational data, despite its messiness, is the only feasible source for studying real-world drug effects. They also broadly agree that confounding by indication is the central methodological challenge and that transparency about assumptions is essential.
Where they disagree is on the balance between speed and certainty. Active surveillance prioritizes timeliness, accepting higher false-positive rates in exchange for early warnings. Causal inference frameworks prioritize validity, insisting on explicit emulation of a target trial even if that means slower answers. There is also an ongoing tension between using primary data (collected for research, richer in clinical detail) and secondary data (administrative, larger but coarser). No single framework dominates because each addresses a different part of the post-market problem: detecting a signal, quantifying a risk, or proving a cause.