Survival analysis confronts a fundamental challenge: how do you estimate the time until an event occurs when some subjects never experience that event during the study period? This problem of censored data—where the event time is only partially known—arises in medicine, engineering, economics, and many other fields. Over three centuries, a sequence of frameworks has developed to handle censoring, each building on or reacting to its predecessors.
The earliest systematic approach to survival data came from life tables, first used by John Graunt in the 17th century and later refined by actuaries and demographers. Life tables divide the time axis into discrete intervals. For each interval they compute the conditional probability of surviving it, given survival to its start, from the number of deaths observed in that interval. Multiplying these conditional probabilities gives the survival curve. In the actuarial version, individuals censored during an interval are usually assumed to have been at risk for half of it. The effective number at risk is therefore the number entering the interval minus half the number censored. Life tables are nonparametric (they make no assumption about the shape of the survival curve), but they require grouping event times into intervals, which can obscure fine-grained patterns. For nearly 300 years, life tables were the standard tool for analyzing mortality and failure times.
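The following is a minimal Python sketch of an actuarial life table, intended only to make the interval-by-interval arithmetic concrete. It assumes the half-interval censoring adjustment (effective number at risk = number entering minus half the number censored). The interval boundaries, toy data, and names such as `actuarial_life_table` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LifeTableRow:
    start: float
    end: float
    entering: int        # subjects alive and under observation at `start`
    deaths: int
    censored: int
    effective_n: float   # entering - censored / 2 (actuarial adjustment)
    q: float             # conditional probability of dying in [start, end)
    survival: float      # cumulative probability of surviving to `end`


def actuarial_life_table(times, events, breaks):
    """times: follow-up times; events: 1 = death observed, 0 = censored;
    breaks: increasing interval boundaries, e.g. [0, 1, 2, 3]."""
    rows = []
    entering = len(times)
    survival = 1.0
    for start, end in zip(breaks[:-1], breaks[1:]):
        in_interval = [e for t, e in zip(times, events) if start <= t < end]
        deaths = sum(in_interval)
        censored = len(in_interval) - deaths
        # Assumption: censored subjects were at risk for half the interval.
        effective_n = entering - censored / 2
        q = deaths / effective_n if effective_n > 0 else 0.0
        survival *= 1 - q
        rows.append(LifeTableRow(start, end, entering, deaths, censored,
                                 effective_n, q, survival))
        entering -= deaths + censored
    return rows


if __name__ == "__main__":
    times = [0.5, 1.2, 1.5, 2.3, 2.8, 3.5]
    events = [1, 0, 1, 1, 0, 0]
    for row in actuarial_life_table(times, events, [0, 1, 2, 3]):
        print(f"[{row.start}, {row.end}): n'={row.effective_n}, "
              f"q={row.q:.4f}, S={row.survival:.4f}")
```

With these toy data, cumulative survival at the end of the third interval is (5/6)(7/9)(3/5) = 7/18, about 0.389. The grouping into whole-unit intervals is what the product-limit approach later removed.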
In the mid-20th century, statisticians began to model survival times using parametric distributions. Parametric survival models assume that the time to event follows a specific probability distribution, such as the exponential, Weibull, log-normal, or gamma. Censoring enters through the likelihood. An observed event contributes the density at its event time, while a censored observation contributes the probability of surviving beyond its censoring time. These models allow researchers to estimate the effect of covariates on survival time through accelerated failure time or proportional hazards parameterizations. They also enable prediction of survival probabilities beyond the observed range. However, all of these conclusions depend on the correctness of the assumed distribution, which is often difficult to verify.
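The exponential model is the simplest case, and its censored-data likelihood has a closed-form maximizer that makes the mechanics easy to see. The sketch below assumes a constant hazard (the exponential model) and no covariates. The function names and toy data are hypothetical.

```python
import math


def exponential_mle(times, events):
    """Maximum-likelihood rate for an exponential model with right censoring.

    An observed event at t contributes log f(t) = log(rate) - rate * t;
    a censored time t contributes log S(t) = -rate * t.
    The total log-likelihood d * log(rate) - rate * sum(t) is maximized at
    rate = d / sum(t): observed events divided by total time at risk.
    """
    deaths = sum(events)
    total_time = sum(times)
    return deaths / total_time


def exponential_loglik(rate, times, events):
    return sum(e * math.log(rate) - rate * t for t, e in zip(times, events))


def exponential_survival(t, rate):
    # Defined for any t > 0, including beyond the longest follow-up time,
    # but only as trustworthy as the exponential assumption itself.
    return math.exp(-rate * t)


if __name__ == "__main__":
    times = [0.5, 1.2, 1.5, 2.3, 2.8, 3.5]
    events = [1, 0, 1, 1, 0, 0]
    rate = exponential_mle(times, events)
    print(f"rate = {rate:.4f}, S(5) = {exponential_survival(5, rate):.4f}")
```

Here the estimate is 3 events over 11.8 units of follow-up. The extrapolated value S(5) lies beyond the last observed time, which is exactly the kind of prediction that rests entirely on the distributional assumption.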
In 1958, Edward Kaplan and Paul Meier introduced a nonparametric estimator that directly addressed the limitations of both life tables and parametric models. The Kaplan-Meier estimator produces a step-function survival curve that changes only at observed event times. Censored observations remain in the risk set until their censoring time and are then removed. They therefore contribute information for as long as they are observed without ever being counted as events. Unlike life tables, it does not require arbitrary grouping; unlike parametric models, it makes no distributional assumptions. The Kaplan-Meier estimator quickly became the standard method for estimating survival curves from censored data. It remains in widespread use today for descriptive analysis and for comparing groups via the log-rank test. It coexists with parametric models: researchers often use Kaplan-Meier to visualize the data and parametric models to test specific hypotheses or to adjust for covariates.
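The product-limit formula S(t) = Π over event times t_i ≤ t of (1 − d_i / n_i) can be sketched in plain Python as follows. Here d_i is the number of deaths at t_i and n_i the number still under observation just before t_i. This is an illustrative sketch, not a library implementation. The tie convention (subjects censored at t_i count as at risk for deaths at t_i) is the usual one but is stated here as an assumption.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event_time, S(event_time)).

    events: 1 = event observed, 0 = censored. A censored subject stays in
    the risk set for every event time up to its censoring time, then leaves.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Gather every observation (event or censoring) tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


if __name__ == "__main__":
    times = [0.5, 1.2, 1.5, 2.3, 2.8, 3.5]
    events = [1, 0, 1, 1, 0, 0]
    for t, s in kaplan_meier(times, events):
        print(f"S({t}) = {s:.4f}")
```

The curve drops only at 0.5, 1.5, and 2.3. The subject censored at 1.2 is counted in the risk set for the death at 0.5 but not for the death at 1.5.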
David Cox’s 1972 paper introduced a semi-parametric model that transformed survival analysis. The Cox proportional hazards model expresses the hazard rate as the product of an unspecified baseline hazard function and an exponential function of the covariates. Cox’s key innovation was the partial likelihood. At each event time, it compares the subject who failed with all subjects still at risk. Because the baseline hazard cancels from these comparisons, the regression coefficients can be estimated without specifying or estimating it, and the baseline hazard can be recovered afterwards if needed. The Cox model absorbed the regression ideas of parametric models while avoiding their distributional assumptions, making it far more flexible. It quickly became the dominant framework for analyzing the effect of multiple covariates on survival, and it remains the most widely used survival model in medical research and beyond. Its key assumption, that hazard ratios between subjects are constant over time, is often reasonable but must be checked.
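Here is a sketch of the partial likelihood for a single covariate, fitted by Newton-Raphson. Each event contributes exp(βx_i) divided by the sum of exp(βx_j) over the risk set, so the baseline hazard never appears. Several details are choices made for this illustration and are not taken from any particular package: the one-covariate restriction, the Breslow treatment of tied event times, the absence of step-halving safeguards, and the names `cox_partial_loglik` and `fit_cox_1d`.

```python
import math


def _risk_set_sums(beta, t_event, times, x):
    """Weighted sums over the risk set {j : times[j] >= t_event}."""
    s0 = s1 = s2 = 0.0
    for tj, xj in zip(times, x):
        if tj >= t_event:
            w = math.exp(beta * xj)
            s0 += w
            s1 += w * xj
            s2 += w * xj * xj
    return s0, s1, s2


def cox_partial_loglik(beta, times, events, x):
    """Breslow partial log-likelihood for one covariate.
    Each event adds log[exp(beta*x_i) / sum_{j at risk} exp(beta*x_j)];
    the baseline hazard h0(t) cancels from every term."""
    ll = 0.0
    for ti, ei, xi in zip(times, events, x):
        if ei:
            s0, _, _ = _risk_set_sums(beta, ti, times, x)
            ll += beta * xi - math.log(s0)
    return ll


def fit_cox_1d(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson on the partial likelihood; returns beta-hat."""
    beta = 0.0
    for _ in range(max_iter):
        score = information = 0.0
        for ti, ei, xi in zip(times, events, x):
            if ei:
                s0, s1, s2 = _risk_set_sums(beta, ti, times, x)
                mean = s1 / s0                      # risk-weighted mean of x
                score += xi - mean                  # first derivative
                information += s2 / s0 - mean ** 2  # minus second derivative
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6, 7, 8]
    events = [1, 1, 0, 1, 1, 0, 1, 1]
    x = [1, 0, 1, 1, 0, 0, 1, 0]      # hypothetical binary covariate
    beta = fit_cox_1d(times, events, x)
    print(f"beta = {beta:.4f}, hazard ratio = {math.exp(beta):.4f}")
```

The hazard ratio exp(β) is the quantity usually reported. Note that no estimate of h0(t) was needed to obtain it.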
By the 1980s, researchers recognized that the standard survival framework was insufficient for two common complications. First, subjects may experience one of several distinct event types, for example death from heart disease versus death from cancer. Competing risks models extend Cox-type regression to this setting. They model either the cause-specific hazard of each event type or the subdistribution hazard, the latter most commonly through the Fine-Gray model (1999). They allow analysts to study the effect of covariates on each cause separately, while accounting for the fact that other events prevent the event of interest from occurring. Competing risks models transformed the analysis of multiple-event data. They replaced the naive practice of treating competing events as censoring and reading one minus the Kaplan-Meier curve as the probability of the event of interest, which overstates that probability.
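The bias of the naive approach can be seen by comparing it with the Aalen-Johansen estimate of the cumulative incidence function (CIF). The CIF adds up S(t−) · d_k(t) / n(t), where S is the Kaplan-Meier probability of remaining free of *any* event. The sketch below assumes the coding 0 = censored, 1 = cause of interest, 2 = competing cause. The function names and data are hypothetical.

```python
def cumulative_incidence(times, causes, cause=1):
    """Aalen-Johansen estimate of P(T <= t, cause = `cause`).

    causes[i] is 0 for a censored observation, otherwise the event type.
    Subjects removed by a competing event lower surv_any, so they can no
    longer contribute to the cause of interest.
    Returns a list of (time, CIF) at each time the cause occurs.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv_any = 1.0   # KM of event-free survival, evaluated just before t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_any = d_cause = leaving = 0
        while i < len(data) and data[i][0] == t:
            d_any += int(data[i][1] != 0)
            d_cause += int(data[i][1] == cause)
            leaving += 1
            i += 1
        if d_cause:
            cif += surv_any * d_cause / n_at_risk
            curve.append((t, cif))
        surv_any *= 1 - d_any / n_at_risk
        n_at_risk -= leaving
    return curve


def naive_one_minus_km(times, causes, cause=1):
    """1 - Kaplan-Meier at the last time, treating competing events as
    censored. Included only for comparison."""
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1] == cause)
            leaving += 1
            i += 1
        surv *= 1 - deaths / n_at_risk
        n_at_risk -= leaving
    return 1 - surv


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6]
    causes = [1, 2, 2, 1, 0, 1]     # 1 = cause of interest, 2 = competing
    print("Aalen-Johansen CIF:", cumulative_incidence(times, causes)[-1][1])
    print("naive 1 - KM:", naive_one_minus_km(times, causes))
```

With these toy data, the Aalen-Johansen estimate for cause 1 ends at 2/3, while the naive estimate reaches 1. Yet two of the six subjects experienced the competing event and so can never experience cause 1.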
Second, unobserved heterogeneity among subjects can induce dependence among survival times that is not captured by measured covariates. Frailty models introduce a random effect (the frailty) that multiplies the hazard, accounting for clustering or overdispersion. They can be seen as an extension of the Cox model that includes a latent variable. Frailty models are particularly useful in studies with repeated events or clustered data (e.g., patients within hospitals). They coexist with competing risks models; in some settings, both extensions are combined.
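A short derivation shows what a frailty does to the population-level (marginal) hazard. It assumes the common shared gamma frailty: Z_i for cluster i has mean 1 and variance θ, subjects j belong to cluster i, and Λ0(t) = ∫₀ᵗ h0(u) du is the cumulative baseline hazard. Averaging over the frailty uses the gamma Laplace transform E[exp(−sZ)] = (1 + θs)^(−1/θ).

```latex
\begin{aligned}
h_{ij}(t \mid Z_i) &= Z_i \, h_0(t) \exp(\beta^\top x_{ij}),
  \qquad Z_i \sim \mathrm{Gamma}\left(\text{shape } 1/\theta,\ \text{scale } \theta\right),\\
S_{ij}(t) &= \mathbb{E}\left[\exp\left\{-Z_i \Lambda_0(t)\, e^{\beta^\top x_{ij}}\right\}\right]
  = \left(1 + \theta \Lambda_0(t)\, e^{\beta^\top x_{ij}}\right)^{-1/\theta},\\
h_{ij}(t) &= -\frac{d}{dt} \log S_{ij}(t)
  = \frac{h_0(t)\, e^{\beta^\top x_{ij}}}{1 + \theta \Lambda_0(t)\, e^{\beta^\top x_{ij}}}.
\end{aligned}
```

For a scalar covariate, the marginal hazard ratio between x = 1 and x = 0 is exp(β)(1 + θΛ0(t)) / (1 + θΛ0(t)exp(β)), which drifts toward 1 as Λ0(t) grows. The hazards are proportional conditional on the frailty but not marginally. Members of a cluster are dependent because they share the same Z_i.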
The 21st century brought machine learning techniques into survival analysis. Random survival forests, gradient boosting machines (e.g., CoxBoost, XGBoost with survival objectives), and deep learning architectures (e.g., DeepSurv, Cox-nnet) can automatically capture complex interactions and non-linear effects. Some of them, such as random survival forests, also drop the proportional hazards assumption. Others, such as DeepSurv and Cox-nnet, keep the Cox partial likelihood and its proportional hazards structure but replace the linear predictor with a neural network. These methods are especially powerful in high-dimensional settings (e.g., genomics) where traditional Cox models struggle. Machine learning survival methods do not replace the Cox model but complement it. They can achieve higher predictive accuracy, particularly with many covariates or strong non-linear effects, while the Cox model remains preferred for inference and interpretability. The field is actively exploring how to combine the strengths of both approaches, for example through penalized Cox models or neural networks trained on the partial likelihood.
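Comparisons of predictive accuracy between these methods and the Cox model are commonly made with Harrell's concordance index (C-index). The C-index is the fraction of usable pairs in which the subject who failed first was assigned the higher predicted risk. The plain-Python sketch below is for illustration only. It skips pairs with tied times, which simplifies Harrell's full rules, and the risk scores could come from any model (for example a Cox linear predictor or a forest's predicted risk). Libraries such as lifelines and scikit-survival provide tested implementations.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs whose predicted risks are correctly ordered.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]; it is concordant when risk_scores[i] > risk_scores[j].
    Tied risk scores count as half-concordant. Tied times are skipped here.
    """
    concordant = 0.0
    comparable = 0
    for ti, ei, ri in zip(times, events, risk_scores):
        if not ei:
            continue  # a censored subject cannot anchor a comparable pair
        for tj, rj in zip(times, risk_scores):
            if ti < tj:
                comparable += 1
                if ri > rj:
                    concordant += 1.0
                elif ri == rj:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [1, 2, 3, 4]
    events = [1, 1, 0, 1]
    print(harrell_c_index(times, events, [4, 3, 2, 1]))  # perfectly ordered risks
    print(harrell_c_index(times, events, [1, 2, 3, 4]))  # reversed risks
```

A value of 1 means perfect ranking, 0.5 is what a constant or random score achieves, and 0 means the ranking is exactly reversed. The C-index measures ranking only, not calibration of the predicted probabilities.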
Today, the leading frameworks (the Kaplan-Meier estimator, the Cox proportional hazards model, parametric survival models, and machine learning survival methods) coexist with a clear division of labor. All agree on the central importance of handling censoring and on the need to estimate either the survival function or the hazard function. In their standard forms, all assume that censoring is non-informative. They disagree on the role of assumptions. Parametric models require strong distributional assumptions, and the Cox model requires proportional hazards. The Kaplan-Meier estimator needs neither but cannot adjust for covariates. Machine learning methods relax many of these assumptions at the cost of careful tuning and reduced interpretability. There is also disagreement about the goal: inference (understanding covariate effects) versus prediction (forecasting individual risk). The Cox model is the default for inference; machine learning methods are increasingly used for prediction. Competing risks and frailty models are specialized extensions that address specific data structures. The field continues to evolve, with ongoing efforts to integrate machine learning with classical survival frameworks and to develop methods that are both interpretable and accurate.