How can we infer a relationship between a variable of interest and a set of predictors when the data are noisy, the true functional form is unknown, or the number of predictors rivals the sample size? This tension—between wanting a simple, interpretable description and needing to accommodate complexity—has driven the evolution of regression analysis for over two centuries. Each major framework redefined what counts as a good estimate, what assumptions are tolerable, and what questions the analyst can legitimately ask.
The first systematic framework for regression was built around the method of least squares, developed independently by Legendre and Gauss in the early 1800s. Its core commitment was geometric: find the line (or hyperplane) that minimizes the sum of squared vertical distances from the data points. This principle required no assumption about the shape of the error distribution. Gauss showed that least squares gives the best linear unbiased estimator provided only that the errors have zero mean, equal variance, and no correlation with one another. The framework was enormously successful for astronomical and geodetic problems, where measurement error was the dominant source of uncertainty. Yet it had a sharp limitation: it provided no formal way to quantify uncertainty about the estimated coefficients. Analysts could compute a regression line but could not say how confident they should be that a slope was different from zero. The framework was a powerful computational tool, but it lacked an inferential engine.
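The geometric principle described above can be illustrated with a minimal sketch. The data are synthetic and the variable names (`X`, `beta_hat`, `residuals`) are illustrative assumptions, not part of any historical formulation. The check at the end shows the defining property of the least squares solution: the residuals are orthogonal to every column of the design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)  # synthetic line plus noise

# Design matrix with an intercept column and one predictor
X = np.column_stack([np.ones(n), x])

# Minimize the sum of squared vertical distances ||y - X b||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimizer, residuals are orthogonal to the columns of X
residuals = y - X @ beta_hat
print("coefficients:", beta_hat)
print("X'r (numerically zero):", X.T @ residuals)
```

Note that nothing in this computation says how far `beta_hat` might be from the true coefficients, which is the gap the next framework filled.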
R. A. Fisher transformed regression by embedding least squares inside a broader theory of statistical inference. Within his theory of maximum likelihood, the least squares estimator is exactly the maximum likelihood estimator (MLE) when the errors are normally distributed. This was not a rejection of least squares but an absorption: Fisher provided the inferential machinery (standard errors, confidence intervals, hypothesis tests, and the analysis of variance) that classical error theory had lacked. The Fisherian framework also introduced the idea of experimental design, where regression could be used to estimate treatment effects in randomized studies. For the first time, regression was not just a descriptive curve-fitting device but a method for drawing probabilistic conclusions. The price of this inferential power was a stronger set of assumptions: normality, independence, and constant variance of errors. When those assumptions failed, the Fisherian framework offered no obvious alternative.
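As a rough sketch of the inferential machinery mentioned above, the following computes standard errors and t-statistics for least squares coefficients under the assumptions of independent, constant-variance, normally distributed errors. The data are synthetic and all names are illustrative assumptions. Under those assumptions each t-statistic would be compared with a t distribution on `n - p` degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # intercept and one predictor
y = 1.0 + 0.3 * x + rng.normal(size=n)        # synthetic data, unit error variance

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                  # least squares = MLE under normal errors

resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)          # unbiased estimate of the error variance

# Estimated covariance of beta_hat is sigma2_hat * (X'X)^{-1}
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
t_stats = beta_hat / se                       # tests H0: coefficient = 0

print("estimates:      ", beta_hat)
print("standard errors:", se)
print("t-statistics:   ", t_stats)
```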
While Fisher was refining inference for a single outcome, other statisticians began extending regression to multiple response variables simultaneously. Multivariate regression treats a vector of outcomes as a function of the same predictors, modeling the covariance structure among the responses. This was not a replacement of Fisherian regression but an expansion: the same least squares and likelihood principles applied, but now the analyst had to estimate a full error covariance matrix. The framework became essential in fields like psychometrics and econometrics, where several correlated outcomes (e.g., test scores, economic indicators) needed joint modeling. Multivariate regression coexists with univariate regression today; the choice depends on whether the research question concerns a single outcome or the relationships among several.
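A small sketch, on synthetic data with illustrative names, of what the multivariate extension adds: the coefficient matrix is the same as running least squares on each outcome separately, and the new object to estimate is the error covariance matrix across responses.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 200, 2                                  # n observations, q response variables
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

B_true = np.array([[1.0, -1.0],
                   [0.5,  2.0]])               # one column of coefficients per response
L = np.array([[1.0, 0.0],
              [0.6, 0.8]])                     # induces correlated errors across responses
E = rng.normal(size=(n, q)) @ L.T
Y = X @ B_true + E

# Least squares applied to the whole response matrix at once
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Estimated error covariance matrix among the responses
resid = Y - X @ B_hat
Sigma_hat = resid.T @ resid / (n - X.shape[1])

print("B_hat:\n", B_hat)
print("Sigma_hat:\n", Sigma_hat)
```

The off-diagonal entry of `Sigma_hat` is what a pair of separate univariate regressions would not report.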
By the 1960s, researchers in fields like economics and ecology were confronting data where the linearity assumption of classical and Fisherian regression was plainly wrong. The response curve might be wiggly, the interactions complex, and the error distribution unknown. Nonparametric regression relaxed the assumption that the regression function takes a known parametric form. Methods such as kernel smoothing, local polynomial regression (of which LOESS is a well-known example), and smoothing splines let the data determine the shape of the relationship, subject only to a smoothness constraint. The trade-off was immediate: flexibility came at the cost of interpretability and slower convergence rates, a penalty that grows as more predictors are added. Semiparametric regression emerged as a compromise, keeping a linear component for key predictors while modeling others nonparametrically. This framework did not reject Fisherian inference; it narrowed its domain of applicability. Today, nonparametric methods are used when the sample is large enough to estimate a flexible curve and the goal is prediction or exploration rather than parsimonious explanation.
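To make kernel smoothing concrete, here is a minimal sketch of a Nadaraya-Watson estimator with a Gaussian kernel, one of several possible kernel smoothers. The function name `nadaraya_watson`, the bandwidth value, and the synthetic sine-curve data are assumptions chosen for illustration. The fitted value at each point is a weighted average of nearby responses, so no functional form is imposed.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Kernel-weighted local average of y_train, using a Gaussian kernel."""
    # Scaled distances between every evaluation point and every training point
    diffs = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    # Each fitted value is a weighted mean of the observed responses
    return (weights @ y_train) / weights.sum(axis=1)

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, size=200))
y = np.sin(x) + rng.normal(0.0, 0.3, size=200)   # nonlinear signal plus noise

grid = np.linspace(0.5, 2.0 * np.pi - 0.5, 50)   # stay away from the boundaries
fit = nadaraya_watson(x, y, grid, bandwidth=0.3)

print("max abs error vs. sin on the grid:", np.max(np.abs(fit - np.sin(grid))))
```

The bandwidth plays the role of the smoothness constraint: a smaller value follows the data more closely at the cost of higher variance.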
A different limitation of ordinary least squares became acute in the 1970s: when predictors are highly correlated or when the number of predictors approaches the sample size, the least squares estimates become unstable, with very large variance, and once predictors outnumber observations the estimates are no longer unique. Regularization methods addressed this by deliberately introducing bias to reduce variance. Ridge regression (Hoerl and Kennard, 1970) added a penalty proportional to the sum of squared coefficients, shrinking them toward zero but keeping all predictors in the model. The lasso (Tibshirani, 1996) replaced the squared penalty with an absolute-value penalty, which can set some coefficients exactly to zero and so performs automatic variable selection. These methods transformed regression from a purely inferential tool into a predictive engine optimized for high-dimensional settings. Regularization coexists with Fisherian regression: when the design matrix is well-behaved and inference is the goal, ordinary least squares remains standard; when prediction in high dimensions is the priority, regularization dominates.
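The two penalties can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production solver: there is no intercept, predictors are not standardized, the penalty values are arbitrary, and the lasso is fitted with a plain coordinate descent loop (one common approach among several). The function names `ridge`, `soft_threshold`, and `lasso_cd` are hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly zero when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for 0.5*||y - Xb||^2 + lam*||b||_1."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X**2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with predictor j's current contribution added back
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
    return beta

rng = np.random.default_rng(4)
n, p = 40, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]                 # only three predictors matter
y = X @ beta_true + rng.normal(0.0, 0.5, size=n)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_ridge = ridge(X, y, lam=10.0)
beta_lasso = lasso_cd(X, y, lam=20.0)

print("norm of OLS estimate:  ", np.linalg.norm(beta_ols))
print("norm of ridge estimate:", np.linalg.norm(beta_ridge))
print("lasso coefficients set exactly to zero:", int(np.sum(beta_lasso == 0.0)))
```

The soft-thresholding step is where exact zeros come from; the ridge solution shrinks every coefficient but has no mechanism for setting one to zero.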
Classical regression assumed a continuous outcome with normally distributed errors. But many real outcomes are binary, counts, or positive amounts. Before 1972, analysts handled these cases with ad hoc transformations or separate methods (e.g., logistic regression for binary data, log-linear models for counts). Nelder and Wedderburn's generalized linear model (GLM) unified these disparate techniques under a single framework. A GLM has three components: a random component from the exponential family (normal, binomial, Poisson, gamma, etc.), a linear predictor, and a link function that connects the mean of the outcome to the linear predictor. This was an extension, not a replacement: ordinary linear regression is a GLM with an identity link and normal errors. The GLM framework absorbed logistic regression, Poisson regression, and others into a coherent theory with a common estimation method (iteratively reweighted least squares) and unified inference. Today, GLMs are the default for non-normal outcomes such as binary responses, counts, and skewed positive amounts, coexisting with regularized and nonparametric methods that can also handle such data but with different trade-offs.
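The three components and the common estimation method can be sketched for one member of the family: a binomial random component with a logit link, fitted by iteratively reweighted least squares. This is a minimal illustration on synthetic data with a fixed number of iterations and no convergence check; it assumes the classes are not perfectly separated (otherwise the weights approach zero). The function name `logistic_irls` is hypothetical.

```python
import numpy as np

def logistic_irls(X, y, n_iter=25):
    """Fit a logistic regression (binomial GLM, logit link) by IRLS."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                       # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))      # inverse link: mean of the outcome
        w = mu * (1.0 - mu)                  # binomial variance, used as weights
        z = eta + (y - mu) / w               # working response
        # Weighted least squares of z on X with weights w
        beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
    return beta

rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
y = rng.binomial(1, p_true).astype(float)    # synthetic binary outcomes

beta_hat = logistic_irls(X, y)
mu_hat = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))

print("estimated coefficients:", beta_hat)
print("score X'(y - mu) at the solution:", X.T @ (y - mu_hat))
```

With an identity link and constant weights the same loop reduces to ordinary least squares in a single step, which is the sense in which linear regression is a special case.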
Bayesian regression brings a fundamentally different inferential philosophy to the same estimation problem. Instead of treating the regression coefficients as fixed unknown constants, it treats them as random variables with a prior distribution that is updated by the data to produce a posterior distribution. For all but the simplest models, this framework was computationally impractical until the 1990s, when Markov chain Monte Carlo (MCMC) methods made it feasible to sample from posterior distributions for complex models. Bayesian regression is not merely a computational alternative to frequentist methods; it changes what an estimate means. A Bayesian can say, "Given the model and the prior, there is a 95% probability that the coefficient lies in this interval," while a frequentist must say, "Under repeated sampling, 95% of intervals constructed this way would contain the true value." The prior distribution also provides a natural form of regularization: a Gaussian prior centered at zero shrinks estimates toward zero, and in the simplest case, with the noise variance treated as known, the posterior mean coincides with a ridge estimate, but with a clear interpretation as prior belief. Bayesian regression coexists with frequentist regularization and GLMs; the choice depends on whether the analyst has prior information, needs probabilistic statements about parameters, or prefers the objectivity claims of frequentist methods.
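The link between a zero-centered prior and ridge shrinkage can be checked directly in the one case where no sampling is needed: a Gaussian prior on the coefficients with a known noise variance, which gives a closed-form Gaussian posterior. The sketch below rests on those assumptions; the values of `sigma` and `tau` and the synthetic data are illustrative. It compares the posterior mean with a ridge estimate whose penalty is `sigma**2 / tau**2` and forms 95% credible intervals from the posterior standard deviations.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 0.0, -0.5])
sigma = 1.0    # noise standard deviation, assumed known
tau = 0.5      # prior standard deviation of each coefficient
y = X @ beta_true + rng.normal(0.0, sigma, size=n)

# Prior: beta ~ N(0, tau^2 I).  Likelihood: y | beta ~ N(X beta, sigma^2 I).
post_precision = X.T @ X / sigma**2 + np.eye(p) / tau**2
post_cov = np.linalg.inv(post_precision)
post_mean = post_cov @ X.T @ y / sigma**2

# Ridge estimate with penalty lambda = sigma^2 / tau^2
ridge_est = np.linalg.solve(X.T @ X + (sigma**2 / tau**2) * np.eye(p), X.T @ y)

# 95% credible intervals from the Gaussian posterior
post_sd = np.sqrt(np.diag(post_cov))
lower = post_mean - 1.96 * post_sd
upper = post_mean + 1.96 * post_sd

print("posterior mean:", post_mean)
print("ridge estimate:", ridge_est)
print("95% credible intervals:", list(zip(lower, upper)))
```

A tighter prior (smaller `tau`) corresponds to a larger ridge penalty, which is the sense in which the prior acts as regularization. For models without a closed-form posterior, sampling methods such as MCMC take the place of the matrix algebra here.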
Today, no single regression framework dominates. The leading frameworks—Fisherian inference, nonparametric regression, regularization, GLMs, and Bayesian regression—coexist because they serve different problem structures. They agree on several fundamentals: all rely on some form of loss function or likelihood; all require careful attention to model checking and residual analysis; and all recognize the bias-variance trade-off as central. The disagreements are sharper. Frequentists and Bayesians disagree on the meaning of probability and the role of prior information. Nonparametric advocates argue that parametric assumptions are rarely justified, while parametric modelers counter that interpretability and efficiency matter more than asymptotic flexibility. Regularization methods prioritize prediction accuracy over unbiased estimation, a trade-off that Fisherian inference explicitly avoids. The practical division of labor is pragmatic: GLMs for standard categorical or count outcomes, regularized regression for high-dimensional prediction, nonparametric methods for complex curves with large samples, and Bayesian regression when prior information or probabilistic parameter statements are needed. The history of regression analysis is not a story of one framework triumphing over others; it is a story of successive frameworks expanding the range of problems that can be addressed, each carving out its own domain of applicability.