What counts as a successful rehabilitation outcome? For much of the twentieth century, the answer seemed self-evident: a patient improved when their impairment lessened. A stroke survivor who regained arm movement, a spinal cord injury patient who increased muscle strength, a cardiac patient who raised their exercise tolerance—these were the markers of recovery. Yet as rehabilitation clinicians accumulated experience with diverse populations, they began to notice troubling gaps. Two patients with identical impairment gains could have dramatically different lives: one returned to work and community activities while the other remained homebound. The question of what to measure, and from whose perspective, has driven a century of framework development that continues to shape clinical practice and research today.
The earliest systematic approach to rehabilitation outcomes grew directly from the Medical Model of Disability. In this view, disability was a biological deficit located in the individual body, and the goal of rehabilitation was to reduce or eliminate that deficit. Outcomes were therefore measured at the level of impairment: range of motion, muscle strength, reflex integrity, and other physiological signs. Clinicians trained in Physical Medicine and Rehabilitation used goniometers, dynamometers, and standardized manual muscle tests to document change. The framework’s strength was its precision and apparent objectivity: a therapist could record a joint angle to the nearest degree. Its limitation was that it told clinicians nothing about whether the patient could actually use that motion in daily life. A patient might regain full elbow flexion yet remain unable to feed themselves, and the Medical Model offered no vocabulary for capturing that gap in real-world function. Despite this limitation, the impairment-based approach dominated rehabilitation documentation through the mid-twentieth century. It persists today where biological recovery is the primary clinical question, as in acute neurological rehabilitation or postsurgical orthopedic monitoring.
By the 1950s, rehabilitation practitioners had begun asking whether the real test of an intervention was not whether a joint moved but whether a person could perform meaningful activities. The Functional Outcome Model shifted the unit of analysis from body structures to whole-person tasks: walking, dressing, bathing, transferring, eating. Instruments such as the Barthel Index (in clinical use from 1955, published in 1965) and later the Functional Independence Measure (FIM, 1984) provided standardized scales that rated the level of assistance a patient needed for each activity. The new model did not break cleanly with the Medical Model; the two coexisted. In many clinical settings, impairment measures and functional measures were used side by side, the former to track biological recovery and the latter to capture practical ability. The Functional Outcome Model also answered a practical pressure: hospitals and insurers wanted to know whether patients could be discharged safely, and functional scores became gatekeeping tools for placement decisions. The model nevertheless carried its own limitation. By fixing the list of activities to be measured, it assumed that the same set of tasks mattered equally to every patient. A professional violinist and a retired gardener might both need help bathing, yet their rehabilitation priorities diverge sharply, and the Functional Outcome Model could not capture that divergence.
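To make the fixed-item logic concrete, the sketch below totals FIM ratings. It assumes the commonly described FIM layout of 18 items (13 motor and 5 cognitive), each rated from 1, total assistance, to 7, complete independence. The names `fim_scores`, `FIM_MOTOR_ITEMS`, and `FIM_COGNITIVE_ITEMS` are hypothetical, and the detailed rating rules of the official FIM manual are not reproduced here.

```python
from typing import Dict, Sequence

# Hypothetical constants reflecting the standard FIM structure.
FIM_MOTOR_ITEMS = 13      # self-care, sphincter control, transfers, locomotion
FIM_COGNITIVE_ITEMS = 5   # communication and social cognition


def fim_scores(motor: Sequence[int], cognitive: Sequence[int]) -> Dict[str, int]:
    """Sum FIM item ratings into motor, cognitive, and total scores.

    Every patient is rated on the same fixed items, each from
    1 (total assistance) to 7 (complete independence).
    """
    if len(motor) != FIM_MOTOR_ITEMS or len(cognitive) != FIM_COGNITIVE_ITEMS:
        raise ValueError("FIM uses 13 motor items and 5 cognitive items")
    for rating in (*motor, *cognitive):
        if rating not in range(1, 8):
            raise ValueError("each FIM item is rated from 1 to 7")

    motor_total = sum(motor)          # possible range 13-91
    cognitive_total = sum(cognitive)  # possible range 5-35
    return {
        "motor": motor_total,
        "cognitive": cognitive_total,
        "total": motor_total + cognitive_total,  # possible range 18-126
    }
```

For example, `fim_scores([4] * 13, [5] * 5)` returns a motor score of 52, a cognitive score of 25, and a total of 77. Whatever the patient’s priorities, every item counts equally toward the total, which is precisely the property the violinist-and-gardener example calls into question.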
Goal Attainment Scaling (GAS), introduced by Kiresuk and Sherman in 1968 to evaluate community mental health programs, offered rehabilitation a direct alternative to the fixed-item logic of functional outcome instruments. Instead of measuring every patient against the same standardized list, GAS asked clinicians and patients to set individualized goals collaboratively before treatment began, then to rate the degree to which each goal was achieved on a five-point scale, from −2 (much less than expected) through 0 (the expected outcome) to +2 (much more than expected). A patient with aphasia might set a goal of naming five common objects; a patient with a lower-limb amputation might target walking 100 meters without a cane. GAS thus made the outcome framework itself patient-specific. This approach proved especially valuable in heterogeneous populations, such as pediatric rehabilitation, geriatric care, and community-based programs, where no two patients shared the same trajectory. GAS did not replace the Functional Outcome Model; rather, it carved out a complementary niche. Where standardized functional scales offered comparability across groups for research and benchmarking, GAS offered sensitivity to individual change in clinical practice. The tension between these two logics, standardization versus individualization, remains one of the field’s defining debates.
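The paragraph leaves open how per-goal ratings become a single outcome. One widely used option is the GAS T-score proposed by Kiresuk and Sherman, which rescales the weighted sum of −2 to +2 ratings so that 50 corresponds to goals being met, on average, exactly as expected. The minimal sketch below assumes equal goal weights unless weights are supplied and uses the conventional default correlation between goal scores, ρ = 0.3; the function name `gas_t_score` is hypothetical.

```python
import math
from typing import Optional, Sequence


def gas_t_score(ratings: Sequence[int],
                weights: Optional[Sequence[float]] = None,
                rho: float = 0.3) -> float:
    """Aggregate per-goal GAS ratings (-2..+2) into a Kiresuk-Sherman T-score.

    T = 50 + 10 * sum(w_i * x_i)
             / sqrt((1 - rho) * sum(w_i ** 2) + rho * sum(w_i) ** 2)

    rho is the assumed average correlation between goal scores; 0.3 is the
    conventional default, not a value measured for any particular caseload.
    """
    if not ratings:
        raise ValueError("at least one goal rating is required")
    if any(r not in (-2, -1, 0, 1, 2) for r in ratings):
        raise ValueError("GAS ratings must be integers from -2 to +2")
    if weights is None:
        weights = [1.0] * len(ratings)  # every goal equally important
    if len(weights) != len(ratings):
        raise ValueError("need exactly one weight per goal")

    weighted_sum = sum(w * x for w, x in zip(weights, ratings))
    denominator = math.sqrt((1 - rho) * sum(w * w for w in weights)
                            + rho * sum(weights) ** 2)
    return 50 + 10 * weighted_sum / denominator
```

For example, `gas_t_score([1, 0, -1])` returns 50.0 because over- and under-achievement cancel out, while a single goal rated +1 scores 60. Supplying weights such as `[3, 1, 1]` lets the goal a patient ranks as most important dominate the result, which keeps the aggregate tied to personal priorities rather than to a fixed item list.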
Beginning in the 1990s, a second challenge to the functional-scale tradition gained momentum, this time aimed at observer rating itself. The Patient-Reported Outcome (PRO) paradigm argued that the person experiencing the condition is the most authoritative judge of their own health status. Rather than a therapist rating how well a patient walks, the patient rates their own perceived difficulty, satisfaction, or quality of life. Instruments such as the Sickness Impact Profile (developed in the 1970s), the SF-36 (published in 1992), and condition-specific PROs brought the patient’s subjective experience into the outcome equation. This paradigm coexists with both the Functional Outcome Model and GAS, but it introduces a distinctive epistemological claim: some dimensions of outcome, such as pain, fatigue, emotional well-being, and satisfaction with participation, are inaccessible to external observation. A therapist cannot see fatigue; only the patient can report it. The PRO paradigm also intersects concretely with the tension between standardization and individualization. PROs are typically standardized instruments with fixed items and response scales, which makes them comparable across populations. GAS, by contrast, individualizes both the goal and the criterion for success. A clinician choosing between a PRO and GAS must therefore decide whether comparability or personal relevance matters more for the question at hand. In large clinical trials, PROs dominate because their psychometric properties are well established; in goal-oriented clinical practice, GAS often provides more actionable information.
Running alongside these outcome-specific frameworks, the Evidence-Based Rehabilitation Movement (EBRM) emerged in the 1990s as a methodological infrastructure that reshaped how every outcome framework was evaluated. Drawing on the broader evidence-based medicine movement, EBRM demanded that rehabilitation interventions, and the instruments used to measure their effects, be supported by rigorous empirical evidence. For outcome measurement, this meant that any scale, whether impairment-based, functional, patient-reported, or goal-attainment-based, had to demonstrate reliability, validity, responsiveness, and interpretability. The EBRM did not introduce a new type of outcome; instead, it forced methodological upgrades across every existing framework. The Barthel Index was re-evaluated for its ceiling effects; PROs underwent extensive psychometric testing; GAS was scrutinized for inter-rater reliability. In practice, the EBRM created a hierarchy of evidence that privileged randomized controlled trials and systematic reviews, which in turn favored standardized instruments over individualized ones. This methodological pressure remains a source of tension: GAS advocates concede that individualized measurement is less standardized but argue that it is more clinically meaningful, while EBRM purists counter that without psychometric rigor, outcome data cannot be trusted to guide practice.
The International Classification of Functioning, Disability and Health (ICF), endorsed by the World Health Organization in 2001, attempted to resolve the fragmentation of the outcome landscape by providing a single integrative taxonomy. The ICF organizes functioning into three domains (body functions and structures, activities, and participation) and adds contextual factors, environmental and personal, that influence all three. This biopsychosocial framework does not replace earlier models so much as absorb and reorganize their commitments. Impairment measures map onto body functions and structures; functional scales map onto activities; participation measures capture community involvement; and the contextual factors acknowledge that a person’s environment and individual circumstances shape outcomes, a point GAS had already foregrounded through its personal goals. The ICF’s great strength is its comprehensiveness: it can accommodate the Medical Model’s impairment focus, the Functional Outcome Model’s activity focus, the PRO paradigm’s subjective emphasis, and GAS’s individualization within a single conceptual space. Its great weakness is its complexity. The full ICF contains more than 1,400 categories, making it impractical for routine clinical documentation without extensive simplification. Researchers and clinicians have responded by developing ICF Core Sets, shortlists of the categories most relevant to specific conditions, but the tension between comprehensiveness and usability remains unresolved.
Today, no single outcome framework dominates rehabilitation. Instead, the field operates as a pluralistic ecosystem. The leading frameworks—the Functional Outcome Model (through instruments like the FIM), the Patient-Reported Outcome Paradigm, Goal Attainment Scaling, and the ICF Biopsychosocial Framework—each occupy a distinct niche. There is broad agreement that outcomes should be multidimensional, capturing more than impairment alone, and that the patient’s perspective matters. There is also consensus that the ICF provides a useful conceptual map for selecting which dimensions to measure. The major disagreements center on two axes. First, standardization versus individualization: should the field prioritize instruments that allow comparison across populations (PROs, functional scales) or methods that capture personally meaningful change (GAS)? Second, observer rating versus self-report: can a clinician accurately assess a patient’s functional status, or is the patient the only valid source for certain domains? These debates are not signs of weakness but of a maturing field that recognizes the complexity of recovery. The most sophisticated contemporary studies often combine multiple frameworks—using a PRO for subjective health status, a functional scale for basic activities, and GAS for individualized goals—while grounding the selection in the ICF taxonomy. The question of what counts as rehabilitation success no longer has a single answer, and the field is richer for embracing that plurality.