Why does a melody feel sad, a rhythm urgent, or a chord resolution satisfying? Music cognition asks how listeners perceive, remember, and emotionally respond to music—and why those experiences are so consistent across people yet so varied across cultures. The history of inquiry into these questions is not a single triumphant march but a series of frameworks that have each foregrounded different mechanisms: cosmic order, bodily sensation, mental prediction, neural firing, or social interaction. Some frameworks have been absorbed into later ones; others remain in productive tension today.
Long before experimental psychology, thinkers across Eurasia treated music’s power over listeners as a central puzzle. Greek Harmonic and Ethos Theory (c. 500–300 BCE) held that specific scales (harmoniai) directly shaped a listener’s character and emotions: the Dorian mode made one courageous, the Phrygian ecstatic. This was not mere metaphor; it was a causal claim about the cosmos, in which musical ratios mirrored the soul’s proportions. The roughly contemporaneous Confucian Music-Affect Theory (c. 300 BCE–present) also linked music to moral order, but through a different mechanism. For Confucians, proper music cultivated virtue by harmonizing the heart-mind with social hierarchy; improper music led to disorder. The emphasis was on ritual and collective cultivation rather than individual ethos.
In South Asia, the Nāṭyaśāstra rasa theory (c. 200 CE–present) offered a more systematic emotional taxonomy. The treatise on dramaturgy identified eight (later nine) rasas—flavors or emotional essences—that a performance could evoke, such as love, sorrow, or heroism. Each rasa was tied to specific melodic and rhythmic patterns. Unlike Greek ethos, rasa theory did not claim that music directly imprinted character; instead, it described how aesthetic cues triggered a shared emotional response in a prepared spectator. A similar project emerged in the Islamic world with Maqam Affect Theory (c. 900 CE–present), which assigned emotional and even therapeutic properties to each maqam (melodic mode). A maqam could induce joy, calm, or melancholy, and physicians used them in treatment. These four ancient frameworks all assumed that music’s emotional impact was systematic and knowable, but they disagreed on the underlying mechanism—cosmic harmony, moral cultivation, aesthetic essence, or medical effect. None of them relied on measurement or controlled observation.
A decisive break came with Helmholtzian Psychophysics (1863–1910). Hermann von Helmholtz, in On the Sensations of Tone, replaced speculative affect theory with laboratory measurement. He used resonators, tuning forks, and careful listening experiments to analyze how the ear physically processes sound: the perception of pitch, timbre, and consonance could be explained by the physics of vibrating bodies and the physiology of the inner ear. Consonance, for Helmholtz, arose from the absence of beats between partials; dissonance was roughness. This framework narrowed the question of musical meaning to sensory mechanics. It did not deny emotion, but it treated affect as secondary to the measurable facts of auditory sensation. Helmholtz’s work provided an infrastructure for later empirical work, but it also provoked a reaction: was music cognition really just a matter of ear physiology?
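The beating account of consonance can be made concrete with a small sketch. It assumes idealized tones with six exactly harmonic partials and counts pairs of partials whose difference frequency falls inside a "rough" window. The window bounds (15 to 60 Hz), the partial count, and the function names are illustrative assumptions, not Helmholtz's own figures or procedure.

```python
from itertools import product

def partials(fundamental_hz, count=6):
    """First `count` harmonic partials of an idealized tone."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def rough_pairs(f1_hz, f2_hz, low_hz=15.0, high_hz=60.0):
    """Count partial pairs whose beat rate |a - b| lies in the assumed rough window."""
    pairs = product(partials(f1_hz), partials(f2_hz))
    return sum(1 for a, b in pairs if low_hz < abs(a - b) < high_hz)

base = 220.0                          # A3
fifth = base * 3 / 2                  # just perfect fifth above A3
minor_second = base * 2 ** (1 / 12)   # equal-tempered semitone above A3

print("fifth:", rough_pairs(base, fifth))
print("minor second:", rough_pairs(base, minor_second))
```

Under these assumptions the fifth's partials either coincide or lie far apart, while the semitone produces several closely spaced pairs. That is the pattern the roughness account predicts, though a realistic model would also weight partial amplitudes and register.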
Gestalt Music Psychology (1910–1950) answered that question with a clear no. Gestalt psychologists argued that perception is not the sum of elementary sensations; the whole melody is heard as a shape that persists even when every note is transposed to a different key. They demonstrated grouping principles—proximity, similarity, good continuation—that govern how listeners organize a stream of notes into phrases and voices. A melody is not a chain of tones but a unified figure against a ground. This framework directly challenged Helmholtzian atomism: the ear does not just register frequencies; the mind actively structures sound into coherent patterns. Gestalt principles were later absorbed into cognitive theories of expectation and pattern recognition, but they never fully displaced the Helmholtzian tradition—both measurement and holistic organization remained live concerns.
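The transposition argument can be illustrated in a few lines. In this sketch a melody is a list of MIDI note numbers (an assumed representation), and its "shape" is approximated by the sequence of intervals between successive notes. This is a simplification for illustration, not a claim about how Gestalt psychologists formalized melodic form.

```python
def intervals(melody):
    """Successive pitch differences in semitones: the melody's relational shape."""
    return [b - a for a, b in zip(melody, melody[1:])]

def transpose(melody, semitones):
    """Shift every note by the same number of semitones."""
    return [pitch + semitones for pitch in melody]

# Illustrative tune as MIDI note numbers (60 = middle C).
tune = [60, 60, 67, 67, 69, 69, 67]
shifted = transpose(tune, 5)

shared_pitches = set(tune) & set(shifted)
same_shape = intervals(tune) == intervals(shifted)
print("shared pitches:", shared_pitches)
print("same interval shape:", same_shape)
```

Here the two versions share no pitches at all, yet their interval sequences are identical. This is the sense in which the figure survives a change of every element.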
By the mid-twentieth century, music cognition began to borrow tools from the cognitive revolution. Expectation and Information-Processing Theory (1956–2006), most fully articulated by Leonard Meyer in Emotion and Meaning in Music (1956), proposed that musical emotion arises from the fulfillment or violation of learned expectations. A listener builds probabilistic models of what note or chord should come next; when the music delays or subverts that expectation, tension and affect result. Meyer drew on information theory and Gestalt principles, but his framework was fundamentally predictive: meaning is not in the notes themselves but in the listener’s ongoing anticipation. This approach could explain why a deceptive cadence feels surprising and why a familiar melody can still move us: the expectations learned from a style stay active even when we know what comes next.
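One way to make "violation of expectation" measurable is to treat expectation as a conditional probability and surprise as information content, -log2 P(next | previous). The sketch below trains a bigram model on a tiny made-up corpus of scale-degree sequences. The corpus, the add-one smoothing, and the function names are all assumptions for illustration. This is in the spirit of later statistical-learning models, not a reconstruction of Meyer's own method.

```python
import math
from collections import Counter, defaultdict

def train_bigram(melodies):
    """Count how often each note follows each other note."""
    counts = defaultdict(Counter)
    for melody in melodies:
        for prev, nxt in zip(melody, melody[1:]):
            counts[prev][nxt] += 1
    return counts

def surprisal(counts, prev, nxt, alphabet):
    """Information content in bits, -log2 P(nxt | prev), with add-one smoothing."""
    total = sum(counts[prev].values()) + len(alphabet)
    prob = (counts[prev][nxt] + 1) / total
    return -math.log2(prob)

# Hypothetical toy corpus: scale degrees 1-7 of short invented phrases.
corpus = [
    [1, 2, 3, 4, 5, 4, 3, 2, 1],
    [5, 4, 3, 2, 1],
    [3, 2, 1, 7, 1],
    [1, 3, 5, 4, 3, 2, 1],
]
alphabet = range(1, 8)
model = train_bigram(corpus)

expected = surprisal(model, 2, 1, alphabet)    # 2 -> 1: common stepwise close
unexpected = surprisal(model, 2, 6, alphabet)  # 2 -> 6: never occurs in the corpus
print(f"2->1: {expected:.2f} bits, 2->6: {unexpected:.2f} bits")
```

The continuation heard most often in the corpus carries fewer bits than one never heard. Smoothing keeps unseen continuations from having zero probability, so a listener modeled this way is surprised by novelty rather than unable to process it.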
A different cognitive model emerged with the Generative Theory of Tonal Music (GTTM, 1983–present) by Fred Lerdahl and Ray Jackendoff. They adapted Noam Chomsky’s generative linguistics to music, arguing that listeners unconsciously infer a hierarchical structure—a tree of grouping, meter, and tonal tension—from the surface of a piece. GTTM did not replace expectation theory; it coexisted with it as a competing account of how listeners organize heard music. Expectation theory emphasized temporal prediction and statistical learning; GTTM emphasized a fixed, rule-governed grammar that assigns a single structural description to any tonal passage. The two frameworks disagreed on whether musical understanding is best modeled as probabilistic inference or as the application of innate grammatical rules. Empirical tests in the 1990s and 2000s found support for both, but also revealed that GTTM’s strict hierarchies often fail to capture listeners’ actual judgments, especially in non-tonal or cross-cultural contexts. Neither framework has been fully absorbed by the other; they remain in productive tension, with expectation theory now more influential in statistical-learning and corpus-based research.
Since the early 2000s, music cognition has fragmented into three major approaches that overlap in methods but differ in their core commitments.
Cognitive Neuroscience of Music (2003–present) uses brain imaging (fMRI, EEG, MEG) and patient studies to localize musical processes in neural circuits. It has identified specialized regions for pitch, rhythm, and timbre, and shown that musical training reshapes brain structure. This framework inherits Helmholtz’s commitment to measurement and physiology, but expands it to the whole brain. It often operates independently of GTTM and expectation theory, testing neural responses to specific stimuli rather than modeling the listener’s mental grammar. Its strength is direct physiological evidence, which becomes causal when patient studies show that damage to a region impairs a specific musical ability; its limitation is that brain activation does not directly reveal the content of musical experience.
Empirical Musicology (2004–present), articulated by Eric Clarke and Nicholas Cook in Empirical Musicology: Aims, Methods, Prospects, is a methodological school rather than a single theory. It insists that claims about music—whether historical, analytical, or cognitive—should be testable against observable data: behavioral experiments, corpus analyses, ethnographic recordings, or performance measurements. Empirical Musicology overlaps with cognitive neuroscience (both use experiments) but distinguishes itself by its openness to qualitative and ecological methods, such as studying real-world listening or performance gestures. It also challenges traditional music theory’s reliance on expert intuition, arguing that many theoretical claims (e.g., about harmonic function) have never been empirically verified. This framework does not displace cognitive neuroscience; it provides a broader methodological umbrella that includes neuroscience as one tool among many.
Embodied Music Cognition (2007–present), developed by Marc Leman in Embodied Music Cognition and Mediation Technology, responds directly to the brain-centered focus of cognitive neuroscience. Leman argues that musical meaning is not just in the head but in the body’s interaction with sound: listeners entrain their movements, feel tension in their muscles, and use gesture to shape their understanding. The body mediates between the acoustic signal and the cognitive interpretation. This framework draws on ecological psychology and phenomenology, and it has practical implications for music technology (e.g., designing interactive systems that respond to bodily gestures). Embodied music cognition does not reject neuroscience; it insists that neural accounts are incomplete without considering the body’s role in shaping perception and emotion.
The three contemporary paradigms—cognitive neuroscience, empirical musicology, and embodied music cognition—agree that music cognition must be studied with rigorous, replicable methods and that subjective introspection alone is insufficient. They also agree that musical experience is multimodal, involving not just hearing but also movement, emotion, and memory. Where they disagree is on the primary locus of explanation. Neuroscience privileges the brain as the site where perception and emotion are computed. Empirical musicology is methodologically pluralist, treating any well-grounded data as relevant. Embodied cognition insists that the body is not a mere input device but an active constituent of musical meaning. No single framework has achieved dominance; instead, researchers often combine them. A study might use fMRI to track brain activity while participants move to a beat (neuroscience), record the movements with motion capture (embodied), and compare results across cultures using corpus analysis (empirical). This pluralism is not a weakness but a sign that music cognition is mature enough to recognize that no single lens captures the full phenomenon.
Remarkably, the ancient affect frameworks have not been entirely superseded. Confucian Music-Affect Theory, Nāṭyaśāstra rasa theory, and Maqam Affect Theory remain active traditions in their respective cultural spheres, and contemporary researchers sometimes revisit them as sources of hypotheses about cross-cultural emotional responses. For example, the rasa theory’s systematic mapping of emotional cues has been compared with modern dimensional models of affect. These frameworks now coexist with experimental science, not as rivals but as historical resources that remind the field that questions about music’s emotional power are as old as civilization itself.