Tonal Cognition v015
This will be the final update. Not because the theory is complete. But because, while it has given me much to audiate, visualize & otherwise consider, it’s costing me practice time. So here’s the latest version of the document. To be returned to later, or abandoned:
Click here to download the PDF of Tonal Cognition v015.
Tonal Cognition:
A Unified Model
Brian Hunker
February 2026 — Version 015
A formal account of how tonal orientation arises, operates, and shifts
in a listener who has internalized the 12-tone diatonic field.
1. The Core Claim
This model proposes that tonal cognition is not primarily a matter of pitch recognition, interval calculation, or scale membership. It is a matter of orientation — a continuous, pre-attentive, probabilistic inference about which of twelve possible tonal centers is currently active.
The listener does not hear tones and then compute their relationships. The listener maintains a live probability distribution over possible tonal orientations, and each incoming tone updates that distribution. When the distribution becomes sufficiently concentrated, the listener experiences tonal orientation — the sense of being in a key, of knowing where home is, of feeling the gravitational pulls and resolutions that constitute tonal music.
All of the phenomena that music theory has described under the headings of keys, modes, modulation, chromaticism, tension, and resolution are, in this model, consequences of a single underlying mechanism: Bayesian inference over a 12-way tonal orientation, mediated by a symmetric interference structure generated by the Fa/~Fa axis.
2. Foundational Commitments
2.1 Tones, Not Pitches
We use the term tone throughout this document to refer to a pitch class — one of twelve equally-spaced positions in the 12-tone equal temperament (12TET) universe. A pitch is a specific frequency in Hz. A tone is an abstract relational identity. This distinction matters because the model is a model of cognition, not acoustics. The perceptual system operates on tonal relationships, not on absolute frequencies.
2.2 The 12-Tone Diatonic Field
We reject the conventional distinction between the diatonic scale (7 tones) and the chromatic scale (12 tones) as a fundamental categorical divide. All 12 tones emerge simultaneously from the same relational structure. We call the full set the 12-Tone Diatonic Field, emphasizing that all tones are diatonic — of one tone — differentiated only by their position in the interference pattern generated by the Fa/~Fa axis.
2.3 Purely Relational Identity
No tone has an intrinsic tonal identity. Identity is entirely a function of relationship — specifically, of the listener's current orientation. The same tone T can be Do in one orientation and Sol in another. This is not a feature of the model; it is its central claim. Tonal identity is cast, not inherent.
This is analogous to the roles in an ensemble cast. The same actor plays different roles in different productions. The actor's identity (T0 through T11) is fixed. The role (Do, Sol, Re, etc.) is assigned by the current tonal context.
2.4 A Model of Perception in 12TET
This is not a model of acoustic physics. The harmonic series provides the evolutionary origin of why the system has the structure it does, but the model describes a cognitive structure built through immersive exposure to music. It operates within the perceptual range of approximately 27 Hz to 4.2 kHz, and within the 12TET tuning system that has become the dominant tonal universe for most listeners.
3. The Actors and the Roles
3.1 The Twelve Actors
The twelve tones of 12TET are labeled T0 through T11. These labels carry no tonal meaning — they are simply the twelve members of the cast, identified by position. T0 is not Do. T0 is an actor who may be playing Do, or Sol, or any other role, depending on the current tonal orientation.
3.2 The Twelve Roles
The twelve functional roles are drawn from a movable solfege system extended across all 12 tones. The roles divide into three structural groups:
The Pentatonic Roles: Do, Sol, Re, La, Mi
The Anti-Pentatonic Roles: ~Do, ~Sol, ~Re, ~La, ~Mi
The Fulcrum Roles: Fa and ~Fa
Every role has an exact antipodal role. The tilde (~) denotes the antipode: ~Do is the direct opposite of Do, ~Sol the opposite of Sol, and so on. Fa and ~Fa are each other's antipodes — they form the orienting axis of the entire system.
Note: In conventional music theory, the leading tone role is called Ti. In this model, Ti does not exist as an independent role. What conventional theory calls Ti is ~Fa — the antipodal role to Fa. This elimination is not cosmetic. It reflects a structural truth revealed by the matrix: the role conventionally called Ti occupies precisely the antipodal position to Fa, and naming it independently obscures that relationship.
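The casting idea can be sketched in a few lines of Python. This is a minimal sketch, not part of the model's formal statement: the semitone offset assigned to each role is an assumption derived from the rule that an antipode lies a tritone (six semitones) from its partner, and the names ROLE_AT_OFFSET, cast, and antipode are illustrative only.

```python
# Role found at each semitone offset above Do.
# Assumed layout: the antipode ~X sits six semitones from X.
ROLE_AT_OFFSET = ["Do", "~Sol", "Re", "~La", "Mi", "Fa",
                  "~Do", "Sol", "~Re", "La", "~Mi", "~Fa"]


def cast(keycenter: int) -> dict:
    """Return the role played by each actor T0..T11 when T<keycenter> is Do."""
    return {f"T{n}": ROLE_AT_OFFSET[(n - keycenter) % 12] for n in range(12)}


def antipode(role: str) -> str:
    """Toggle the tilde: Do <-> ~Do, Fa <-> ~Fa, and so on."""
    return role[1:] if role.startswith("~") else "~" + role


print(cast(0)["T7"])  # Sol
print(cast(7)["T7"])  # Do
```

The same actor, T7, is Sol under one casting and Do under another, and what conventional theory calls Ti appears here as ~Fa at offset 11.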
3.3 The Wes Anderson Analogy
This is like a Wes Anderson film. The same dozen actors appear in every production. In each film they play different roles. The audience knows the actors — their faces, their presences, their qualities. What changes from film to film is the casting. Tonal orientation is the casting decision that determines which actor plays which role.
Modulation — the shift from one tonal center to another — is a recasting. The actors remain. The roles remain. Only the assignment changes. And crucially, the audience experiences the recasting not as an intellectual event but as a felt shift in the emotional and relational landscape of the drama.
4. The 5+5+2 Partition
The twelve roles divide into three functional groups whose structure is not arbitrary but emerges from the relational geometry of the system.
4.1 The Pentatonic Roles
Do, Sol, Re, La, Mi are the five roles that strongly confirm a given tonal orientation. A listener who hears any of these tones in context experiences it as evidence for the current keycenter. These are the high-confidence roles — the tones whose presence narrows the probability distribution toward a specific orientation.
The pentatonic roles correspond to the universally intuitive pentatonic scale found across virtually every musical tradition. This universality is not coincidental. The pentatonic set represents the tones that are maximally unambiguous relative to any given orientation — the tones that belong exclusively to one diatonic field and not to its tritone rival.
4.2 The Anti-Pentatonic Roles
~Do, ~Sol, ~Re, ~La, ~Mi are the exact antipodal opposites of the pentatonic roles. Each anti-pentatonic tone directly undermines the expectation of its paired pentatonic tone. Hearing ~Do does not merely introduce tension — it specifically and directly confounds the expectation of Do. The anti-pentatonic roles are not a separate category of chromatic alterations; they are the functional negations of the pentatonic roles.
Anti-pentatonic tones belong exclusively to the rival orientation's diatonic field. Their presence in a tonal context is evidence against the current orientation — it shifts the probability distribution toward the tritone-rival keycenter.
4.3 The Fulcrum Roles: Fa and ~Fa
Fa and ~Fa occupy a categorically different position. They are the two tones that belong to both competing diatonic fields simultaneously. The diatonic field of keycenter K contains: Do, Re, Mi, Fa, Sol, La, ~Fa. The diatonic field of keycenter K+6 (the tritone rival) contains: ~Do, ~Re, ~Mi, Fa, ~Sol, ~La, ~Fa. The overlap is exactly {Fa, ~Fa}.
This shared membership is the structural reason for their special status. Fa and ~Fa contribute equal evidence to both competing orientations — their differential confidence is zero. They are not neutral tones in the sense of being unimportant; they are maximally tense tones, pulled equally in both directions simultaneously. They are the axis of orientation precisely because they define the boundary between the two competing diatonic fields.
Fa and ~Fa together constitute what we call the Tritone of Orientation. Any of the six possible tritone pairs in 12TET can function as the Fa/~Fa axis. The coin flip — the perceptual decision about which pole is Fa and which is ~Fa — determines the entire casting. Once the coin lands, all twelve roles are assigned.
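The shared-membership claim can be checked mechanically. The sketch below assumes the seven-tone field of a keycenter is the familiar major-scale set of semitone offsets (0, 2, 4, 5, 7, 9, 11), which is how the role list Do, Re, Mi, Fa, Sol, La, ~Fa maps onto 12TET; the helper names are illustrative.

```python
ROLE_AT_OFFSET = ["Do", "~Sol", "Re", "~La", "Mi", "Fa",
                  "~Do", "Sol", "~Re", "La", "~Mi", "~Fa"]
# Semitone offsets of Do, Re, Mi, Fa, Sol, La, ~Fa above Do (assumed mapping).
FIELD_OFFSETS = {0, 2, 4, 5, 7, 9, 11}


def field(keycenter):
    """Set of actors (0..11) in the seven-tone field of the given keycenter."""
    return {(keycenter + o) % 12 for o in FIELD_OFFSETS}


def shared_roles(keycenter):
    """Roles, named relative to `keycenter`, shared with the tritone rival."""
    shared = field(keycenter) & field((keycenter + 6) % 12)
    return {ROLE_AT_OFFSET[(n - keycenter) % 12] for n in shared}


print(sorted(shared_roles(0)))  # ['Fa', '~Fa']
```

The result is the same for all twelve keycenters, so the overlap does not depend on which actor is cast as Do.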
4.4 The Copernican Insight
Traditional music theory places Do at the center of the tonal system. This model reveals that Do is the perceptual center — the tone of arrival and rest — but not the generative or structural center.
Fa is the generative center. It stands in the subdominant relationship to Do — the tone from which Do's dominant arrives. Fa does the structural work; Do receives the resolution.
Re is the structural center: the mirror axis of bilateral symmetry. Every role other than Re and ~Re has a unique mirror partner across Re: Do reflects to Mi, Sol reflects to La, Fa reflects to ~Fa. Re and ~Re are the zero-crossing invariants of the system, maintaining perfect balance regardless of orientation or harmonic depth. On the circle of fifths, Re divides the pentatonic roles into two functional pairs, Do/Sol on the Fa side and La/Mi on the ~Fa side, which is the emergent source of the major/minor bifurcation.
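The mirror relation can be written as a one-line reflection. This is a sketch under the same assumed semitone layout used for the roles (Re two semitones above Do); reflecting in semitone space is equivalent to reflecting on the circle of fifths, because the two are related by multiplication by 7 modulo 12. The function name mirror is illustrative.

```python
ROLE_AT_OFFSET = ["Do", "~Sol", "Re", "~La", "Mi", "Fa",
                  "~Do", "Sol", "~Re", "La", "~Mi", "~Fa"]
RE = ROLE_AT_OFFSET.index("Re")  # offset 2 above Do


def mirror(role):
    """Reflect a role across the Re/~Re axis: offset x maps to 2*Re - x (mod 12)."""
    x = ROLE_AT_OFFSET.index(role)
    return ROLE_AT_OFFSET[(2 * RE - x) % 12]


print(mirror("Do"), mirror("Sol"), mirror("Fa"))  # Mi La ~Fa
print(mirror("Re"), mirror("~Re"))                # Re ~Re
```

Only Re and ~Re map to themselves, which is the sense in which they act as the fixed points of the reflection.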
5. The Tonal Identity Matrix
The listener's tonal orientation at any moment is represented as a 12x12 probability matrix M, where:
M[n][r] = probability that actor Tn is currently playing role r
Each row of the matrix represents one actor's current probability distribution over all twelve roles. Each column represents the current probability distribution over which actor is playing that role.
5.1 The Bell Curve Structure
Each actor's row is a bell curve — a smooth probability distribution peaked at the actor's most likely current role and falling to zero at the antipodal role. The curve is defined on the circle of roles, with the antipodal role receiving exactly zero probability.
This zero-at-antipode constraint is structural, not statistical. If an actor is currently playing Do, it cannot simultaneously be playing ~Do. The antipodal roles are mutually exclusive by definition — they are each other's exact negations.
The shape of the bell curve encodes the listener's degree of tonal orientation. A narrow, sharply peaked curve represents high confidence — the listener is well oriented. A broad, flat curve represents low confidence — the listener is uncertain which role the actor is playing. The bell curve is the mathematical representation of a filled jar: the concentration of probability around a role reflects the accumulated tonal experience that has associated that actor with that role.
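As an illustration of one actor's row, the sketch below uses the raised cosine (1 + cos(θ))/2, which the Open Questions section names as a natural candidate rather than a derived result. The ordering of roles around the circle by fifths and the sharpness exponent are assumptions introduced here so that narrow and broad curves can be compared.

```python
import math

# Roles in circle-of-fifths order, so each antipode sits six steps away.
ROLE_CIRCLE = ["Do", "Sol", "Re", "La", "Mi", "~Fa",
               "~Do", "~Sol", "~Re", "~La", "~Mi", "Fa"]


def bell_row(peak_role, sharpness=1.0):
    """Probability of each role for one actor, peaked at `peak_role`.

    `sharpness` is a hypothetical knob: larger values narrow the curve.
    """
    peak = ROLE_CIRCLE.index(peak_role)
    weights = []
    for i in range(12):
        theta = 2 * math.pi * (i - peak) / 12
        base = max(0.0, (1 + math.cos(theta)) / 2)  # guard against tiny negatives
        weights.append(base ** sharpness)
    total = sum(weights)
    return {role: w / total for role, w in zip(ROLE_CIRCLE, weights)}


broad = bell_row("Do")
narrow = bell_row("Do", sharpness=4.0)
print(round(broad["Do"], 4), round(narrow["Do"], 4))
```

In both cases the antipodal role ~Do receives zero probability, while the narrow curve concentrates more mass on Do than the broad one.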
5.2 The Three Canonical States
Maximum Orientation
The maximally oriented state is a permutation matrix. Each actor plays exactly one role with certainty. Every row contains a single 1 and eleven 0s. Every column contains a single 1 and eleven 0s. Of all possible 12x12 permutation matrices, exactly 12 are valid castings, one for each possible keycenter assignment; each is a cyclic shift of the others, because the roles keep their fixed order around the circle. This is the state of complete tonal clarity: the listener knows exactly which actor is playing which role.
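The twelve maximally oriented states can be generated directly. In this sketch, role columns are indexed by semitone offset above Do, which is an assumption of the illustration rather than something the model prescribes; the function names are likewise hypothetical.

```python
def oriented_matrix(keycenter):
    """12x12 matrix M[n][r]: 1 if actor Tn plays the role at offset r above Do."""
    return [[1 if (n - keycenter) % 12 == r else 0 for r in range(12)]
            for n in range(12)]


def is_permutation(m):
    """True if every row and every column contains exactly one 1."""
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(m[n][r] for n in range(12)) == 1 for r in range(12))
    return rows_ok and cols_ok


matrices = [oriented_matrix(k) for k in range(12)]
distinct = {tuple(map(tuple, m)) for m in matrices}
print(all(is_permutation(m) for m in matrices), len(distinct))  # True 12
```

Each matrix is the previous one with its rows rotated by one actor, which is why only twelve of the many possible permutation matrices correspond to castings.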
Maximum Disorientation
The whole tone scale sustained for an extended period produces the state of maximum entropy. Its six tones form three tritone pairs, all simultaneously present, none dominant. The six present actors each have equal probability across all twelve roles. The six absent actors have zero probability everywhere. The listener knows which actors are on stage but has no information about their roles. This is the shuffled deck: maximum ambiguity, zero orientation.
Partial Orientation
The general state between these extremes is where tonal music lives. The probability distribution is neither uniform nor collapsed. It is concentrated but not certain — peaked around a most likely orientation while retaining nonzero probability for neighboring orientations. The listener has a sense of where they are but remains sensitive to new evidence.
5.3 Symmetry Properties
The matrix has a precise symmetry structure. In any fully oriented state, the 12 roles arrange around the circle of fifths:
... Fa — Do — Sol — Re — La — Mi — ~Fa ...
Re sits at the structural center, equidistant from both fulcrum roles. The matrix is bilaterally symmetric about Re in every fully oriented state. This symmetry is invariant — it holds regardless of which actor is cast as Do.
6. The Bayesian Update Rule
When a new tone Tx arrives, the matrix updates according to Bayes' theorem. The prior distribution M[n][r] is multiplied by the likelihood of hearing Tx given that Tn is playing role r, then renormalized:
M_new[n][r] ∝ M_old[n][r] × L(Tx | Tn plays role r)
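The likelihood function L encodes how much each role confirms or disconfirms the current orientation. Because its values are signed, they are best read as log-evidence (the differential confidence between the current orientation and its rival) rather than as raw probabilities: a positive value corresponds to a multiplicative factor greater than 1, zero to a factor of exactly 1, and a negative value to a factor less than 1. The values are drawn directly from the 5+5+2 partition: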
Pentatonic roles (Do, Sol, Re, La, Mi): L is positive — the tone confirms the current orientation
Anti-pentatonic roles (~Do, ~Sol, ~Re, ~La, ~Mi): L is negative — the tone disconfirms the current orientation and supports the rival
Fulcrum roles (Fa, ~Fa): L is zero — the tone provides no discriminating information between competing orientations
The five positive likelihood values are not equal. Their relative magnitudes are determined by the structural position of each pentatonic role relative to the Fa/~Fa axis, as described in Section 7.
6.1 Convergence
The update rule converges correctly under repeated exposure to pentatonic tones of a given orientation. Each pentatonic tone multiplies the probability of the correct orientation by a factor greater than 1, while multiplying competing orientations by lesser factors. Over accumulated exposure the distribution collapses toward the correct permutation matrix.
Anti-pentatonic tones drive the distribution toward the rival orientation. Accumulated anti-pentatonic evidence produces the phenomenon described in Section 8 as Recentric Modulation — a continuous warping of the probability distribution toward a new keycenter.
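A minimal numerical sketch of the update and its convergence follows. It makes three assumptions that are not in the model as stated: the signed L values are treated as log-evidence, so the multiplicative factor is exp(L); the numeric magnitudes are hypothetical and respect only the ordering L(Re) < L(Sol) = L(La) < L(Do) = L(Mi); and the state is simplified to a distribution over the twelve keycenters rather than the full 12x12 matrix.

```python
import math

ROLE_AT_OFFSET = ["Do", "~Sol", "Re", "~La", "Mi", "Fa",
                  "~Do", "Sol", "~Re", "La", "~Mi", "~Fa"]
# Hypothetical magnitudes; only their ordering comes from the text.
EVIDENCE = {"Do": 1.5, "Mi": 1.5, "Sol": 1.0, "La": 1.0, "Re": 0.5,
            "Fa": 0.0, "~Fa": 0.0}
for role in ["Do", "Mi", "Sol", "La", "Re"]:
    EVIDENCE["~" + role] = -EVIDENCE[role]


def update(prior, tone):
    """One Bayesian step over the 12 keycenters after hearing actor `tone`."""
    weighted = [p * math.exp(EVIDENCE[ROLE_AT_OFFSET[(tone - k) % 12]])
                for k, p in enumerate(prior)]
    total = sum(weighted)
    return [w / total for w in weighted]


posterior = [1 / 12] * 12            # no orientation to begin with
for tone in [0, 7, 4, 9, 2] * 3:     # Do, Sol, Mi, La, Re of keycenter 0
    posterior = update(posterior, tone)

best = max(range(12), key=lambda k: posterior[k])
print(best, posterior[best] > 0.9)   # 0 True
```

Under these assumptions the two keycenters a fifth away remain the closest competitors, which matches the intuition that a pentatonic set alone leaves neighboring orientations plausible.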
6.2 The Role of Fa and ~Fa in Updates
Because Fa and ~Fa have zero likelihood differential, hearing either fulcrum tone does not update the relative probabilities of competing orientations. This is the mathematical expression of the coin flip. The fulcrum tones are maximally ambiguous — not because they carry no information, but because they carry equal information for both competing orientations simultaneously.
However, Fa and ~Fa play a crucial role in a different sense: their presence in the accumulated tone set is what makes orientation possible at all. A pentatonic set without either fulcrum tone has three possible keycenters. Adding one fulcrum tone narrows the ambiguity to two; adding both collapses it to one. This is the key discriminating function described in the Bayesian collapse hierarchy.
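The discriminating function of the fulcrum tones can be illustrated with a simple membership test. The sketch assumes that a keycenter counts as "possible" when its seven-tone field (major-scale offsets 0, 2, 4, 5, 7, 9, 11) contains every tone heard so far; the function name is illustrative.

```python
# Semitone offsets of Do, Re, Mi, Fa, Sol, La, ~Fa above Do (assumed mapping).
FIELD_OFFSETS = {0, 2, 4, 5, 7, 9, 11}


def candidate_keycenters(tones):
    """Keycenters whose seven-tone field contains every tone in `tones`."""
    tones = set(tones)
    return [k for k in range(12)
            if tones <= {(k + o) % 12 for o in FIELD_OFFSETS}]


pentatonic = {0, 2, 4, 7, 9}                       # Do, Re, Mi, Sol, La with T0 as Do
print(candidate_keycenters(pentatonic))            # [0, 5, 7]
print(candidate_keycenters(pentatonic | {5, 11}))  # [0]
```

The same function can be called with any intermediate tone set to see how each added tone prunes the list of candidates.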
7. The Likelihood Function and Amplitude Ordering
The five positive likelihood values — one for each pentatonic role — are not free parameters. Their relative magnitudes are determined by the structural geometry of the role circle.
7.1 Structural Distance from the Fulcrum Axis
In any fully oriented state, the pentatonic roles occupy specific positions relative to the Fa/~Fa axis in the circle of fifths:
~Fa — Mi — La — Re — Sol — Do — Fa
Re sits at the center — equidistant (3 steps) from both fulcrum poles. Do is 1 step from Fa. Mi is 1 step from ~Fa. Sol is 2 steps from Fa. La is 2 steps from ~Fa.
The structural symmetry across Re produces the following equivalences:
L(Do) = L(Mi): both are 1 step from a fulcrum pole, on opposite sides of Re
L(Sol) = L(La): both are 2 steps from a fulcrum pole, on opposite sides of Re
L(Re) is unique: the sole pentatonic role at the structural center
This reduces the five seemingly independent likelihood values to three distinct values, with the ordering:
L(Re) < L(Sol) = L(La) < L(Do) = L(Mi)
This ordering is not imposed from outside the model. It emerges from the geometry of the role circle. Re, as the tone of maximum structural balance, is the least discriminating pentatonic tone. Do and Mi, as the tones immediately adjacent to the fulcrum axis, are the most discriminating.
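The step counts behind this ordering can be reproduced from the arc of fifths alone. The sketch below counts, for each pentatonic role, the number of steps to the nearer fulcrum pole; treating a larger distance as a smaller likelihood value is the rule stated above, and the names are illustrative.

```python
# The arc of fifths between the two fulcrum poles, as listed in the text.
FIFTHS_ARC = ["Fa", "Do", "Sol", "Re", "La", "Mi", "~Fa"]
PENTATONIC = ["Do", "Sol", "Re", "La", "Mi"]


def steps_to_nearest_pole(role):
    """Steps along the arc from `role` to whichever fulcrum pole is closer."""
    i = FIFTHS_ARC.index(role)
    return min(i, len(FIFTHS_ARC) - 1 - i)


distances = {r: steps_to_nearest_pole(r) for r in PENTATONIC}
# Farther from the axis means less discriminating, so sort by descending distance.
ranked = sorted(PENTATONIC, key=lambda r: -distances[r])

print(distances)  # {'Do': 1, 'Sol': 2, 'Re': 3, 'La': 2, 'Mi': 1}
print(ranked)     # ['Re', 'Sol', 'La', 'Do', 'Mi']
```

The three distinct distances (3, 2, 1) correspond to the three distinct likelihood values, with the ties falling exactly on the mirror pairs across Re.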
7.2 Antisymmetry
The full likelihood vector is antisymmetric. For every pentatonic role r with likelihood value L(r), the antipodal anti-pentatonic role ~r has likelihood value -L(r). The complete ordered set is:
+L(Do), +L(Sol), +L(Re), +L(La), +L(Mi), 0, 0, -L(Mi), -L(La), -L(Re), -L(Sol), -L(Do)
This antisymmetry ensures that the model treats both orientations with perfect symmetry. Confirming the current orientation is exactly as informative as disconfirming it. The coin has two equally weighted sides.
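The antisymmetry is easy to verify once any magnitudes are chosen. The numbers below are hypothetical placeholders that respect only the stated ordering; the placement of Fa before ~Fa in the two zero slots is likewise an assumption, since the listing above does not distinguish them.

```python
# Hypothetical magnitudes; the model fixes their ordering, not their values.
MAGNITUDE = {"Do": 1.5, "Mi": 1.5, "Sol": 1.0, "La": 1.0, "Re": 0.5}


def likelihood(role):
    """Signed likelihood value: +L for pentatonic, -L for its antipode, 0 for fulcrum."""
    if role in ("Fa", "~Fa"):
        return 0.0
    if role.startswith("~"):
        return -MAGNITUDE[role[1:]]
    return MAGNITUDE[role]


# Roles in the order of the listing in the text.
ORDER = ["Do", "Sol", "Re", "La", "Mi", "Fa",
         "~Fa", "~Mi", "~La", "~Re", "~Sol", "~Do"]
vector = [likelihood(r) for r in ORDER]

print(vector)
print(all(vector[i] == -vector[11 - i] for i in range(12)))  # True
```

Because every positive entry is cancelled by its antipode, the entries sum to zero, so a tone set balanced between the two orientations leaves their relative evidence unchanged.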
8. Modulation as Interference
Modulation — the shift from one tonal orientation to another — is not a discrete event in this model. It is a continuous process of probability redistribution driven by accumulated tonal evidence.
8.1 The Interference Picture
At any moment, the listener's tonal state can be represented as a superposition of two matrices: the current orientation matrix and the rival orientation matrix, weighted by their respective probabilities. The rivalry between two orientations is literally interference between two shifted versions of the same matrix structure.
The phase shift between the two matrices determines the modulation distance. A shift of π corresponds to the maximum possible modulation distance — the tritone relationship, the coin flip. A shift of π/6 corresponds to the minimum modulation distance — adjacent keycenters on the circle of fifths. Intermediate shifts produce intermediate degrees of interference.
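The mapping from keycenter distance to phase shift can be sketched as follows, assuming keycenters are given as actor numbers 0 to 11 in semitones and that one step on the circle of fifths corresponds to π/6. The function names are illustrative.

```python
import math


def fifths_steps(key_a, key_b):
    """Shortest distance between two keycenters on the circle of fifths (0..6)."""
    # 7 is its own inverse modulo 12, so multiplying a semitone
    # difference by 7 converts it to a number of fifths.
    d = (7 * (key_b - key_a)) % 12
    return min(d, 12 - d)


def phase_shift(key_a, key_b):
    """Phase shift in radians, at pi/6 per step on the circle of fifths."""
    return fifths_steps(key_a, key_b) * math.pi / 6


print(fifths_steps(0, 7), fifths_steps(0, 5), fifths_steps(0, 6))  # 1 1 6
```

A keycenter a fifth above or below gives the minimum shift of π/6, and the tritone rival gives the maximum shift of π.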
8.2 Recentric Modulation
When anti-pentatonic tones accumulate in a tonal context, the probability distribution warps continuously in the direction of the anti-pentatonic evidence. The current keycenter loses probability mass. The rival keycenter gains it. If sufficient anti-pentatonic evidence accumulates, the distribution recenters around a new keycenter — the matrix undergoes what we call Recentric Modulation.
This process is not experienced by the listener as a sudden jump. It is experienced as a gradual shift in tonal gravity — the sense that home has moved, that what was tense is now at rest, that the drama has entered a new scene with a new emotional center. The involuntary nature of this reorientation is one of the most powerful experiences in tonal music.
The distance of modulation — how far the new keycenter is from the old one on the circle of fifths — determines the degree of perceptual disruption. Small modulations (adjacent keycenters) produce smooth, barely-perceptible shifts. Large modulations (tritone distance) produce the maximum perceptual rupture — the moment of complete reorientation.
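The gradual character of this shift can be simulated in a few lines. As before, this is a sketch under stated assumptions: signed L values are treated as log-evidence with hypothetical magnitudes, the state is simplified to a distribution over twelve keycenters, and the starting confidence of 0.9 in keycenter 0 is an arbitrary choice.

```python
import math

ROLE_AT_OFFSET = ["Do", "~Sol", "Re", "~La", "Mi", "Fa",
                  "~Do", "Sol", "~Re", "La", "~Mi", "~Fa"]
# Hypothetical magnitudes; only their ordering comes from the text.
EVIDENCE = {"Do": 1.5, "Mi": 1.5, "Sol": 1.0, "La": 1.0, "Re": 0.5,
            "Fa": 0.0, "~Fa": 0.0}
for role in ["Do", "Mi", "Sol", "La", "Re"]:
    EVIDENCE["~" + role] = -EVIDENCE[role]


def update(prior, tone):
    """One Bayesian step over the 12 keycenters after hearing actor `tone`."""
    weighted = [p * math.exp(EVIDENCE[ROLE_AT_OFFSET[(tone - k) % 12]])
                for k, p in enumerate(prior)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Start well oriented in keycenter 0, then hear ~Do, ~Sol, ~Mi, ~La, ~Re.
posterior = [0.9 if k == 0 else 0.1 / 11 for k in range(12)]
home_track = [posterior[0]]
for tone in [6, 1, 10, 3, 8]:
    posterior = update(posterior, tone)
    home_track.append(posterior[0])

new_center = max(range(12), key=lambda k: posterior[k])
print(new_center)                           # 6
print([round(p, 3) for p in home_track])    # confidence in the old home, step by step
```

With these particular numbers the old keycenter loses its lead within the first two tones, and the distribution recenters on the tritone rival without any discrete switching step in the update itself.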
8.3 Re as the Invariant
Throughout all modulations, Re maintains its special structural status. As the mirror axis of bilateral symmetry, Re and ~Re are invariants of the system — they maintain zero differential confidence regardless of orientation. This invariance means that Re functions as a structural anchor even during the most disorienting modulations. It is the balance point around which the entire tonal field rotates.
9. Pedagogical Implications
This model has direct implications for how tonal fluency is most effectively developed. The implications run counter to most conventional approaches to ear training and music theory instruction.
9.1 The Jars Are Already Filled
Any adult who has spent years listening to music has already accumulated the tonal experience needed for fluency. The pentatonic roles are already associated with familiar tones in the listener's memory. The anti-pentatonic tensions are already felt, if not consciously recognized. The Bayesian update machinery is already running.
What the listener lacks is not experience but orientation — the ability to consciously navigate the tonal space they already inhabit. Teaching tonal fluency is not a matter of filling empty jars. It is a matter of helping the student find the cabinet where the jars are stored and learn its layout.
9.2 Theory Must Follow Perception
The model is not a theoretical framework to be learned and then applied to music. It is a description of a perceptual structure that already exists in any experienced listener. Teaching the theory before the student has conscious perceptual access to the structure it describes is counterproductive — it substitutes a map for the territory, a description for the experience.
The pedagogical sequence must be: perception first, theory second. The student must first feel the gravitational pulls, recognize the pentatonic confidence, experience the anti-pentatonic tension, feel the coin flip. Only after these perceptions are conscious does the theoretical vocabulary become useful — as a way of talking about what the student already hears, not as a way of telling them what to listen for.
9.3 The Mother Tongue Parallel
Language acquisition provides the correct developmental model. A child does not learn their native language by studying grammar. They are immersed in meaningful patterns, develop intuitive competence through exposure and use, and arrive at analytical understanding — if at all — long after fluency is established.
Tonal fluency develops most naturally through an analogous process: immersive exposure to tonally meaningful musical patterns, active audiation of tonal roles and relationships, and the gradual development of real-time orientation ability. The analytical framework described in this document is the grammar of tonal hearing — useful for communicating about and refining tonal skill, but never a substitute for the direct perceptual competence it describes.
9.4 The Switch
The most important moment in developing a student's tonal fluency is what we call the flip of the switch — the moment the student realizes that all the music they have loved has been operating on a semantic level they have never consciously accessed. This realization is not deflating. Framed correctly, it is the discovery of a new world within the music they already love — like suddenly being able to read a book they have previously only heard read aloud in a language they didn't speak.
After the switch flips, students consistently report two things: first, that they cannot unhear the tonal structure — it becomes permanently and involuntarily audible; second, that their appreciation of musics previously felt as foreign or inaccessible — jazz, classical, non-Western traditions — expands dramatically. Tonal fluency turns out to be a human capacity, not a culturally specific one.
10. Open Questions
This model is presented as version 015 of an ongoing theoretical development. Several questions remain open and are offered here as directions for future work.
The precise shape of the bell curve: The model specifies that each actor's probability distribution reaches zero at the antipodal role, but the exact functional form of the curve has not been derived from first principles. The raised cosine (1 + cos(θ))/2 is a natural candidate but requires further justification.
The absolute magnitudes of L(Do), L(Sol), and L(Re): The model establishes their ordering and the equivalences L(Do)=L(Mi) and L(Sol)=L(La), but the actual numerical ratios between the three distinct values are not yet derived. It is possible these ratios are free parameters representing individual differences in accumulated immersion, or possible that the model's internal structure constrains them further.
The update rule dynamics: The Bayesian update rule as stated operates on individual tones. Real music presents simultaneous tones, rhythmic patterns, timbral information, and temporal structure. How these additional dimensions interact with the tonal update mechanism is not yet formalized.
Empirical testing: The model makes testable predictions — about reaction times, modulation perception, EEG signatures of tonal collapse, and implicit sensitivity in non-musicians. These predictions are inherited from the cognitive science literature on tonal perception but have not yet been tested specifically against this model's formulation.
Non-12TET systems: The model is explicitly scoped to 12TET. Its extension to other tuning systems — just intonation, microtonal systems, non-Western tuning traditions — is an open question with significant implications for the universality claims embedded in the pedagogical framework.
Acknowledgments
The tonal cognition model, its pedagogical applications, and all musical observations are the intellectual work of Brian Hunker, developed through 20+ years of one-on-one teaching practice with thousands of students. The mathematical formalization and structural analysis were developed collaboratively with Claude (Anthropic) across multiple working sessions in February 2026. This work is dedicated to all the students whose honest struggles and breakthroughs in learning to hear tonally have continuously refined these ideas.
Special acknowledgment to Edwin Gordon, whose foundational work on audiation and music learning theory — still vastly underappreciated in the field — provided essential grounding for the pedagogical framework developed here.