Introduction
Harmonic sounds consist of acoustic waves containing frequencies that are integer multiples of a common fundamental (F0). Many ecologically salient sounds, such as human voices, animal calls or music, are highly harmonic. Conversely, if the acoustical wave consists of frequencies that are not integer multiples of a common fundamental, it is said to be inharmonic. These sounds are often described as noises, sizzles, pops or rattles1. Harmonicity has been suggested to underlie auditory scene analysis2 and help separate the source of relevant sounds from background noise3, forming a basic organizational principle of the auditory cortex4,5,6. In language, violations of harmonicity impair the ability to understand concurrent speech and cause listeners to lose track of target talkers7. In music, inharmonic sounds can impair pitch-related abilities such as interval detection8. Similarly, inharmonicity disrupts the ability to compare pitch across time delays, which suggests a role in memory encoding9.
Harmonicity can be understood in the light of predictive processing theories, which posit that perception relies on the brain forming top-down predictions about the incoming stimuli and their causes10,11,12,13,14,15,16,17. These predictive processing accounts assume that predictions stem from an internal, generative model that is constantly updated by statistical regularities in the incoming sensory information. An “error” or “surprise” response (i.e., prediction error) occurs when predictions do not fit the sensory input, which can be used to update the contents of the generative model12,18. According to this theory, prediction errors are weighted by uncertainty (or precision) of the context, adjusting the relevance given to sensory inputs, allowing perceptual inference under uncertainty19,20,21. Therefore, if a signal is imprecise or unreliable (as in, for example, seeing in dark conditions or hearing in noisy environments), any prediction errors arising from it would be down-weighted and less likely to influence future predictions. While precision-weighting in the auditory domain has been explored empirically (see ref. 13 for a comprehensive review), the details of this process remain unclear.
Here, we hypothesize that harmonicity might be one of the relevant auditory features (among others, such as duration or intensity) that drive precision-weighting of prediction errors. Harmonic sounds have lower information content (lower entropy22) than inharmonic sounds, as the spectrum of an ideally harmonic sound will consist only of integer multiples of F0. Thus, many aspects of the sound can be reliably described using only one piece of information: the F0. Conversely, inharmonic sounds have higher information content (higher entropy). Consequently, any prediction errors produced by inharmonic sounds should have lower precision and would therefore be less likely to influence future expectations23. In the case of inharmonic sounds, precision-weighting might affect prediction errors not only by weakening pitch percepts but, more generally, by down-weighting sensory evidence about the spectral content of sounds.
Cortical responses associated with prediction errors can be quantified in event-related potentials (ERPs) using electroencephalography (EEG). Mismatch negativity (MMN) is a widely studied response to a deviation in an otherwise repetitive train of sensory stimuli24. It is a well-established electrophysiological trace of neural activity associated with precision-weighted prediction errors in the cortical auditory system12,25,26,27,28. Another neural response relevant to predictive coding is the P3 component, elicited when individuals shift attention (i.e., P3a) or consciously detect deviant stimuli (P3b29). In this study, we hypothesize that if harmonic sounds index precision as proposed by predictive processing theories, they should produce larger MMN and P3 responses to pitch deviants in comparison to inharmonic sounds.
Consistent with this idea, we previously showed that MMN responses were more prominent in harmonic piano tones than inharmonic hi-hat cymbal sounds30. However, due to the naturalistic nature of the stimuli, we could not entirely rule out whether other factors, such as the presence of a stable pitch percept, the rise and decay of the sound envelope or other spectral differences, played a role in the modulation of mismatch responses. To address this issue, here we present an EEG experiment using a passive roving oddball paradigm that carefully controlled the harmonic structure of sounds to isolate and measure the effect of harmonicity on auditory prediction errors. We recorded mismatch responses to sounds in three conditions: harmonic, inharmonic and changing. In the harmonic condition, sounds were regular harmonic complex tones. In the inharmonic condition, we introduced inharmonicity by jittering the frequencies above F0 and applied the same pattern of jitters to all tones in the sequence, thereby increasing the spectral uncertainty of the sounds. Note that the inharmonic condition introduces the same kind of uncertainty to the spectrum of each sound. However, MMN responses are also affected by the sequential uncertainty of the sound stream (i.e., what sound follows next). To investigate whether these two types of uncertainty interact, we introduced a changing condition in which a different jitter pattern was assigned to each individual sound. This increased sequential uncertainty and made predictions of subsequent sounds harder. We hypothesized that the amplitudes of MMN and P3a would be affected by the spectral and sequential uncertainty of the tones. This would manifest as reduced prediction errors for inharmonic sounds, especially when the spectral content is repeatedly re-jittered across the sequence.
Results
Paradigm outline
We used a version of the roving oddball paradigm to generate mismatch responses31,32. In a typical roving paradigm, several sounds are presented at a specific frequency, followed by a set of sounds at a different, randomly chosen frequency, which in turn is followed by another set, and so on. The sound immediately after the frequency shift is the deviant sound, but it eventually becomes the standard after a few repetitions. The participant listens to the sounds passively while watching a silent movie. In this study, instead of pure tones, we used (in)harmonic complex tones, with the roving reflected in changes to the fundamental frequency of the complex. We used the roving paradigm because it ensures that the standards and deviants have the same physical properties (i.e., deviance is induced by the context of the sequence). It also allows for systematic investigation of the effect of the amount of frequency change on the mismatch response by taking advantage of the random changes to frequencies. Finally, the roving paradigm allows for investigating whether mismatch responses are sustained after the first deviant in the sequence by analyzing the second- and third-order deviants (shades of red in Fig. 1A, D, G).
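As an illustration, the roving structure can be sketched in a few lines of Python. This is a minimal sketch, not the authors' stimulus code: the 50 Hz step size and the 50–300 Hz shift range follow the Frequency shifts section below, while the F0 range and the run lengths are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def roving_f0_sequence(n_sets=100, f0_range=(200, 1000), set_size=(3, 8)):
    """Generate a roving-oddball F0 sequence: runs of a repeated F0, each
    run starting with a (first-order) deviant that becomes the standard."""
    shifts = np.arange(50, 301, 50)              # allowed |F0 change| in Hz
    f0 = int(rng.integers(f0_range[0] // 50, f0_range[1] // 50 + 1)) * 50
    seq, labels = [], []
    for _ in range(n_sets):
        n_rep = int(rng.integers(set_size[0], set_size[1] + 1))
        for i in range(n_rep):
            seq.append(f0)
            labels.append("deviant" if i == 0 else "standard")
        # draw the next F0: a random allowed shift, up or down, kept in range
        candidates = [f0 + s * d for s in shifts for d in (-1, 1)
                      if f0_range[0] <= f0 + s * d <= f0_range[1]]
        f0 = int(rng.choice(candidates))
    return np.array(seq), labels

f0s, labels = roving_f0_sequence()
```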
Fig. 1: Paradigm outline and stimulus characteristics.
A provides a symbolic representation of a roving sequence in the harmonic condition. The fundamental frequency is shown in blue or red, while the upper harmonics are shown in gray. Shades of red represent theoretical deviants, which progressively become standards (blue). B presents the waveform and C presents the spectrum of an example harmonic sound (F0 = 500 Hz). Similarly, D–F show sequence representation, waveform and spectrum for the inharmonic condition. Notice that while the distribution of the partials becomes uneven in the inharmonic condition, the pattern of jittering is carried through from one fundamental frequency to the next. Conversely, G shows the sequence representation for the changing condition, where a different jittering pattern is applied to each sound. H shows entropy calculations for harmonic (dashes) and inharmonic (violin plot distributions) sounds for each F0. Note that while we calculated entropies for 1000 different inharmonic sounds present in our sound pool, there was only one harmonic sound for each frequency (thus producing a single entropy value instead of a distribution).
In order to investigate the effect of harmonicity on mismatch responses, we presented sounds in three conditions. In the harmonic condition, all sounds were harmonic complex tones (consisting of F0 and its integer multiples, Fig. 1A–C). In the inharmonic condition, we introduced a random jittering pattern to each frequency above the F0 (Fig. 1D–F). Here, the same jittering pattern was applied to each sound within a set (Fig. 1D). In the changing condition, a different jittering pattern was applied to each sound. Crucially, in each of the conditions, the F0 remained unchanged by the jittering. We assumed that keeping the lowest frequency constant would induce the perception of a fundamental (F0) pitch strong enough to form a predictive model based on this percept. Approximate entropy, a measure of the amount of information contained in a time series, was on average lower for harmonic (M = 0.02, SD = 0.01) than for inharmonic sounds (M = 0.19, SD = 0.01), and this difference was consistent irrespective of the F0 (Fig. 1H).
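A compact sketch of how such stimuli and their entropies could be generated follows. This is not the authors' synthesis code: the jitter width, tone duration, and the approximate-entropy parameters (m, r) are illustrative assumptions, and the rejection of jitter patterns that produce beating artifacts (mentioned in the Discussion) is omitted.

```python
import numpy as np

FS = 48_000    # sampling rate; partials run up to the 24 kHz Nyquist limit
DUR = 0.1      # tone duration in seconds (an assumption for the example)

def make_tone(f0, jitter=None, fs=FS, dur=DUR):
    """Complex tone with partials up to Nyquist. `jitter` holds per-partial
    offsets (fractions of f0) applied to every partial above F0; None
    yields a perfectly harmonic tone. The F0 component is never jittered."""
    n = int((fs / 2) // f0)
    freqs = np.arange(1, n + 1) * float(f0)
    if jitter is not None:
        freqs[1:] += jitter[: n - 1] * f0
    t = np.arange(int(fs * dur)) / fs
    return np.sin(2 * np.pi * freqs[:, None] * t).sum(axis=0) / n

def approx_entropy(x, m=2, r_frac=0.2):
    """Approximate entropy (Pincus); m and r are common defaults, not
    necessarily the paper's settings. O(n^2), so use a short excerpt."""
    r = r_frac * np.std(x)
    def phi(mm):
        pat = np.lib.stride_tricks.sliding_window_view(x, mm)
        c = [(np.abs(pat - p).max(axis=1) <= r).mean() for p in pat]
        return np.mean(np.log(c))
    return phi(m) - phi(m + 1)

rng = np.random.default_rng(1)
n_max = int((FS / 2) // 50)                # enough jitter values for any F0
fixed = rng.uniform(-0.3, 0.3, n_max)      # jitter width is an assumption

harmonic   = make_tone(500)                # harmonic condition
inharmonic = make_tone(500, fixed)         # same jitter reused within a set
changing   = make_tone(500, rng.uniform(-0.3, 0.3, n_max))  # fresh per sound

print(approx_entropy(harmonic[:500]), approx_entropy(inharmonic[:500]))
```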
Presence of MMN and P3
First, we investigated whether the three conditions produced significant mismatch responses by testing the differences between standard and deviant responses using a mass-univariate approach. This allows for comparing ERPs without assumptions about spatio-temporal “windows” of activity33. We used cluster-based permutations, a non-parametric technique that tests for differences between conditions while controlling for multiple comparisons34. In the harmonic condition (Fig. 2A), we found differences between standards and deviants corresponding to a cluster at 87–219 ms (26/30 sensors, p = 0.0002). Topographically, this mismatch response started as a frontocentral negativity at 120–180 ms, typical of an MMN. A cluster at 216–286 ms, reflecting a frontocentral positivity typical of a P3a response, was marginally significant (16/30 sensors, p = 0.076). We also found an additional negative cluster at 329–450 ms (21/30 sensors, p = 0.002). In the inharmonic condition (Fig. 2B), we found differences between standards and deviants corresponding to clusters at 95–199 ms (23/30 sensors, p = 0.002) as well as at 198–359 ms (23/30 sensors, p = 0.0001). These overlapping clusters formed around typical latencies for MMN and P3a and had typical topographies. However, we found no differences between standards and deviants in the changing condition (p > 0.05 for all detected clusters), indicating that this condition did not produce any clear MMN or P3a responses (Fig. 2C and Table S1).
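For readers who want to reproduce this kind of analysis, the sketch below shows how a standard-vs-deviant contrast could be run with the cluster permutation machinery in MNE-Python. The input names (`evokeds_std`, `evokeds_dev`) are hypothetical, and choices such as the number of permutations are illustrative rather than taken from the paper.

```python
import numpy as np
import mne
from mne.stats import spatio_temporal_cluster_1samp_test

# Hypothetical inputs: one mne.Evoked per participant for standards and
# deviants of a single condition. X must be (n_subjects, n_times, n_channels).
X = np.stack([dev.data.T - std.data.T
              for std, dev in zip(evokeds_std, evokeds_dev)])

# Sensor neighborhood structure, needed for clustering across channels
adjacency, _ = mne.channels.find_ch_adjacency(evokeds_std[0].info,
                                              ch_type="eeg")

t_obs, clusters, cluster_pv, _ = spatio_temporal_cluster_1samp_test(
    X, adjacency=adjacency, n_permutations=10_000, tail=0, seed=0)

for (time_inds, ch_inds), p in zip(clusters, cluster_pv):
    if p < 0.05:
        print(f"cluster p = {p:.4f}, {len(np.unique(ch_inds))} channels, "
              f"samples {time_inds.min()}-{time_inds.max()}")
```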
Fig. 2: Event-related potentials and their topographies.
Panels show responses in harmonic (A), inharmonic (B), and changing (C) conditions. Traces on the left show grand-average responses to standards (blue) and deviants (red), and the difference waves (green) for frontocentral channels (F3, Fz, F4, FC1, FC2). Color-shaded areas represent 95% confidence intervals, and gray shades represent significant clusters in the mass-univariate analysis. Scalp topographies present the difference wave signal strengths at chosen latencies.
Mass-univariate analysis of group differences
To investigate the differences between the three conditions, we used a cluster-based permutations approach, this time with an F-test. The test revealed differences between the conditions corresponding to clusters at 72–193 ms (25/30 sensors, p = 0.016) and at 211–345 ms (21/30 sensors, p = 0.008), indicating a main effect of condition (Fig. 3A). Next, we ran post-hoc pairwise comparisons between conditions using cluster-based t-tests (Fig. 3B). To account for multiple comparisons, we adopted a Bonferroni-corrected alpha level of p = 0.05/3 = 0.0166. We found significant differences between harmonic and inharmonic conditions in the P3a latency range (190–353 ms, a cluster comprising 22 sensors, p = 0.010) but not in the MMN range. Additionally, we found differences between harmonic and changing conditions for both MMN and P3a latency ranges (77–254 ms, 24/30 sensors, p = 0.002). A later cluster (363–450 ms) was not significant after multiple comparisons correction (20/30 sensors, p = 0.033). Finally, we found differences between inharmonic and changing conditions in the P3a range (200–358 ms, 24/30 sensors, p = 0.001). An earlier cluster (98–198 ms) corresponding to the MMN latencies was marginally significant (21/30 sensors, p = 0.025).
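The omnibus comparison could be sketched similarly. Note that the default statistic of MNE's `spatio_temporal_cluster_test` is a one-way F-test treating the groups as independent samples, so this is only an approximation of the within-subject design used here; array names are hypothetical.

```python
from mne.stats import spatio_temporal_cluster_test

# One array of mismatch waves (deviant - standard) per condition, each
# shaped (n_subjects, n_times, n_channels); names are hypothetical.
# `adjacency` is the sensor neighborhood from the previous sketch.
X = [mm_harmonic, mm_inharmonic, mm_changing]

F_obs, clusters, cluster_pv, _ = spatio_temporal_cluster_test(
    X, adjacency=adjacency, n_permutations=10_000, seed=0)

# Post-hoc pairwise contrasts rerun the paired test from the previous
# sketch on each condition pair, judged against alpha = 0.05 / 3.
```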
Fig. 3: Event-related potentials in the harmonic, inharmonic and changing conditions.
Traces show grand-average mismatch waves (deviant—standard, corresponding to green mismatch traces in Fig. 2) for the frontal channel activity (electrodes Fz, F3, F4, FC1, FC2). A compares responses between the three conditions, while B–D show post-hoc pairwise comparisons. Colored bands around the traces represent 95% confidence intervals around the mean. Gray rectangles indicate statistically significant clusters in the mass-univariate analysis. Topographies show F-maps (A) or t-maps for post-hoc comparisons (B–D) for each significant cluster, with the temporal extent of the clusters indicated below each topography. White markers represent channels that comprise the cluster.
Mean amplitude and peak latency
Next, in order to investigate the effect of harmonicity on the amplitudes and latencies of mismatch responses, we calculated participant-wise mean amplitudes and peak latencies in the latency ranges for both MMN and P3a (Fig. 4). For each of these measures we constructed linear mixed models with intercept and condition (harmonic, inharmonic, changing) as fixed effects and a random intercept for each participant (m1). We compared these models against null models that contained only the fixed intercept and a random intercept for each participant (m0).
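In Python, this model comparison could look roughly like the sketch below, using statsmodels (the analysis software is not specified in this section, and the data-frame and column names `df`, `mmn_amp`, `condition`, `participant` are hypothetical). Models are fit by maximum likelihood (`reml=False`) so that the likelihood-ratio test between nested fixed-effect structures is valid.

```python
import statsmodels.formula.api as smf
from scipy.stats import chi2

# df: hypothetical long-format table, one row per participant x condition
m0 = smf.mixedlm("mmn_amp ~ 1", df,
                 groups=df["participant"]).fit(reml=False)
m1 = smf.mixedlm("mmn_amp ~ condition", df,
                 groups=df["participant"]).fit(reml=False)

lr = 2 * (m1.llf - m0.llf)     # likelihood-ratio statistic
p = chi2.sf(lr, df=2)          # two extra fixed-effect parameters in m1
print(f"AIC m0 = {m0.aic:.1f}, m1 = {m1.aic:.1f}, "
      f"Chi2(2) = {lr:.2f}, p = {p:.4f}")
```

Pairwise post-hoc contrasts of estimated marginal means, as reported below, would then be computed from the fixed effects of m1 (in R this is typically done with emmeans).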
Fig. 4: Peak measures differences in harmonic, inharmonic and changing conditions.
Panels show MMN mean amplitude (A), P3a mean amplitude (B), MMN peak latency (C), and P3a peak latency (D). Violin plots show distributions of obtained results. White dots represent the median, narrow rectangles represent quartile 2 and 3 ranges, and vertical lines represent minima and maxima. Gray lines connect observations from the same participant. Stars (***) and red bars represent statistically significant post-hoc comparisons (all p-values < 0.001).
For MMN mean amplitudes, m1 performed significantly better than m0 (AICm0 = 336.4, AICm1 = 325.7, Chi2(2) = 14.71, p = 0.0006). Post-hoc comparisons of estimated marginal means revealed a significant difference between harmonic and changing conditions (contrast estimate = −1.03, SE = 0.27, t(67.8) = −3.84, p = 0.0008). Similarly, there was a significant difference between inharmonic and changing conditions (contrast estimate = −0.70, SE = 0.27, t(67.8) = −2.57, p = 0.033). The harmonic–inharmonic contrast was not statistically significant (contrast estimate = −0.34, SE = 0.27, t(67.8) = −1.24, p = 0.43). Taken together, these results suggest that there are no substantial differences in the MMN amplitude between harmonic and inharmonic sounds. However, the MMN is stronger (more negative) for both harmonic and inharmonic than for changing sounds.
For P3a mean amplitudes, m1 also performed significantly better than m0 (AICm0 = 458.8, AICm1 = 444.0, Chi2(2) = 18.80, p < 0.0001). Post-hoc comparisons revealed significant differences between harmonic and inharmonic conditions (contrast estimate = −1.37, SE = 0.36, t(68) = −3.85, p = 0.0007) as well as between inharmonic and changing conditions (contrast estimate = 1.45, SE = 0.36, t(68) = 4.07, p = 0.0003). No significant differences were found for the harmonic–changing contrast (contrast estimate = 0.08, SE = 0.36, t(68) = 0.23, p = 0.97). These results suggest a significant effect of harmonicity on P3a amplitude, such that the P3a for inharmonic sounds is greater than for both harmonic and changing sounds.
For MMN peak latency measures, m1 did not perform significantly better than m0 (AICm0 = −390.4, AICm1 = −388.0, Chi2(2) = 1.59, p = 0.45). The same was also the case for P3 peak latency (AICm0 = −331.7, AICm1 = −330.2, Chi2(2) = 2.49, p = 0.29). This indicates that harmonicity does not substantially influence the latency of MMN and P3a mismatch responses.
Object-related negativity
Beyond mismatch responses, the N1-P2 complex patterns in the inharmonic and changing conditions differed noticeably from those in the harmonic condition (Fig. 2). A possible candidate for this type of response is the object-related negativity (ORN), an ERP component associated with a separation of concurrently presented sounds into distinct auditory objects35,36,37. Here, we investigated the possibility that inharmonic sounds gave rise to the ORN by contrasting the responses to harmonic vs. inharmonic and harmonic vs. changing sounds, for both standards and deviants. To this end, we performed cluster-based permutations with a t-test of differences between conditions (Fig. 5 and Table 1). We found differences corresponding to negative clusters in the data at latencies consistent with the ORN in harmonic vs. inharmonic contrasts (both for standards and for deviants) and in harmonic vs. changing contrasts (only for standards) (see Table S2 for a complete list of clusters detected in this analysis).
Fig. 5: Event-related potentials comparison for object-related negativity (ORN) analysis.
Traces show grand-average responses for harmonic, inharmonic and changing standards or deviants. Green trace shows the difference wave corresponding to the ORN. A compares harmonic and inharmonic standards, B compares harmonic and inharmonic deviants, C compares harmonic and changing standards, while D compares harmonic and changing deviants. Traces represent averaged frontal channel activity (electrodes Fz, F3, F4, FC1, FC2). Colored bands around the traces represent 95% confidence intervals around the mean. Gray rectangles indicate statistically significant clusters in the mass-univariate analysis. Topographies show t-maps for each significant cluster. White markers represent channels that comprise the cluster.
To investigate the relationship between the ORN and the deviance detection processes related to MMN and P3a, we extracted the P2 mean amplitudes of both standards and deviants in each condition, averaged in a window around a global grand-average P2 peak latency (141 ms). We constructed a linear mixed model to predict P2 mean amplitude with condition, deviance (standard/deviant) and condition-by-deviance interaction as fixed effects and participant-wise random intercepts (m2). We compared this model with models that contained only an intercept as a predictor (m0) and an intercept and condition (m1) as predictors, besides the participant-wise random intercepts. The model m2 performed significantly better than both m0 (Chi2(2) = 81.28, p < 0.0001) and m1 (Chi2(2) = 41.76, p < 0.0001); AICm0 = 707.1, AICm1 = 671.6, AICm2 = 635.8. ANOVA on m2 revealed significant effects of condition (F(2,170) = 31.38, p < 0.0001), deviance (F(1,170) = 37.15, p < 0.0001) and a significant condition-by-deviance interaction (F(2,170) = 7.38, p = 0.001). Post-hoc contrasts revealed that P2 amplitudes were significantly higher for standards than for deviants in the harmonic (contrast estimate = −1.12, SE = 0.19, t(170) = −5.79, p < 0.0001) and inharmonic (contrast estimate = −0.83, SE = 0.19, t(170) = −4.262, p = 0.0005) conditions, but not in the changing condition (contrast estimate = −0.01, SE = 0.19, t(170) = −0.51, p = 0.99) (Table S5).
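A sketch of this interaction model in the same statsmodels style (data-frame and column names hypothetical, fit by maximum likelihood as before):

```python
import statsmodels.formula.api as smf

# df2: hypothetical table, one row per participant x condition x deviance,
# with p2_amp holding the mean P2 amplitude
m2 = smf.mixedlm("p2_amp ~ condition * deviance", df2,
                 groups=df2["participant"]).fit(reml=False)

# Term-wise Wald tests approximate the ANOVA-style effects reported above
print(m2.wald_test_terms())
```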
Taken together, these results suggest the presence of a frontocentral negativity in the harmonic vs. inharmonic/changing difference waves within a latency range of 100–200 ms (Fig. 5). This pattern was found for all studied comparisons except for the harmonic vs. changing deviants, likely due to the absence of a mismatch response in the latter (Fig. 5D). Note that the inharmonic deviant had a more negative amplitude than the harmonic standard (contrast estimate = −2.06, SE = 0.19, t(170) = −10.61, p < 0.0001, see Table 1), suggesting the presence of inharmonicity effects in the absence of deviance effects.
Behavioral experiment
To investigate whether the observed responses were related to perceiving multiple auditory objects in both inharmonic conditions, we ran a follow-up behavioral study. The participants were asked to listen to short sequences of either harmonic, inharmonic or changing sounds and judge whether they heard one, two, or three or more sounds at once. Results revealed that listeners were over 16 times more likely to judge sounds as consisting of many different objects in the inharmonic condition (OR = 16.44, Est. = 2.80, SE = 0.57, p < 0.0001) and over 62 times more likely in the changing condition (OR = 62.80, Est. = 4.14, SE = 0.64, p < 0.0001) in comparison to the harmonic condition (Fig. 6A).
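The reported odds ratios are the exponentiated coefficients of a logistic model. A simplified sketch follows: a plain logistic regression that ignores the per-participant random effects a full analysis would include; the data-frame and column names are hypothetical.

```python
import numpy as np
import statsmodels.formula.api as smf

# beh: hypothetical table, one row per trial; `multi` is 1 if the listener
# reported hearing more than one concurrent sound, 0 otherwise
fit = smf.logit("multi ~ C(condition, Treatment('harmonic'))",
                data=beh).fit()
print(np.exp(fit.params))   # odds ratios relative to the harmonic baseline
```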
Fig. 6: Behavioral experiment results.
A shows the number of responses of each category between the three conditions. B shows the mean errors in the three conditions. These errors were calculated as the absolute value of the difference between participant-reported and ground truth number of deviants (lower values = more correct answers). Ground truth values ranged from 1 to 5 (M = 2.8, SD = 0.72). Dots represent single observations. Horizontal lines represent the median, boxes represent quartile 2 and 3 ranges, whiskers represent minima and maxima.
In the behavioral experiment, we also took the opportunity to examine whether the participants were able to consciously perceive the F0 deviants in the changing condition. To this end, we exposed the participants to sequences of 20 sounds with roving F0 (following the same rules as in the EEG experiment) and asked them to count the number of deviants. For each sequence, we calculated a counting error metric as the absolute difference between the reported and the true number of deviants. We constructed a linear mixed model to predict these errors with intercept and condition as fixed effects and per-participant random intercepts (m1). This model performed significantly better than a null model (m0) that contained only a fixed intercept and per-participant random intercepts (Chi2(2) = 69.12, p < 0.0001); AICm0 = 1203.3, AICm1 = 1138.2. Estimated marginal means for the absolute errors in the harmonic (M = 0.31, SE = 0.40) and inharmonic (M = 0.62, SE = 0.41) conditions were not significantly different (t(206) = −0.630, p = 0.80). However, the absolute error in the changing condition (M = 4.25, SE = 0.41) was significantly higher than in both the harmonic (t(206) = 8.08, p < 0.0001) and inharmonic (t(206) = 7.42, p < 0.0001) conditions. These results suggest that while participants were able to generate a predictive model and consciously perceive deviations from it in both harmonic and inharmonic conditions, this may not have been the case in the changing condition, where a predictive model did not seem to be formed, and many sounds were considered deviants.
Second- and third-order deviants
Next, we took advantage of the roving paradigm to examine whether the effects of harmonicity persist for sounds after the first deviant. To this end, we analyzed mismatch responses to the second and third sounds after the frequency change (the second- and third-order deviants in each “rove”). We applied the same logic as in the main analysis, first using a mass-univariate approach and then focusing on peak measures. In the mass-univariate comparison of standard and deviant responses, we only found significant differences for the second-order deviant in the inharmonic condition, in the P3a latency range at 215–450 ms (19/30 sensors, p = 0.0042), but not in the MMN latency range. No statistically significant differences were found in the harmonic and changing conditions. Similarly, no significant differences appeared for the third-order deviant in any condition (p > 0.05 for all clusters). This indicates that the P3a response for inharmonic sounds remained detectable in the second-order (but not third-order) deviants, while there was no detectable MMN for these extra deviants. However, when we compared the three conditions using a cluster-based permutations F-test, we found no significant differences for either second- or third-order deviants (p > 0.05 for all clusters, Table S3).
We investigated this result further using linear mixed modeling for mean amplitude and peak latency, with participant-wise random intercepts. For the second-order deviant, we found that the model m1, which predicted P3a mean amplitude from condition, performed significantly better than the null model m0 (AICm0 = 344.6, AICm1 = 340.4, Chi2(2) = 8.21, p = 0.016). Post-hoc comparisons revealed significant differences between harmonic and changing conditions (contrast estimate = 0.67, SE = 0.26, t(66) = 2.55, p = 0.035), as well as between inharmonic and changing conditions (contrast estimate = 0.81, SE = 0.26, t(66) = 3.07, p = 0.009). No significant differences were found for the harmonic–inharmonic contrast. In the case of third-order deviants, condition did not improve model performance for any of the studied peak measures (p > 0.05 for all model comparisons). These results suggest that the P3a response carries over to the second-order deviant and is stronger for both harmonic and inharmonic sounds than for changing sounds.
Frequency shifts
Finally, we investigated whether the effects observed thus far are moderated by the amount of change (deviance) in the F0. In the roving paradigm, the F0 changes randomly in 50 Hz increments from 50 Hz to 300 Hz, both up and down. We extracted the amount of F0 change associated with each first-order deviant and entered it as a fixed effect into the linear mixed models analyzed previously. Models containing the main effects of condition and frequency shift (m2) performed significantly better than models that included only condition (m1) for MMN mean amplitude (AICm1 = 3470, AICm2 = 3467, Chi2(2) = 5.15, p = 0.023) and P3 mean amplitude (AICm1 = 3712, AICm2 = 3703, Chi2(2) = 11.33, p = 0.001). However, models containing the interaction of condition and frequency shift (m3) did not perform significantly better than m2 for any of the studied measures (all p-values > 0.05, see Table S4 and Supplementary Fig. 2). These results indicate that the effects of harmonicity established in the previous analyses are not moderated by the amount of shift in the fundamental frequency.
Discussion
In this study, we showed that harmonicity influences the brain’s mismatch responses to both expected and unexpected sounds. Contrary to our hypothesis, inharmonic sounds with a constant jittering pattern (the inharmonic condition) generated MMN responses comparable to those of harmonic sounds, and elicited P3a responses that were stronger than in the other two conditions (despite the passive listening nature of the task). In contrast, MMN responses became undetectable when the jittering changed between sounds (the changing condition), suggesting that sequential, but not spectral, uncertainty induces the precision-weighting effect. Interestingly, the ERPs to both standards and deviants differed between harmonic and inharmonic sounds, suggesting that inharmonicity elicits an ORN response. This result is further explained by behavioral data suggesting that, for inharmonic sounds, listeners are more likely to perceive more than one auditory object at the same time. Overall, our results suggest that inharmonicity does not act as a source of uncertainty as conceived by classic predictive processing theories. Instead, inharmonicity seems to induce the segregation of the auditory scene into different streams, capturing attention (as reflected in the P3a) and giving rise to specific neural processes that are independent from the predictive mechanisms underlying sequential deviance detection and the MMN.
The MMN in the inharmonic condition did not differ significantly from the harmonic condition. This result is not consistent with the precision-weighting hypothesis, as inharmonic sounds carry more information and should theoretically yield predictions of lower precision38. Instead, it suggests that the MMN is sensitive to the uncertainty present in the sequence of consecutive stimuli. This is evidenced by the fact that the MMN was undetectable in the changing condition, where spectral uncertainty introduced by inharmonicity was coupled with sequential uncertainty introduced by random jittering of consecutive sounds. In the inharmonic condition, all partials in the complex tone changed with every deviation; however, the relationship between the frequencies remained the same. This would mean that the MMN is sensitive to more global (context-dependent) uncertainty and not to the uncertainty generated by the introduction of inharmonicity.
This result relates to a larger issue of how precision-weighting is related to the MMN, an ERP that is thought to reflect prediction error responses25. In general, precision-modulated evoked responses to unexpected stimuli could theoretically exhibit both larger and smaller amplitudes. A smaller ERP amplitude could result from a smaller predicted difference between the standard and a deviant (a smaller prediction error). However, it could also result from a larger predicted difference that is down-weighted by precision13,39. Recent simulation work has shown that these two cases could in practice be dissociated, because any change to precision-weighting would necessarily be accompanied by a change in the latency of the ERP peaks40. We found no evidence of latency effects for any of the studied components.
The results of this study show that inharmonic sounds with changing jitter patterns produce weaker mismatch responses than harmonic sounds. This is consistent with the precision-weighting hypothesis, as the changing condition has higher sequential uncertainty, making the inference about the pitch of each consecutive sound more uncertain (however, see the P3a, ORN and multiple auditory objects section for an alternative explanation). This is also in line with previous research suggesting a role of harmonicity in precision-weighting for auditory features such as timbre, intensity and location30. However, the stimuli used in that study were sounds of musical instruments (piano and hi-hat) that differed in acoustical properties other than harmonicity (e.g., there were substantial differences in the spectra and in the amplitude envelopes). Furthermore, that study could not investigate pitch differences per se, because the hi-hat sound evoked no clear pitch percept. Here, we provide a systematic manipulation of harmonicity using synthetic sounds, ensuring that all acoustic parameters other than harmonicity remained the same across conditions. The robustness of these results is further reinforced by the lack of any interaction between frequency shift and condition, which indicates that the observed effects are not frequency-dependent.
Surprisingly, we found that P3a amplitude was higher for inharmonic than for harmonic sounds. Following the conventional interpretation of the P3a29, this result may indicate that inharmonic sounds capture attention more than harmonic sounds. A possible explanation might be that the auditory system treats inharmonic sounds as multiple, different auditory objects, while it collapses harmonic sounds into a single prediction about the F0. The F0 change associated with the deviant in the inharmonic condition would then require an update to not one but many different predictive models, resulting in overall stronger prediction error responses. In contrast, in the case of the changing condition where sounds constantly shift, the auditory system attempts (and fails) to track changes in too many partials at once, because the only stationary frequency component is the F0. This logic could explain the P3a results in the present study, as well as the absence of both P3a and MMN components in the changing condition.
An unexpected result came from direct comparisons between responses to harmonic and inharmonic sounds (both standards and deviants). The introduction of inharmonicity produced a response pattern characterized by a more negative P2 peak. We interpret this as an ORN, an ERP related to the separation of concurrently presented sounds into distinct auditory objects35,36,37. The ORN is thought to reflect the auditory system performing concurrent sound segregation. Conventionally, it is elicited in paradigms with complex tones that include one mistuned harmonic or a harmonic with an asynchronous onset36,41. Other ORN-eliciting cues include dichotic pitch42, simulated echo43, differences or discontinuities in location44, and onset asynchrony45. Importantly, the ORN is not a mismatch response, i.e., it does not arise in response to a violation of any global rule established with an oddball paradigm. This fact enabled us to evaluate the ORN using standard vs. standard and deviant vs. deviant comparisons. Notably, the ORN is thought to arise from different neural sources than the MMN46; however, we did not see topographic differences in our data (see Supplementary Fig. 3). We found that the ORN was elicited by both inharmonic standards and inharmonic deviants. One explanation of this result is that the auditory system interprets the inharmonic sound as multiple different sound sources and performs auditory object segregation, independently of the pitch deviance detection indexed by the MMN. This conclusion is also reinforced by the presence of the ORN in the harmonic standard vs. changing standard comparison. Finally, the P3a results discussed above also support this conclusion.
When synthesizing sounds for this experiment, we used every frequency in the harmonic series up to the Nyquist limit at 24 kHz. This approach was motivated by the need to ensure that the harmonicity manipulation is applied equally throughout the frequency spectrum and that the F0 remains clearly audible throughout the sequence. We rejected frequency-jittering patterns that could produce beating artifacts arising from two harmonics being too close to each other in frequency. Thus, our stimuli were broadband, spanning a large portion of the human hearing range. However, in psychophysical studies on pitch perception, the stimuli are often high-pass or band-pass filtered specifically to reduce the amplitudes of the low-frequency harmonics8,47. This procedure makes the pitch perception system focus on the so-called “temporal fine structure” (the high-frequency content) of the sound and does not rely on the resolvability of the lower harmonics48.