Introduction
Learning from appetitive (gain) and aversive (loss) outcomes involves continuously monitoring one’s knowledge about the environment and employing behavioral adjustments whenever the outcomes differ from expectations. It is an essential process for behavioral adaptation to environmental changes1. In humans, the neural representations of both appetitive and aversive stimuli converge on the dorsal anterior cingulate cortex (dACC)2,3,4,5,6,7,8,9 but with distinct neuronal patterns, potentially involving separate neural pathways or coding systems10,11,12,13. The existence of different systems for encoding appetitive and aversive stimuli is further supported by single-neuron recordings from non-human primates, showing that distinct neuronal populations within the ACC respond separately to appetitive and aversive values14,15,16. However, the differential neural mechanisms underlying these distinct populations remain poorly understood.
Animal studies suggest that gamma-aminobutyric acid (GABA) influences learning by regulating synaptic plasticity, network dynamics, and the timing of neuronal activity in various brain regions, including the prefrontal cortex17. In rodents, the connectivity between brain regions that respond to both gain and loss is regulated by GABA, the primary inhibitory neurotransmitter18. Another study showed that pharmacologically elevating GABA levels in the dACC impairs reward-based learning19. Evidence from human studies indicates that dACC GABA is involved in learning-based decision-making20,21. One study found that higher baseline dACC GABA was negatively associated with both the use of learned information and reward-related blood-oxygen-level-dependent (BOLD) responses in the dACC20. Another study reported that baseline dACC GABA was positively correlated with learning performance and negatively correlated with learning rate during a decision-making task21. Together, these findings point to a modulatory role for dACC GABA across multiple stages of learning and choice.
Based on this literature, we hypothesized that GABA would differentially modulate brain activity during appetitive and aversive reinforcement learning (RL) in human participants, and that interactions between GABA concentrations, brain activity, and behavior (learning performance) would be evident in regions that traditionally support learning, such as the dACC, striatum, amygdala, and insula12,22,23,24,25. Direct evidence in humans requires measuring both GABA and brain activity during appetitive and aversive learning tasks. Here, we used magnetic resonance spectroscopy (MRS) at 7 T with an optimized pulse sequence26 to measure GABA and glutamate (Glu) simultaneously and non-invasively, and functional magnetic resonance imaging (fMRI) to measure brain activity. A total of 117 participants performed probabilistic learning tasks under gain and loss conditions during MRS and fMRI scanning at 7 T.
Results
Game paradigm and learning behavior
MRS and fMRI were measured during four RL tasks (Fig. 1A). In each task, participants selected one of two Chinese letters, after which the corresponding feedback was presented. The four tasks differed in their win probabilities and outcome types. In appetitive tasks with a 65–35% gain probability (GP; GP = 65 game), one letter yielded a small monetary gain with 65% probability and nothing with 35% probability, while the other letter yielded the reverse (a gain with 35% probability). Over repeated trials, participants learned to converge on the favorable letter. As a control, we added an unlearnable condition, GP = 50, in which both letters yielded a gain with 50% probability. In aversive tasks, the gain was replaced with a small monetary loss per trial. Each game contained 50 trials, during which participants learned to choose so as to maximize their gains and minimize their losses. For more details, see the “Methods” section.
Fig. 1: Behavioral paradigm and scanning protocol.
A Decision-making task. Each trial starts with a waiting screen with a cross in the middle, followed by a two-letter frame in which the participant chooses one letter, followed by another waiting frame (inter-stimulus interval); the participant is then presented with the outcome of their choice: 0 or +5 in the gain-condition game and 0 or −5 in the loss-condition game. B The protocol inside the scanner. The order of the MRS and fMRI games was randomized: half of the participants started with fMRI blocks, and half started with MRS blocks. The numbers represent the probability of the outcome. 117 participants were scanned; however, only 105 had usable combined MRS and fMRI data (52 participants in the Loss group and 53 in the Gain group).
The MRS portion of the task comprised eight blocks (Fig. 1B), collected as a sequence of four games and four rest scans, the latter allowing GABA and Glu to return to baseline. The fMRI portion consisted of two blocks (two games): half of the participants (n = 53; 26 females) played two Gain games (GP = 65-Gain and GP = 50-Gain; Gain group), and the other half (n = 52; 26 females) played two Loss games (GP = 65-Loss and GP = 50-Loss; Loss group). For the learnable GP = 65 games, the learning score was defined as the proportion of the last 10 trials in which the correct letter (i.e., the letter that maximizes gain or minimizes loss) was chosen. For the unlearnable GP = 50 games, one letter was designated as the reference choice for computing the learning score.
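The learning-score definition above can be written as a short function. This is a minimal sketch assuming each trial's choice is coded as a letter index (0 or 1); the function name `learning_score` and the example choice sequence are hypothetical and are not the study's code or data.

```python
from typing import Sequence


def learning_score(choices: Sequence[int], target: int, window: int = 10) -> float:
    """Fraction of the last `window` trials on which `target` was chosen.

    `target` is the correct letter in GP = 65 games, or the designated
    reference letter in GP = 50 games (hypothetical coding: 0 or 1).
    """
    if len(choices) < window:
        raise ValueError("need at least `window` trials")
    tail = choices[-window:]
    return sum(1 for c in tail if c == target) / window


# Hypothetical 50-trial game in which the participant settles on letter 1 late.
example = [0] * 30 + [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] + [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
print(learning_score(example, target=1))  # 0.8
```

Under this definition, chance performance in the unlearnable games sits near 0.5, which matches the scale of the scores reported below.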
Participants showed superior learning performance in the learnable conditions (Fig. 2A; GP = 65; average learning scores: 0.709 ± 0.002/0.701 ± 0.002 in gain/loss) compared with the unlearnable conditions (Fig. 2A; GP = 50; 0.498 ± 0.002/0.506 ± 0.002 in gain/loss), as tested by analysis of variance (ANOVA; F(1,106) = 46.8; p = 5e−10; Table S1). This indicates successful and comparable learning in both gain and loss conditions.
Fig. 2: Behavioral data and model fitting.
A Learning curves for all four conditions (n = 105). Shaded areas denote the standard error of the mean (SEM). B Comparison of mean learning scores (n = 105). Error bars denote SEM. C Mean learning score distribution (n = 105). D Single-subject model fitting example (n = 1). E–I Model parameters are shown for the learnable game conditions (65-Gain and 65-Loss; n = 105). E Comparison of learning scores calculated from the original data and from the model (n = 105). F Learning score comparison between gain and loss conditions (n = 105). G Mean negative and positive learning rates (Alpha-Neg; Alpha-Pos; n = 105). Error bars denote SEM. H Mean decision weight for expected value (Beta, $\beta_Q$; n = 105). Error bars denote SEM. I Model parameter comparison between gain and loss (n = 105). J Model parameter distributions for all game conditions (n = 105). This figure shows MRS game data only.
We fitted a classical RL model27 to each participant's behavior to obtain individual, per-trial value-difference estimates for our fMRI analysis. The model contains a decision weight (β) parameter and separate parameters for positive (α+) and negative (α−) learning rates (see the “Methods” section for more information). Learning scores were similar between model fits and actual behavior for both gain and loss games (Fig. 2A–C; model learning scores: Gain: 0.710 ± 0.001; Loss: 0.677 ± 0.001; actual learning scores: Gain: 0.709 ± 0.002; Loss: 0.701 ± 0.002). Similarly, at the individual level, the learning scores of the actual data and the model were strongly correlated in both gain and loss (Fig. 2E; Pearson correlation p-values: $7 \cdot 10^{-9}$ and $1 \cdot 10^{-8}$, respectively), confirming that the model captures the variability in individual behavior (Fig. 2D depicts single-subject examples in each game condition). We did not find a significant difference in learning scores between gain and loss in either the actual or the modeled data (Fig. 2F, two-tailed t-tests). There were no interactions between model parameters (β and learning rates), and the parameters showed no changes or interactions across loss and gain (LG) conditions (Fig. 2G–J; a repeated-measures ANOVA was conducted for each parameter with GP and LG as factors, with no significant effects).
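For readers less familiar with this model class, the per-trial quantities used later (the value difference ΔQ at decision time and the prediction error at feedback) can be sketched as a Q-learning update with separate learning rates for positive and negative prediction errors and a two-option softmax choice rule. This is an illustrative sketch, not the authors' implementation; it assumes initial Q-values of zero, treats β as a softmax inverse temperature, and uses a coarse grid search as a stand-in for the optimizer described in the “Methods” section. All function names are hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def rl_trial_quantities(
    choices: Sequence[int],
    rewards: Sequence[float],
    alpha_pos: float,
    alpha_neg: float,
    beta: float,
) -> Tuple[float, List[float], List[float]]:
    """Return (negative log-likelihood, per-trial ΔQ, per-trial prediction error).

    choices: chosen option (0 or 1) per trial.
    rewards: outcome per trial, e.g. 0/+5 (gain games) or 0/-5 (loss games).
    Assumption: both Q-values start at 0.
    """
    q = [0.0, 0.0]
    nll = 0.0
    delta_q: List[float] = []
    prediction_errors: List[float] = []
    for c, r in zip(choices, rewards):
        # Two-option softmax: P(choose 1) depends on beta * (Q1 - Q0).
        p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        p_choice = p1 if c == 1 else 1.0 - p1
        nll -= math.log(max(p_choice, 1e-12))
        # Value difference between the selected and the rejected option.
        delta_q.append(q[c] - q[1 - c])
        # Prediction error at feedback; the learning rate depends on its sign.
        pe = r - q[c]
        prediction_errors.append(pe)
        q[c] += (alpha_pos if pe > 0 else alpha_neg) * pe
    return nll, delta_q, prediction_errors


def fit_by_grid(
    choices: Sequence[int],
    rewards: Sequence[float],
    alphas: Sequence[float] = (0.05, 0.1, 0.2, 0.4, 0.8),
    betas: Sequence[float] = (0.25, 0.5, 1.0, 2.0, 4.0),
) -> Tuple[float, float, float, float]:
    """Coarse maximum-likelihood fit; returns (nll, alpha_pos, alpha_neg, beta)."""
    best = None
    for a_pos, a_neg, b in product(alphas, alphas, betas):
        nll = rl_trial_quantities(choices, rewards, a_pos, a_neg, b)[0]
        if best is None or nll < best[0]:
            best = (nll, a_pos, a_neg, b)
    return best


nll, dq, pe = rl_trial_quantities([1, 1, 0], [1.0, 0.0, 0.0], alpha_pos=0.5, alpha_neg=0.25, beta=1.0)
print(dq)  # [0.0, 0.5, -0.375]
print(pe)  # [1.0, -0.5, 0.0]
```

Splitting the update by the sign of the prediction error is what lets α+ and α− be estimated separately; a gradient-based optimizer would normally replace the grid search.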
MRS data
A single-subject example of the spectrum is presented in Fig. 3A and B for a game spectrum and an in-between rest spectrum, measured in the dACC (Fig. 3C). The mean gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) tissue fractions across subjects were 0.63 ± 0.05, 0.28 ± 0.06, and 0.08 ± 0.03, respectively (Fig. 3D). The mean lipid/N-acetyl aspartate (NAA) ratio was 0.08 ± 0.01, indicating that extraneous lipids did not contaminate the spectra (Fig. 3E). The mean water FWHM was 14 ± 1 Hz (Fig. 3F), and the SNR of the NAA peak was 144 ± 4 (Fig. 3G). The LCModel fit is overlaid on the original data in Fig. 3H, illustrating the GABA and Glu signals. Together, these metrics indicate good spectral quality.
Fig. 3: MRS spectral quality metrics.
A Single-subject example of the spectrum during a game condition (n = 1). B Single-subject example of the spectrum during a rest condition between games (n = 1). C Heat map of voxel positioning across all subjects. Voxels covered in fewer than 10 participants are filtered out (n = 107). D GM, WM, and CSF tissue fraction distributions across all subjects (n = 111); mean ± SD. E Histogram of the baseline lipid/NAA ratio across all subjects (n = 109). F Histogram of baseline water FWHM across all subjects (n = 109). G Histogram of baseline SNR, calculated from the NAA peak, across all subjects (n = 109). H Mean LCModel fit over the mean original data across all subjects in the baseline condition (n = 109).
No BOLD-related spectral changes were detected, so no BOLD correction was applied to the MRS data. The root-mean-square error (RMSE) between task and baseline spectra remained unchanged across increasing exponential apodization (Fig. S1A), indicating that apodization did not unmask any BOLD-dependent line-shape alterations. Likewise, water linewidths were indistinguishable between rest and game conditions (Fig. S1B). These observations indicate that BOLD effects did not produce measurable linewidth differences in our spectra.
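The apodization check can be illustrated with synthetic data. The sketch below assumes that increasing exponential line broadening is applied to the task FID only and that the spectral RMSE against the baseline is tracked; if BOLD-related line narrowing were present, the RMSE would dip at the broadening that cancels it. This is one plausible way to operationalize the check, not necessarily the procedure behind Fig. S1A. The single-Lorentzian test signals, the 1 ms dwell time, and all function names are hypothetical, and a naive DFT replaces an FFT library to keep the block self-contained.

```python
import cmath
import math
from typing import List, Sequence


def synthetic_fid(fwhm_hz: float, freq_hz: float, n: int = 64, dwell_s: float = 1e-3) -> List[complex]:
    """Single Lorentzian line: complex exponential decaying as exp(-pi * FWHM * t)."""
    return [
        cmath.exp(2j * math.pi * freq_hz * k * dwell_s) * math.exp(-math.pi * fwhm_hz * k * dwell_s)
        for k in range(n)
    ]


def apodize(fid: Sequence[complex], lb_hz: float, dwell_s: float = 1e-3) -> List[complex]:
    """Exponential apodization: broadens every line by lb_hz (FWHM)."""
    return [s * math.exp(-math.pi * lb_hz * k * dwell_s) for k, s in enumerate(fid)]


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive discrete Fourier transform (O(n^2)), adequate for short test signals."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n)) for f in range(n)]


def rmse_vs_apodization(
    task_fid: Sequence[complex],
    baseline_fid: Sequence[complex],
    lbs_hz: Sequence[float],
    dwell_s: float = 1e-3,
) -> List[float]:
    """Spectral RMSE between the apodized task FID and the baseline, per broadening value."""
    base_spec = dft(baseline_fid)
    out = []
    for lb in lbs_hz:
        task_spec = dft(apodize(task_fid, lb, dwell_s))
        mse = sum(abs(a - b) ** 2 for a, b in zip(task_spec, base_spec)) / len(base_spec)
        out.append(math.sqrt(mse))
    return out


baseline = synthetic_fid(fwhm_hz=10.0, freq_hz=50.0)
narrowed_task = synthetic_fid(fwhm_hz=9.0, freq_hz=50.0)  # mimics 1 Hz of BOLD narrowing
curve = rmse_vs_apodization(narrowed_task, baseline, [0.0, 0.5, 1.0, 1.5, 2.0])
print(curve.index(min(curve)))  # 2, i.e. the minimum falls at 1.0 Hz of broadening
```

With no narrowing (identical task and baseline signals), the curve has no interior minimum and simply grows with broadening.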
GABA differs between gain and loss conditions
The absolute concentrations of GABA and Glu across conditions are presented in Table 1 (see Table S2 for other metabolites and for the Glu/GABA ratio, referred to as the excitatory/inhibitory (E/I) balance). For inter-subject correlations, we normalized the concentrations during the games by subtracting the baseline concentration measured during the initial rest. The normalized concentrations are denoted ΔGABA and ΔGlu (Table 2). To examine differences in GABA dynamics between the gain and loss conditions, we used mixed model 1 (MM1) to test whether ΔGABA depends on the game probability (GP) and the game valence (LG). We observed an interaction between GP and LG (ANOVA F(1,291.3) = 4.8; p = 0.03; see MM1 in the “Methods” section; Table S3, Fig. S2). In the loss condition, ΔGABA in the learnable 65-Loss game was elevated compared with the 65-Gain game (Fig. 4A; two-tailed t-test p = 0.02) and marginally elevated compared with the 50-Loss game (two-tailed t-test p = 0.08). In the gain condition, ΔGABA did not change from baseline rest in the learnable 65-Gain game but was elevated during the unlearnable 50-Gain game compared with the 65-Gain game (one-tailed t-test p = 0.05). This is consistent with our previous finding21.
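The baseline normalization is a per-subject subtraction, ΔGABA = GABA(game) − GABA(initial rest). A minimal sketch with hypothetical subject IDs and concentration values in arbitrary units; the mixed model MM1 itself is not reproduced here, and the function name is hypothetical.

```python
from statistics import mean
from typing import Dict


def delta_from_baseline(game: Dict[str, float], baseline: Dict[str, float]) -> Dict[str, float]:
    """Per-subject change from the initial-rest baseline (e.g. ΔGABA or ΔGlu).

    Subjects without a baseline value are skipped rather than imputed.
    """
    return {subj: conc - baseline[subj] for subj, conc in game.items() if subj in baseline}


# Hypothetical values, for illustration only.
baseline_gaba = {"s01": 1.50, "s02": 1.20}
game_65_loss = {"s01": 1.62, "s02": 1.15, "s03": 1.40}
d_gaba = delta_from_baseline(game_65_loss, baseline_gaba)
print(sorted(d_gaba))  # ['s01', 's02']
print(round(mean(d_gaba.values()), 3))  # 0.035
```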
Fig. 4: Correlation between ∆GABA, BOLD, and learning scores.
A ΔGABA means with SEM and distribution in 65-Gain (n = 102) and 65-Loss (n = 106). B Correlation between learning scores and baseline dACC GABA in the gain (n = 103) and loss (n = 103) groups. C Group-level activation of the dACC (GL map) and group-level activation of the dACC within the spectroscopic voxel (ROI-GL map) in the Gain (n = 53) and Loss (n = 54) groups. D Mean BOLD Z score with SEM and distribution in gain and loss. E Correlation between the learning score and the mean dACC BOLD Z score in the gain and loss groups. F Right panel: group-level map of regions where BOLD correlates with learning scores in gain > loss (Learning-GL map; n = 107: 54 loss and 53 gain). Left panel: correlation between the mean BOLD Z score from dACC-activated voxels (Learning-GL map) and dACC ΔGABA in gain (n = 48). G Correlation between the mean dACC BOLD Z score and dACC ΔGABA in the gain (n = 47) and loss (n = 50) groups. ΔGABA represents the change in GABA from baseline. Hollow red points represent outliers.
While ΔGlu consistently increased during all game scenarios and was correlated with ΔGABA (ANOVA F(1,380.4) = 46.7; p = 3e−11; Table S3), there were no significant interactions or differences across game conditions for ΔGlu (ANOVA for ΔGlu with GP and LG as factors; p > 0.1 for all). Therefore, this work focuses on ΔGABA interactions with behavior.
GABA correlates with learning performance under the gain condition
To examine the relationship between GABA concentrations and learning performance, we tested the associations of learning performance with baseline GABA and Glu. We found a significant effect of baseline GABA (mixed model 2 (MM2) in the “Methods” section; Table S4; F(1,396) = 6.4; p = 0.01). Specifically, at the individual level, baseline GABA concentrations were positively correlated with learning performance, but only in the gain condition (Fig. 4B; Gain: p = 0.02, Pearson's r = −0.33; Loss: p = 0.8). According to the Bayesian information criterion (BIC), this model outperformed both the ΔGABA + ΔGlu + baseline-GABA + baseline-Glu model and the ΔGABA + ΔGlu-only model. This indicates that individuals with higher baseline dACC GABA levels exhibited superior appetitive learning performance. Notably, this finding aligns with previous studies demonstrating similar relationships between baseline GABA levels and learning performance21,28,29,30,31,32,33.
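The BIC comparison penalizes each model's maximized log-likelihood by its number of parameters, BIC = k ln(n) − 2 ln(L), and the model with the lowest BIC is preferred. The sketch below uses invented log-likelihoods and parameter counts purely to show the mechanics; the model labels are hypothetical and do not reproduce Table S4.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_by_bic(models: Dict[str, Tuple[float, int]], n_obs: int) -> str:
    """`models` maps a model name to (maximized log-likelihood, number of parameters)."""
    return min(models, key=lambda name: bic(models[name][0], models[name][1], n_obs))


# Hypothetical fits, for illustration only.
candidates = {
    "baseline-only": (-120.0, 4),
    "full": (-118.5, 7),
    "delta-only": (-121.0, 4),
}
print(best_by_bic(candidates, n_obs=400))  # baseline-only
```

Here the full model has the highest likelihood but loses because its three extra parameters cost more than they gain. For mixed models, BIC comparisons of different fixed-effect structures should use maximum-likelihood rather than REML fits.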
BOLD in the dACC correlates with expected value and individual learning performance
We calculated BOLD activity in the dACC using the spectroscopic voxel as a region of interest (ROI; ROI-GL BOLD activity map; Supplementary Fig. S3). The design matrix included two explanatory variables (EVs): one for the decision phase, weighted by ΔQ (the expected value difference between the selected and rejected options), and one for the feedback phase, weighted by the prediction error (PE). These variables reflect neuronal coding of subjective value difference and prediction error, respectively (see the “Methods” section for more details). Z-scored activation maps were generated for each EV, representing the statistical significance of each voxel's activation. This work discusses only the decision-phase contrast, which represents the use of acquired knowledge. To verify that dACC activation did not depend on our ROI analysis, we conducted a whole-brain analysis using the same parameters and found significant activation in the dACC (GL map; Supplementary Fig. S4; Table S5).
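A parametrically modulated EV of this kind can be sketched as events at the decision onsets weighted by ΔQ, convolved with a hemodynamic response function, and sampled at each volume. This is a minimal sketch under stated assumptions: zero-duration events, mean-centered modulators, and an SPM-style double-gamma HRF (gamma shapes 6 and 16, undershoot ratio 1/6). The analysis software, HRF, and event durations actually used are given in the “Methods” section and may differ; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def _gamma_pdf(t: float, shape: float) -> float:
    # Gamma probability density with unit scale.
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(t: float, peak_shape: float = 6.0, under_shape: float = 16.0, ratio: float = 1 / 6) -> float:
    """Assumed canonical HRF: positive gamma lobe minus a scaled undershoot lobe."""
    if t <= 0:
        return 0.0
    return _gamma_pdf(t, peak_shape) - ratio * _gamma_pdf(t, under_shape)


def parametric_regressor(
    onsets_s: Sequence[float],
    modulators: Sequence[float],
    n_scans: int,
    tr_s: float,
    dt_s: float = 0.1,
    demean: bool = True,
) -> List[float]:
    """Stick events weighted by `modulators` (e.g. ΔQ), convolved with the HRF, sampled per TR."""
    mods = list(modulators)
    if demean:
        center = sum(mods) / len(mods)
        mods = [w - center for w in mods]
    n_fine = int(round(n_scans * tr_s / dt_s))
    kernel = [double_gamma_hrf(j * dt_s) for j in range(int(round(32.0 / dt_s)))]
    signal = [0.0] * n_fine
    for onset, w in zip(onsets_s, mods):
        start = int(round(onset / dt_s))
        if start < 0:
            continue  # ignore events before the first volume
        for j, h in enumerate(kernel):
            if start + j >= n_fine:
                break
            signal[start + j] += w * h
    step = int(round(tr_s / dt_s))
    return [signal[k * step] for k in range(n_scans)]


single = parametric_regressor([0.0], [1.0], n_scans=20, tr_s=1.0, demean=False)
print(single.index(max(single)))  # 5: the assumed HRF peaks about 5 s after the event
```

Mean-centering matters when a separate unmodulated EV for the same events is also in the model: it makes the ΔQ regressor capture trial-to-trial variation rather than the average response (a single event would therefore demean to zero, which is why the example disables it).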
BOLD activity in the dACC spectroscopic voxel during the decision phase did not differ across game conditions (Fig. 4C, D; two-tailed t-test p = 0.95; Gain-65: 0.6 ± 0.1; Gain-50: 0.7 ± 0.1; Loss-65: 0.6 ± 0.1; Loss-50: 0.5 ± 0.1; ANOVA test, no effects; Supplementary Fig. S2). However, only in the learnable gain condition (65-Gain) was BOLD activity positively correlated with individual learning scores (Fig. 4E; Gain: p = 9e−4, r = 0.44; Loss: p = 0.3, r = 0.1; cross-correlation coefficient difference (CCD): p = 0.05). For more details, see MM3 in the “Methods” section (Tables S6, S7; ANOVA F(1,47) = 9.1; p = 0.004).
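A common way to test whether two correlations from independent groups differ, as in the CCD comparisons between the Gain and Loss groups, is Fisher's r-to-z test. The text does not specify how CCD was computed, so this is an assumed stand-in rather than the study's procedure, and it is not guaranteed to reproduce the reported p-values. The function name is hypothetical.

```python
import math
from typing import Tuple


def compare_independent_correlations(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Fisher r-to-z test for the difference between two independent Pearson correlations.

    Returns (z statistic, two-tailed p-value).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 observations")
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-tailed p from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical correlations and group sizes.
z, p = compare_independent_correlations(0.6, 50, 0.0, 50)
print(round(z, 2))  # 3.36
```

A one-tailed p-value, appropriate only for a directional hypothesis, would be half the two-tailed value.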
We ran a third, separate whole-brain group-level analysis to verify this finding, using the individuals' learning scores from the fMRI games as a covariate. The dACC was significantly active in a gain > loss contrast (Learning-GL map; Table S8; Fig. 4F, left panel). This confirms that dACC BOLD, representing the use of learned information, is correlated with learning performance during the gain but not the loss condition. Furthermore, BOLD in this area correlated negatively with ΔGABA, similar to our ROI-based finding (Fig. 4F, right panel; p = 0.05, r = −0.28), although this correlation did not survive false discovery rate (FDR) correction for multiple comparisons.
Activity in the dACC and Putamen is correlated with GABA during gain RL
Next, we evaluated the relationship between the MRS and fMRI measurements within subjects. We separately applied linear regression (mixed model 4 (MM4) in the “Methods” section; Tables S9, S10) to the gain and loss groups. We found a significant effect of ΔGABA only within the gain group (ANOVA F(1,77.5) = 5.1; p = 0.03; Supplementary Fig. S2), reflecting a negative correlation between ΔGABA concentrations and BOLD activity within the dACC (ROI-GL activity map). This negative correlation was specific to the learnable 65-Gain condition (Fig. 4G, left panel: p = 0.01, r = −0.36) and was not detected for loss (Fig. 4G, right panel; CCD: p = 0.003) or during the unlearnable conditions (Supplementary Fig. S2).
Subsequently, we examined the relationships between ΔGABA concentrations and BOLD activity in other brain regions that contribute to valence encoding and decision-making (FDR corrected for multiple comparisons; Table 3), using masks based on the Harvard–Oxford subcortical structural atlas to compute ROI-averaged BOLD changes (Table 3, mask indices M2–7). BOLD activity in the left putamen was negatively correlated with dACC ΔGABA only in the 65-Gain condition (Table 3; M5; p = 0.02, r = −0.33; FDR corrected). To verify this finding, we tested it with a mask derived from the group-level map (GL map) and found the same result (Table 3; M9; Fig. 5A, B; p = 0.02, r = −0.33; FDR corrected). Additionally, activity in the left putamen was positively correlated with individual learning scores, again only in the 65-Gain condition (Fig. 5A, B; p = 0.02, r = 0.33).
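The FDR correction across the Table 3 masks is not named in the text; the Benjamini–Hochberg step-up procedure is the most common choice and is sketched below as an assumption. The p-values in the usage example are hypothetical, as is the function name.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> Tuple[List[float], List[bool]]:
    """Benjamini–Hochberg adjusted p-values and rejection decisions at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted, [p_adj <= q for p_adj in adjusted]


# Hypothetical per-mask p-values.
adj, reject = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(reject)  # [True, False, False, False]
```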
Fig. 5: Learning performance and ∆GABA correlation in the Putamen and the Insula.
A and B Correlation between BOLD mean Z score from the left Putamen cluster of the group analysis and ΔGABA (nGain = 48, nLoss = 50), or learning score (nGain = 53, nLoss = 50) in gain (A) and loss (B) groups. C and D Correlation between BOLD mean Z score from the left insula cluster of the group analysis and ΔGABA (nGain = 48, nLoss = 50), or learning score (nGain = 53, nLoss = 50) in gain (C) and loss (D) groups. Hollow red points represent outliers.
We examined whether ΔGABA mediated the relationship between dACC and putamen BOLD activity on the one hand and learning performance on the other, and found no significant mediation effect (95% confidence interval: −0.1516, 0.0992), suggesting that GABA does not directly mediate this relationship.
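A mediation test of this kind can be sketched as a percentile bootstrap of the indirect effect a·b, where a is the slope of the mediator (here ΔGABA) on the predictor (BOLD) and b is the slope of the outcome (learning score) on the mediator, controlling for the predictor. The text reports only the resulting 95% confidence interval, so the bootstrap scheme, the number of resamples, and all function names below are assumptions. An interval that includes zero, as reported, indicates no significant indirect effect.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def _cov(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def indirect_effect(x: Sequence[float], m: Sequence[float], y: Sequence[float]) -> float:
    """a*b from OLS: m ~ x (slope a) and y ~ x + m (partial slope b of m)."""
    vx, vm, cxm = _cov(x, x), _cov(m, m), _cov(x, m)
    cxy, cmy = _cov(x, y), _cov(m, y)
    den = vx * vm - cxm ** 2
    if vx == 0 or den <= 1e-12 * vx * vm:
        raise ZeroDivisionError("degenerate sample: predictor and mediator are collinear")
    a = cxm / vx
    b = (cmy * vx - cxm * cxy) / den
    return a * b


def bootstrap_indirect_ci(
    x: Sequence[float], m: Sequence[float], y: Sequence[float],
    n_boot: int = 2000, alpha: float = 0.05, seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(indirect_effect([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples
    if not estimates:
        raise ValueError("all bootstrap resamples were degenerate")
    estimates.sort()
    lo = estimates[int(alpha / 2 * len(estimates))]
    hi = estimates[min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))]
    return lo, hi


# Hypothetical data with a known indirect effect of a*b = 2*3 = 6.
bold = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]  # uncorrelated with `bold`
dgaba = [2 * b + e for b, e in zip(bold, noise)]
score = [3 * g + b for g, b in zip(dgaba, bold)]
print(round(indirect_effect(bold, dgaba, score), 6))  # 6.0
```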
We did not find a correlation between dACC ΔGABA and BOLD in either the left or the right insula, which together with the dACC forms the salience network34. This was examined using an atlas-based mask (Table 3, M2) and a group-level map (Table 3, M10), the latter shown for presentation purposes (Fig. 5C, D). However, insula BOLD was significantly positively correlated with the learning score in the 65-Gain condition only (p = 0.003, r = 0.40), supporting a role for the salience network in decision-making-guided learning.
dACC-Putamen connectivity is correlated with GABA during loss RL
Beyond direct modulation of activity, inhibitory mechanisms can also modulate information transfer between the dACC and other brain regions. To test this hypothesis, we examined the relationship between ΔGABA in the dACC and BOLD-based functional connectivity. First, we calculated the connectivity during gain and loss between the dACC and several decision-making-related brain areas (Fig. 6A), as well as other reference areas (Table S11). Across both Gain and Loss groups, the dACC showed significant functional connectivity with the putamen, dlPFC, nucleus accumbens, thalamus, left amygdala, and insula after Bonferroni correction. The salience network exhibited the strongest connectivity during both gain and loss, consistent with prior research highlighting its role in gathering essential information for decision-making processes34,35,36. Mean dACC–left-putamen connectivity did not differ between groups, yet on the subject level, we found a negative correlation between ΔGABA and dACC–left-putamen connectivity only in the Loss-65 condition (Fig. 6B, p = 0.04, r = −0.29; FDR correction to all brain regions listed in Table S11). This might imply that dACC GABA reduces the dACC–Putamen connectivity during loss.
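Seed-based functional connectivity is commonly computed as the Pearson correlation between ROI-averaged BOLD time series, Fisher z-transformed (matching the z-valued connectivity in Fig. 6), with a Bonferroni threshold across target regions. The text does not detail the connectivity method (for example, whether task regressors were removed first), so this is a simplified sketch; the time series, region labels, p-values, and function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connectivity_z(seed_ts: Sequence[float], target_ts: Sequence[float]) -> float:
    """Fisher z-transformed correlation between two ROI-averaged BOLD time series."""
    r = pearson_r(seed_ts, target_ts)
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite for |r| = 1
    return math.atanh(r)


def bonferroni_significant(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Flag regions whose p-value survives the Bonferroni threshold alpha / m."""
    m = len(p_values)
    return {name: p <= alpha / m for name, p in p_values.items()}


dacc = [1.0, 2.0, 3.0, 4.0]
putamen = [1.0, 3.0, 2.0, 4.0]
print(round(connectivity_z(dacc, putamen), 4))  # 1.0986, i.e. atanh(0.8)
print(bonferroni_significant({"a": 0.001, "b": 0.02, "c": 0.03}))  # {'a': True, 'b': False, 'c': False}
```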
Fig. 6: Correlation between connectivity and ∆GABA.
A Connectivity strength distribution for the connectivity between the dACC cluster of the ROI group analysis and the listed regions under the gain condition (n = 52). B Correlation between the left-Putamen–dACC connectivity in z value and dACC’s ΔGABA in gain (n = 47) and loss (n = 49). C Correlation between the left-insula–dACC connectivity in z value and dACC’s ΔGABA in gain (n = 46) and loss (n = 50). Hollow red points represent outliers.
Discussion
In an earlier study, we observed that GABA in the dACC increases during uncertainty, but not learning conditions, in a similar decision-making task carried out at 3 T21. That finding suggests that inhibition encodes a component of the uncertainty experienced by participants. The current study not only reproduces this previous result with a much larger sample, but also addresses three associated questions that were not previously explored owing to limitations of the earlier study design. First, by employing a more comprehensive design that separates positive and negative stimuli, we demonstrated that ΔGABA plays a differential role in aversive and appetitive RL. Second, using an ultrahigh magnetic field (7 T) and a highly optimized pulse sequence26, we were able to simultaneously quantify GABA and Glu with a high degree of precision and to separate Glu from the contaminating glutamine signal. Finally, by acquiring BOLD and MRS data from the same participants, we were able to show that ΔGABA and BOLD track complementary aspects of knowledge use during RL, as discussed below.
Support for distinct ACC neuronal populations for appetitive and aversive learning
Several studies have proposed that distinct subregions in the ACC encode positive and negative valence and salience14,16,22,37,38. Even when gain- and loss-responsive neurons intermingle in the ACC, as observed in primates, a single cell rarely signals both14,15,16. Likewise, evidence from BOLD-fMRI studies points to distinct inhibitory neuronal mechanisms for appetitive and aversive learning in the dACC13. Consistent with this framework, we show that dACC GABA contributes differentially to the two RL modes: in the appetitive condition, baseline GABA correlates positively with learning performance, while no such correlation is observed in the aversive condition.
During the task, BOLD activation was observed in every condition, yet significant GABA increases emerged only in the 65-Loss and 50-Gain blocks, and ΔGABA correlated negatively with dACC BOLD only during the appetitive, not the aversive, game. The absence of this correlation in the aversive game could be explained by the group-level ΔGABA dynamics, which appear complementary: mean ΔGABA increased during the aversive but not the appetitive condition. This uniform ΔGABA increase may saturate the dynamic range of inhibitory neurons within the dACC, accounting for the lack of a ΔGABA–BOLD correlation and aligning with evidence that separate dACC neuronal populations handle aversive coding17,39. Similar metabolite–hemodynamic decoupling, in which neurotransmitter changes proceed without parallel BOLD modulation, has been described for Glu40,41,42,43,44,45, suggesting that neurotransmitter shifts can occur without parallel blood-flow changes, or that the recorded BOLD signal partly reflects local interneuron spiking.
Additional evidence for distinct ACC mechanisms during appetitive and aversive learning comes from the relationship between GABA changes and BOLD activation in other regions: the presence of a correlation between ΔGABA and putamen BOLD in the gain condition, and its absence in the loss condition, suggests that appetitive reinforcement learning relies more on ACC–striatal loops, circuits in which cortical inhibition is a key regulator, whereas aversive learning may rely on other brain regions (e.g., amygdala and insula)25. Although our analyses suggest that GABA does not directly mediate the relationship between dACC and putamen activity and learning performance, an indirect pathway remains possible, through a secondary brain region or through modulation of other neurotransmitter pathways. This interpretation agrees with animal evidence that the ACC-projecting putamen/caudate complex is gain-related46,47, and that elevated ACC GABA during appetitive tasks modulates neurons relying on prior knowledge19,48.
No comparable dACC–putamen relationship emerged under losses, aligning with evidence that separate ACC populations drive appetitive and aversive RL13,14,15,16. One speculative explanation is that aversive ΔGABA arises from a neuronal subset that suppresses dACC–putamen interaction. This idea is supported by our finding that ΔGABA correlated weakly, but only in the 65-Loss condition, with dACC–putamen connectivity. Alternatively, aversive learning may rely on amygdala- and insula-centered circuits governed by other neurotransmitters, making them undetectable with our current MRS-fMRI design25.
Taken together, these findings support the existence of two neuronal populations within the dACC: one that controls appetitive value coding via an indirect projection from the dACC to the left putamen, and another related to aversive value coding via a different, yet undetermined, mechanism. Our 10-mL voxel is large enough to capture both populations.
A differential role for tonic vs. phasic GABA
A second major finding is that, while baseline (tonic) GABA correlates positively with learning performance in the appetitive RL condition, it is ΔGABA (phasic GABA) that correlates negatively with the value-coding BOLD signal in the dACC and left putamen; whereas in the aversive RL condition, neither baseline GABA nor ΔGABA is related to behavioral or physiological measures (potentially owing to the saturation of group-level ΔGABA). This supports a two-stage inhibitory framework in which tonic GABA lowers noise and refines value or prediction error signals, so that individuals with greater tonic inhibition acquire gains more efficiently. Aversive learning, by contrast, may rely primarily on other brain areas, making tonic dACC GABA a weaker predictor. During the gain games, phasic ΔGABA changes in the dACC encode reward values through the dACC–putamen projection, whereas in the loss games, the global rise in ΔGABA possibly silences this projection. This account is supported by previous literature: the role of tonic GABA in modulating learning capacity converges with evidence from comparable paradigms21,28,29,30,31,32,33,49,50,51, and an inverse ΔGABA–BOLD relationship during cognitive demand has also been reported5,52,53