Introduction
Successful social living often requires considering the conflicting ways our behaviors simultaneously affect ourselves and those around us. People frequently make equitable or cooperative choices that are mutually beneficial. Other-benefitting behaviors can also be performed at a cost to the self, which is termed altruism. In contrast, antisocial behaviors are those that harm others. Sometimes, antisocial behaviors benefit the self, which is termed instrumental harm. Antisocial behaviors can also be mutually costly, with negative consequences for both oneself and others. Understanding how people acquire such behaviors is important, as prosocial behaviors improve social relationships and well-being, among other positive outcomes1,2,3,4, whereas antisocial behaviors are linked to deleterious social and functional outcomes5,6,7. But although previous research has examined social decision-making with joint outcomes at a single time point (i.e., via economic games) and has also examined how people learn to select behaviors that benefit the self or others separately, little is known about how people integrate conflicting outcomes for the self and others during learning. We therefore examined the computational and neural processes that support learning behaviors that carry simultaneous outcomes for the self and others, and tested whether similar processes may generalize across different types of choices linked to joint social outcomes.
The neurocognitive architecture that allows organisms, including humans, to learn to select or avoid behaviors that carry benefits or costs for themselves has been a topic of great interest8,9,10,11. Expected values for available actions appear to be assigned and updated in a domain-general neural code in the ventral portion of the medial prefrontal cortex12, a process that has been linked to learning to select actions that benefit the self11,13. This value updating is thought to occur via prediction error signals encoded in the firing of dopaminergic neurons in the ventral tegmental area, which projects to the ventral striatum8,14,15,16. Furthermore, people may be biased in the way they update the expected value of choices. For example, people place greater weight on outcomes that were better than expected (i.e., they weigh positive prediction errors more than negative prediction errors)17,18. People also asymmetrically update expected values of actions that benefit others versus themselves19,20,21. For example, activity in the subgenual anterior cingulate seems to specifically track information about other-benefitting but not self-benefitting prediction errors21,22. And some evidence suggests that people are more sensitive to unexpectedly positive outcomes when learning to avoid harming others23. It is not known, however, how self-versus-other and positive-versus-negative outcome information interact when people learn how their actions affect both themselves and others.
People vary widely in their propensity for prosocial versus antisocial behavior24,25. Prosocial individuals may be more sensitive to information about outcomes for others, whereas antisocial individuals may be less so. For example, those who are higher in trait cognitive empathy or lower in trait psychopathy exhibit increased learning rates when learning to benefit others (i.e., they place greater weight on other-relevant prediction errors to guide future choices)20,21, and people who are less sensitive to others’ losses during learning exhibit greater antisocial behavioral tendencies26. In addition, people who place greater weight on the learned values of available actions for others (i.e., exhibit higher inverse temperatures) display increased valuation-related ventromedial prefrontal cortex activation when learning to avoid harmful outcomes for others23.
Although real-world prosocial and antisocial behaviors typically affect the self and others simultaneously, how people incorporate conflicting information about how their choices affect themselves and others (for example, that a behavior helps oneself but harms another) is poorly understood (but see Sul et al.19). Yet such conflicts define many prosocial and antisocial choices, for example, donating money versus stealing money. Prosocial and antisocial choices have traditionally been studied with economic games that model decisions at a single point in time. Such decisions recruit brain regions that overlap with but appear distinct from those involved in purely self-benefitting learning, such as medial and lateral prefrontal cortex, striatum, anterior insula, and supplementary motor area27,28. However, these games cannot identify the processes by which people acquire prosocial or antisocial action preferences.
Prior work suggests two possible computational accounts by which this learning may occur. According to the first account, people maintain and update different expected values according to self- and other-relevant consequences. In other words, they update parallel value functions that track how actions affect themselves and others, respectively—a process that can be interpreted as a minimal form of value simulation29,30,31,32. This account aligns with valuing for others—tracking how actions impact others—rather than valuing from others’ perspective33. Several lines of evidence support this hypothesis. One is that simulation ability predicts generosity34,35,36. A recent study on social influence also found that choices were guided by a combination of distinct value signals from personal experience and from observational learning about others’ choices29. According to this account, people additionally infer and represent the value function of other people independently from the function that is updated with self-relevant information. An alternative account is that the value of behaviors is updated according to different prediction errors for self- versus other-relevant outcomes but then integrated into a single expected value that guides behavior. In other words, people maintain a single value function but are differentially sensitive to different types of unexpected outcomes. This account of learning would be consistent with findings from behavioral economics that the expected utility of choices can be represented as a weighted sum of self- and other-relevant outcomes37,38. Integrating information at the outcome phase of learning, rather than maintaining all types of information separately, may also be more computationally efficient, especially when someone is learning how actions affect multiple agents’ outcomes.
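To make the contrast concrete, the sketch below expresses the two candidate accounts as minimal update rules in Python; the variable names, learning rates, and the weight combining self- and other-relevant values are our illustrative assumptions, not the models as parameterized in the study.

```python
import numpy as np

# Minimal sketch (not the study's code) contrasting the two candidate accounts.
# Learning rates, the other-regarding weight, and initial values are illustrative.
n_options = 4
alpha_self, alpha_other = 0.3, 0.2   # assumed learning rates
w_other = 0.5                        # assumed weight on other-relevant value (2Q account)

# Account 1: value simulation (2Q) -- parallel value functions for self and other,
# combined only at the moment of choice.
Q_self = np.zeros(n_options)
Q_other = np.zeros(n_options)

def update_2q(choice, outcome_self, outcome_other):
    Q_self[choice] += alpha_self * (outcome_self - Q_self[choice])
    Q_other[choice] += alpha_other * (outcome_other - Q_other[choice])

def choice_values_2q():
    return Q_self + w_other * Q_other   # combined only when deciding

# Account 2: value integration (1Q) -- a single value per option, updated by
# separately weighted self- and other-relevant prediction errors.
Q = np.zeros(n_options)

def update_1q(choice, outcome_self, outcome_other):
    pe_self = outcome_self - Q[choice]    # self-relevant prediction error
    pe_other = outcome_other - Q[choice]  # other-relevant prediction error
    Q[choice] += alpha_self * pe_self + alpha_other * pe_other
```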
Here, we tested these alternative hypotheses using a novel multi-outcome social learning task, in which choices simultaneously affected the self and another person, either in the same way (mutual benefit, mutual cost) or in conflicting ways (altruism, instrumental harm). This task extended earlier prosocial learning tasks in which each choice resulted in a single outcome for one target (e.g., self, other, no one)20,21. Using computational modeling and neuroimaging, we found that simultaneous self and social learning depends on integrating self- and other-relevant information into a single expected value per choice, which was updated asymmetrically based on different types of prediction errors defined by target (self, other) and valence (positive, negative). People who were less sensitive to unexpected positive and negative outcomes for others learned to make more antisocial choices and fewer prosocial choices, and were characterized by higher levels of subclinical psychopathic traits. Model-based neuroimaging revealed that brain areas previously associated with instrumental and social learning tracked the prediction errors guiding this asymmetric value update, including ventral striatum, subgenual anterior cingulate, pregenual anterior cingulate, amygdala, and anterior insula.
Results
Participants completed a task in which their choices simultaneously affected themselves and another person. In two versions of the task, participants viewed fractal images randomly positioned on the screen and selected one image. Each of the four images was associated with an independent probability distribution of gaining or losing points for the self and for a study partner (Supplementary Fig. S1). Specifically, there were four categories of stimuli, although participants were not aware of this: mutually beneficial (the option that most frequently yielded beneficial outcomes for both themselves and their partner), instrumentally harmful (the option that most frequently benefited themselves at a cost to their partner), altruistic (the option that most frequently benefited the partner at a cost to the self), and mutually costly (the option that most frequently yielded losses for both themselves and their partner). In the four-option task, participants chose one of four images on a given trial in three runs of 60 trials each (180 trials total). In the two-option task, participants chose one of two images on a given trial in six runs of 40 trials each (240 trials total) such that every possible pair of images was presented (4!/(2! × (4 − 2)!) = 6 runs). Thus, participants had the opportunity to choose each of the four options in three different blocks across both versions of the task. If the gain or loss did not occur, there was no change in points for that person on that trial. Each choice thus resulted in outcomes for two targets: self (gain/loss/null) and other (gain/loss/null). To model separable self- and other-relevant prediction errors at the neural level, one of the outcomes (for self or other) was randomly hidden from the participant. Therefore, on each trial, participants selected one image and then saw either the outcome for themselves (gain, loss, or no change) or the outcome for their study partner (gain, loss, or no change). After any trial, they could infer how their choice affected the hidden target (Fig. 1A, B). A chi-square test of independence revealed no statistically significant association between task version and choice frequencies, indicating that choice behavior did not differ significantly across tasks, χ2(3, 208) = 6.37, p = 0.095. Participants rated the tasks to be relatively easy (Supplementary Information).
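For illustration, a minimal simulation of this trial structure might look like the following; the gain probabilities and point values are placeholders chosen by us, and null (no-change) outcomes are omitted for brevity.

```python
import random

# Illustrative trial generator (not the actual task code). Probabilities and
# point values are placeholder assumptions; null outcomes are omitted for brevity.
OPTIONS = {
    "mutually_beneficial":   {"p_gain_self": 0.8, "p_gain_other": 0.8},
    "instrumentally_harmful": {"p_gain_self": 0.8, "p_gain_other": 0.2},
    "altruistic":            {"p_gain_self": 0.2, "p_gain_other": 0.8},
    "mutually_costly":       {"p_gain_self": 0.2, "p_gain_other": 0.2},
}

def run_trial(choice):
    """Draw independent outcomes for self and other, then hide one at random."""
    p = OPTIONS[choice]
    outcome_self = +1 if random.random() < p["p_gain_self"] else -1
    outcome_other = +1 if random.random() < p["p_gain_other"] else -1
    shown_target = random.choice(["self", "other"])
    return {"self": outcome_self, "other": outcome_other, "shown": shown_target}

print(run_trial("instrumentally_harmful"))
```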
Fig. 1: Behavioral task and computational modeling.
Task design, choice frequencies, and behavioral modeling results for the four-option task (n = 89), two-option task (n = 119), and combined sample (n = 208). A Example choice trial for the four-option task with two possible outcomes. B Example choice trial for the two-option task with two possible outcomes. C People exhibit self-regarding biases, choosing the mutually beneficial and instrumentally harmful options most frequently. Error bars represent the standard errors around the mean proportions. D, E Model comparison metrics indicate that people integrate information about outcomes for self and others during learning. Asterisks (*) indicate the best-fitting model according to the specified model comparison metric. F Mean learning rates with 95% confidence intervals depicting how people learn from different types of prediction errors (PEs). †Note that the percentages in the two-option task do not sum to 100% because each option was displayed to participants in a pairwise fashion in three out of six blocks. Each of the four options was displayed to participants in 120 trials (out of 240 trials total).
Model-agnostic linear mixed-effects analyses predicting outcome as a function of trial, target, and trait psychopathy revealed consistent results across task paradigms. In both paradigms, participants learned to obtain higher rewards over trials on average. In addition, this relationship was stronger for self-relevant outcomes than for other-relevant outcomes. As trait psychopathy increased, the disparity between self- and other-relevant outcomes across trials increased (Supplementary Table S1 and Supplementary Fig. S2).
People integrate information about outcomes for self and others
We next tested our pre-registered computational hypotheses about how learning occurred. To do this, we fit ten different computational models to participants’ behavior, which were validated with model identifiability and parameter recovery analyses from simulated data (see Methods). Six models (2Q models; value simulation) corresponded to the first hypothesis, according to which people maintain and update separate expected values according to self- and other-relevant outcomes (represented as a vector of expected values). In other words, they simulate how others value their own choices during learning. Four models (1Q models; value integration) corresponded to the second hypothesis, according to which people maintain and update a single expected value that integrates information about both self- and other-relevant outcomes. The models within each class were nested in a straightforward manner such that they considered all combinations of parameters (Supplementary Information).
Having established that the models were identifiable and the parameters recoverable across both task paradigms using simulated data (see Methods), we performed Bayesian model selection on the empirical data. Across all samples, results provided robust evidence that behavior reflected an algorithm that integrated self- and other-relevant information into a single expected value per choice (Model 4: 1Q–4α1β; value integration), supporting the second, value integration hypothesis. Prediction errors (δ) signaled unexpected outcomes resulting from selecting an action and, in turn, influenced the expected value (Q) of performing that action again in the future. This expected value update was weighted differently depending on both the target (self, other) and valence (positive, negative) of the prediction error. In other words, prediction errors were weighted asymmetrically to update the expected value of a choice according to four learning rate parameters: \(\alpha_{self}^{+}\) (weighting self-relevant, positive prediction errors), \(\alpha_{self}^{-}\) (weighting self-relevant, negative prediction errors), \(\alpha_{other}^{+}\) (weighting other-relevant, positive prediction errors), and \(\alpha_{other}^{-}\) (weighting other-relevant, negative prediction errors). Each weight quantified how much that type of prediction error influenced the expected value of a future action. The expected value of actions (Q) was also weighted by an inverse temperature parameter (β) that quantified how much the current expected value of available actions (\(Q_t^k\)) affects the probability of selecting an action. For all samples and tasks, this model fit the data best across various model comparison metrics, including log model evidence, integrated Bayesian Information Criterion (BIC), pseudo-R2, and protected exceedance probability (Fig. 1D). Across both studies, we did not find evidence that task difficulty predicted the best cognitive strategy (value integration vs. value simulation) (Supplementary Information). To test the possibility that the observed asymmetries in learning rates might reflect differences in reward sensitivity rather than learning, we fit eight additional models, which included outcome sensitivity parameters for self and other. Model fitting and model comparison supported the present findings that people learn less from other-relevant outcomes rather than simply being less sensitive to those rewards (Supplementary Information; Supplementary Fig. S3). To clarify the behavior that the estimated parameters capture, we computed correlations between estimated parameters and various behavioral outcomes, including points earned for self and other and choice frequencies. In general, people who acquired more prosocial choice patterns were more sensitive to information about how their choices affected others (Supplementary Figs. S4–5).
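A minimal sketch of this winning value-integration model (1Q–4α1β) follows; the parameter values, the initialization, and the decision to update only from the displayed outcome are our assumptions for illustration rather than the authors' implementation.

```python
import numpy as np

# Sketch of the best-fitting value-integration model (1Q-4alpha-1beta).
# Parameter values and implementation details are illustrative assumptions.
n_options = 4
Q = np.zeros(n_options)                                  # one integrated value per option
alpha = {("self", "+"): 0.50, ("self", "-"): 0.30,
         ("other", "+"): 0.30, ("other", "-"): 0.10}     # asymmetric learning rates
beta = 3.0                                               # inverse temperature

def choose():
    """Softmax choice over the single integrated value function."""
    logits = beta * Q
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return np.random.choice(n_options, p=p)

def update(choice, outcome, target):
    """Weight the prediction error by a learning rate that depends on its target
    (self/other) and valence (positive/negative), then update Q for the chosen option."""
    delta = outcome - Q[choice]          # prediction error
    valence = "+" if delta >= 0 else "-"
    Q[choice] += alpha[(target, valence)] * delta
```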
On average, people exhibit self-benefiting biases
Across both tasks, participants most frequently selected the options that benefitted the self (i.e., the mutually beneficial option or the instrumentally harmful option; Fig. 1C). In the four-option task, participants chose the mutually beneficial option during 45.4% of trials on average. This was followed by the instrumentally harmful option (29.1%), the altruistic option (13.5%), and then the mutually costly option (10.9%). Participants missed 1.1% of choices on average. In the two-option task, participants chose the instrumentally harmful option in 66.1% of trials on average, followed by the mutually beneficial option (63.3%), the altruistic option (35.6%), and then the mutually costly option (35.1%). Note that the percentages in the two-option task do not sum to 100% because each option was displayed to participants in a pairwise fashion in three out of six blocks. Each of the four options was displayed to participants in 120 trials (out of 240 trials total).
In line with our pre-registered hypothesis, this self-benefiting bias was also reflected in people’s asymmetric learning rate estimates. Across both tasks, linear mixed-effects modeling with random intercepts revealed that people exhibited higher self-relevant learning rates than other-relevant learning rates (Four-option task: coefficient = 0.82, SE = 0.17, Z = 4.96, p < 0.001, CI 95% = [0.50, 1.15], n = 89; Two-option task: coefficient = 2.09, SE = 0.17, Z = 12.46, p < 0.001, CI 95% = [1.76, 2.42], n = 119). Participants also exhibited a positivity bias (Four-option task: coefficient = 0.35, SE = 0.17, Z = 2.09, p = 0.036, CI 95% = [0.02, 0.67], n = 89; Two-option task: coefficient = 2.48, SE = 0.17, Z = 14.78, p < 0.001, CI 95% = [2.15, 2.81], n = 119). In other words, people learned more from self-relevant prediction errors and positive prediction errors. In the two-option task, we also observed a valence × target interaction effect, such that people learned most from self-relevant, positive prediction errors and least from other-relevant, negative prediction errors (Four-option task: coefficient = 0.21, SE = 0.24, Z = 0.91, p = 0.36, CI 95% = [−0.25, 0.67], n = 89; Two-option task: coefficient = −0.60, SE = 0.24, Z = −2.50, p = 0.012, CI 95% = [−1.06, −0.13], n = 119; Table 1).
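The sketch below shows the general form of such a random-intercept model in Python (statsmodels); the data frame layout and column names are assumed for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Sketch of a random-intercept mixed-effects model for the learning-rate analysis.
# `df` is assumed to hold one row per participant x learning-rate type, with columns:
# subject, log_alpha (log-transformed learning rate), target ("self"/"other"),
# and valence ("positive"/"negative").

def fit_learning_rate_model(df: pd.DataFrame):
    model = smf.mixedlm("log_alpha ~ target * valence", data=df, groups=df["subject"])
    return model.fit()

# result = fit_learning_rate_model(df)
# print(result.summary())  # main effects of target and valence plus their interaction
```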
Higher trait psychopathy is related to decreased sensitivity to information about how choices affect others
As predicted, participants also varied in their parameter estimates (Supplementary Fig. S6). To characterize this variability, we fit a series of linear mixed-effects models testing whether learning rates varied as a function of target, valence, and self-reported prosocial traits (i.e., empathy, social value orientation) or antisocial traits (i.e., trait psychopathy). Contrary to our pre-registered hypotheses, we found no relationships between learning rates in the four-option task and measures of trait cognitive or affective empathy (Supplementary Table S2). In an exploratory analysis, we also tested whether learning rates in the two-option task varied as a function of social value orientation (SVO) and found no interaction between target and SVO (Supplementary Table S3 and Supplementary Fig. S7).
By contrast, and in line with pre-registered hypotheses, trait psychopathy was associated with learning rates in both the four-option and two-option tasks (Fig. 2). In the four-option task, we found an interaction between target and psychopathy (Target × Psychopathy: coefficient = 1.418, SE = 0.530, Z = 2.675, p = 0.007, CI 95% = [0.38, 2.46], n = 89), such that individuals with higher psychopathy scores exhibited lower learning rates in response to other-relevant information. In the two-option task, psychopathy interacted with both valence and target relevance (Valence × Psychopathy: coefficient = 1.245, SE = 0.33, Z = 3.774, p < 0.001, CI 95% = [0.60, 1.90], n = 119; Target × Psychopathy: coefficient = 1.553, SE = 0.33, Z = 4.708, p < 0.001, CI 95% = [0.91, 2.20], n = 119), such that people with higher psychopathy scores exhibited lower learning rates in response to both other-relevant and negative prediction errors (Table 2).
Fig. 2: Log learning rates as a function of target, valence, and trait psychopathy.
Plots illustrate log learning rates (y-axis) as a function of trait psychopathy (x-axis), target (colors), and valence (rows) for the four-option task (n = 89; left column) and two-option task (n = 119; right column). Ribbons indicate bootstrapped 95% confidence intervals. Trait psychopathy was computed using the mean score on the Triarchic Psychopathy Measure. Learning rates are log-transformed to account for the skewed distribution.
Psychopathy is viewed as a multidimensional construct that reflects meanness, boldness, and disinhibition39. We explored how learning varied as a function of these sub-factors. We found that trait meanness—which most consistently reflects callousness and antisocial attitudes and behavior—was consistently associated with reduced learning from other-relevant information (Supplementary Tables S4–6 and Supplementary Figs. S8–10). In both tasks, meanness showed a significant interaction with target relevance (four-option task: coefficient = 1.054, SE = 0.426, Z = 2.476, p = 0.013, CI 95% = [0.22, 1.89], n = 89; two-option task: coefficient = 1.087, SE = 0.204, Z = 5.322, p < 0.001, CI 95% = [0.69, 1.49], n = 119).
Brain regions encoding the expected value of chosen options
In the subset of participants who completed the four-option task during neuroimaging, we identified brain regions encoding the expected value of options by testing where activation at the decision phase of the task parametrically varied as a function of the expected values predicted by the best-fitting computational model. Expected value signals were encoded in bilateral medial prefrontal cortex, bilateral anterolateral temporal cortex, and bilateral posterior cingulate. These results aligned with our pre-registered hypotheses. We also found several regions in which activation was negatively related to the expected value of choices, including bilateral anterior insula, bilateral supplementary motor area, bilateral dorsolateral prefrontal cortex, bilateral precuneus, bilateral visual cortex, bilateral cerebellum, bilateral superior and inferior parietal lobule, and bilateral dorsal striatum (Fig. 3 and Supplementary Table S7). These results suggest that a distributed network of regions integrates value-related information during social learning and decision-making.
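As a sketch of the general logic of such a model-based analysis (not the specific pipeline used here), trial-wise expected values from the winning model can be entered as a parametric modulator in a first-level GLM; the helper below builds one such regressor using an illustrative double-gamma HRF, with onsets and values assumed for the example.

```python
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(tr, duration=32.0):
    """Canonical-style double-gamma HRF sampled at the TR (illustrative parameters)."""
    t = np.arange(0.0, duration, tr)
    hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)
    return hrf / hrf.max()

def parametric_modulator(onsets, values, n_scans, tr):
    """Mean-center trial-wise model values (e.g., expected value of the chosen option),
    place them at their onsets, and convolve with the HRF to form a GLM regressor."""
    values = np.asarray(values, dtype=float)
    ts = np.zeros(n_scans)
    for onset, val in zip(onsets, values - values.mean()):
        ts[int(round(onset / tr))] += val
    return np.convolve(ts, double_gamma_hrf(tr))[:n_scans]

# Example with hypothetical onsets (in seconds) and model-derived expected values:
# ev_regressor = parametric_modulator(onsets=[10.0, 22.5, 35.0],
#                                     values=[0.2, 0.8, -0.1], n_scans=300, tr=2.0)
```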
Fig. 3: Brain regions encoding the expected value of choices.
Group-level statistical map showing brain regions encoding the expected value of choices. Statistical significance was determined using a permutation test and corrected for multiple comparisons at a cluster-level family-wise error rate (FWER) of p < 0.001 (two-sided). Colormap represents the strength (magnitude of Z statistic) and direction (sign of Z statistic) of the relationship between BOLD activation and the expected value of the chosen option.
Brain regions encoding prediction errors
To identify brain regions that tracked prediction errors, we tested where activation parametrically varied as a function of the prediction errors estimated by the best-fitting computational model at the outcome phase of the task. Prediction error signals were encoded in medial prefrontal cortex, subgenual anterior cingulate, and ventral striatum. These results aligned with our pre-registered hypotheses. We also found activation in several regions that negatively correlated with prediction errors, including bilateral supplementary motor area, right insula, right middle cingulate gyrus, and left fusiform gyrus (Fig. 4 and Supplementary Table S8), suggesting that these regions may play a role in error monitoring and behavioral adjustment during social learning.
Fig. 4: Brain regions encoding prediction errors.
Group-level statistical map showing brain regions encoding prediction errors. Statistical significance was determined using a permutation test and corrected for multiple comparisons at a cluster-level family-wise error rate (FWER) of p < 0.001 (two-sided). Colormap represents the strength (magnitude of Z statistic) and direction (sign of Z statistic) of the relationship between BOLD activation and prediction errors.
Brain regions encoding the asymmetric expected value update
Because the best-fitting computational model indicated that the expected value of choices was updated via four different types of prediction errors, we anticipated that brain regions previously linked to prediction errors might support the asymmetric value integration at the outcome stage of learning. In other words, we tested whether different brain regions encode the updated expected value carried to subsequent trials according to specific prediction errors. To do this, we first estimated the degree to which activation during outcomes parametrically varied as a function of the expected value update estimated from the best-fitting computational model (i.e., \(Q_t + \alpha \cdot \delta_t\)). We conducted this analysis across the whole brain using an FWE-corrected threshold of p < 0.001 (Supplementary Table S9). Then, for each type of value update, we extracted the mean parameter estimates from a priori regions of interest previously linked to prediction error coding in self and social contexts40,41,42: ventral striatum8,14,15,16, subgenual anterior cingulate21,43, anterior insula15, amygdala41,44, and pregenual anterior cingulate43,45. We then tested whether the average encoding strength of the new expected value differed from zero using a non-parametric sign-flipping procedure with 10,000 permutations and a false discovery rate (FDR) corrected threshold of q < 0.05 across the ten regions (five regions per hemisphere).
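For reference, a one-sample sign-flipping permutation test with FDR correction across ROIs can be sketched as below; this is our own illustrative implementation, not the analysis code used in the study, and the ROI names in the example are hypothetical.

```python
import numpy as np
from statsmodels.stats.multitest import fdrcorrection

rng = np.random.default_rng(0)

def sign_flip_test(estimates, n_perm=10_000):
    """Two-sided one-sample sign-flipping permutation test on subject-level
    parameter estimates: is the mean encoding strength different from zero?"""
    estimates = np.asarray(estimates, dtype=float)
    observed = estimates.mean()
    flips = rng.choice([-1.0, 1.0], size=(n_perm, estimates.size))
    null_means = (flips * estimates).mean(axis=1)
    return float((np.abs(null_means) >= np.abs(observed)).mean())

def test_rois(roi_estimates, q=0.05):
    """Run the sign-flip test per ROI and FDR-correct the p-values across ROIs."""
    names = list(roi_estimates)
    pvals = np.array([sign_flip_test(roi_estimates[name]) for name in names])
    reject, pvals_adj = fdrcorrection(pvals, alpha=q)
    return {name: (bool(r), float(p)) for name, r, p in zip(names, reject, pvals_adj)}

# Example with hypothetical per-subject estimates for two regions:
# results = test_rois({"L_VS": left_vs_betas, "R_VS": right_vs_betas})
```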
Results showed that these regions encoded each type of prediction error to varying degrees (Fig. 5). The self-relevant, positive weighted value update was encoded in bilateral ventral striatum (left: Z = 3.59, p < 0.0001; right: Z = 3.63, p < 0.0001; n = 27), bilateral subgenual anterior cingulate (left: Z = 2.43, p = 0.003; right: Z = 2.79, p = 0.0002; n = 27), bilateral amygdala (left: Z = 3.02, p < 0.0001; right: Z = 1.99, p = 0.018; n = 27), and bilateral pregenual anterior cingulate (left: Z = 3.43, p < 0.0001; right: Z = 2.64, p = 0.003; n = 27). The self-relevant, negative weighted value update was inversely encoded in bilateral anterior insula (left: Z = −2.92, p = 0.001; right: Z = −2.40, p = 0.007; n = 27). The other-relevant, positive weighted value update was encoded in bilateral subgenual anterior cingulate (left: Z = 2.92, p < 0.0001; right: Z = 3.20, p = 0.0001; n = 27), bilateral pregenual anterior cingulate (left: Z = 3.55, p < 0.0001; right: Z = 2.60, p = 0.003; n = 27), and right ventral striatum (Z = 1.88, p = 0.022; n = 27). The other-relevant, negative weighted value update was encoded in right subgenual anterior cingulate (Z = 2.08, p = 0.013; n = 27). These results demonstrate distinct patterns of value update encoding across several regions, with value updates from positive prediction errors encoded more broadly than those from negative prediction errors, and with consistent value encoding across both self- and other-relevant learning. In an exploratory analysis, we also tested whether neural encoding in these regions of interest varied as a function of trait psychopathy. We found that the extent to which left ventral striatum encoded the value update for other-relevant positive prediction errors was inversely related to trait psychopathy (ρ = −0.41, p = 0.034, n = 27). However, this result did not survive multiple-comparison correction at FDR q < 0.05.
Fig. 5: Brain regions encoding the asymmetric expected value update.
Mean parameter estimates indicating the extent to which regions of interest encode asymmetric expected value updates for the four different types of prediction errors. Error bars represent the 95% confidence intervals. Rows represent different regions of interest. Columns represent different types of weighted value updating. Point plot y-axes indicate the parameter estimates indexing the relationship between BOLD activation and weighted value update. Points represent the mean estimates. Dashed line represents 0. Voxel patterns indicate the weighted value update encoding relationship depicted in the point plots from blue (negative relationship) to red (positive relationship) within a given region at the depicted brain slice. All plots represent data from the neuroimaging sample that completed the four-option task (n = 27). Asterisks (*) indicate regions surviving FDR-correction using q < 0.05. (VS = Ventral striatum, SGACC = Subgenual Anterior Cingulate Cortex, AI = Anterior Insula, AMYG = Amygdala, ACC = Pregenual Anterior Cingulate Cortex).
Discussion
In three pre-registered samples of participants and two task paradigms, we examined the computational and neural mechanisms of simultaneous learning for self and others. We provide evidence that people learn to integrate self- and other-relevant information into a single value per choice. However, this value is updated asymmetrically according to distinct prediction errors, which encode information about the target (self or other) and valence (positive or negative). Participants who were more sensitive to unexpected positive and negative outcomes for others learned to make more prosocial choices and fewer antisocial choices. By contrast, as trait-level psychopathy increased, other-relevant learning rates decreased and self-relevant learning rates increased, suggesting a computational phenotype underlying the acquisition of antisocial behaviors in psychopathy.
Model-based neuroimaging analyses showed that the expected value assigned to choices corresponded to activation in medial prefrontal cortex, anterolateral temporal cortex, and posterior cingulate. Prediction errors broadly corresponded to activation in medial prefrontal cortex, ventral striatum, and subgenual anterior cingulate. Notably, different and overlapping regions encoded the way that different types of prediction errors guided the asymmetric value update. Ventral striatum and pregenual anterior cingulate guided value updating via positive prediction errors (regardless of target). Subgenual anterior cingulate guided value updating via other-relevant prediction errors (regardless of valence) and via self-relevant positive prediction errors. Anterior insula guided value updating via self-relevant, negative prediction errors. Amygdala guided value updating via self-relevant, positive prediction errors. This asymmetry in how the brain updates the expected value of choices based on who is affected and whether the outcome is better or worse than expected provides a neural basis for individual differences in the behavioral biases observed (e.g., high sensitivity to self-relevant, positive prediction errors).
Beyond identifying the brain regions supporting the asymmetric integration of new information when acquiring prosocial behaviors, our results suggest that learning to select actions that help or harm both the self and others may be computationally distinct from cognitive tasks that require actively representing and maintaining how others value our actions. Instead, people integrate self- and other-relevant information during learning to guide future prosocial behaviors, suggesting that the brain combines self- and other-regarding information into a common valuation signal, rather than maintaining entirely separate valuation systems for the self and others when choosing options that have dual social outcomes. In other words, the way people distinctly encode self- and other-relevant outcomes that result from a particular behavior guides how desirable that same behavior will be in the future, regardless of whether the behavior is mutually beneficial or costly, instrumentally harmful, or altruistic.
Decisions in our task were guided by a single expected value, which was encoded by activation in medial prefrontal cortex, a region consistently found to track the expected values of choices during learning and decision-making in self-relevant11,13 and social contexts19,38,46. Our findings provide