Main
Continual learning is the ability to acquire multiple tasks in succession. Learning tasks in sequence is challenging because new task acquisition may cause existing knowledge to be overwritten, a phenomenon called catastrophic interference. Artificial neural networks (ANNs) trained with gradient descent are particularly prone to catastrophic interference1,2,3. Their difficulties with continual learning are often counterpointed with those of humans, who seem to be capable of accumulating and retaining knowledge across the lifespan. The computational basis of continual learning in humans is a topic of active investigation4,5,6,7, and a consensus has emerged that task learning in humans and linear ANNs may rely on fundamentally different mechanisms7. Here, we describe work that challenges this assumption.
Recent work has shown that catastrophic interference can be counterintuitively worse when successive tasks are more similar to each other8,9,10. When faced with similar tasks, ANNs tend to adapt existing representations, rather than forming new ones. This allows for better transfer (where learning one task accelerates learning of others), but existing representations are corrupted, provoking heightened interference. By contrast, when dissimilar tasks are encountered in succession, ANNs adopt a different strategy, which involves forming entirely new representations. This means that learning proceeds more slowly, but networks suffer less from interference8,10,11. In other words, higher catastrophic interference can be a cost that accompanies the benefits of transfer.
Although catastrophic forgetting in ANNs is often contrasted with successful continual learning in biological systems, there is good reason to believe they might rely on common principles of generalization and interference. In psychology, the term ‘retroactive interference’ refers to the phenomenon whereby new learning interferes with previous knowledge, analogous to catastrophic interference12,13,14,15,16,17,18,19,20. Cases of retroactive interference, for example in sequential recall tasks, have also been proposed to depend on task similarity20,21,22,23,24,25. To take an intuitive example, language learners will find it easier to learn Italian after first learning a similar Romance language such as French than after learning Korean, but may then begin misapplying Italian words in French. In dual-task paradigms, where participants must perform two tasks simultaneously, it is well established that cross-task interference is higher for tasks with shared structure26,27, argued to be an intrinsic cost of sharing neural representations across tasks28,29,30. While these studies suggest that similar trade-offs may occur in humans too, as far as we are aware, no previous studies have systematically compared how patterns of catastrophic interference relate to transfer during continual learning in humans and ANNs.
Here, we directly compare humans and linear ANNs performing the same continual learning task, examining whether transfer and interference are governed by analogous computational principles. To do so, we adopted a minimalist modelling approach using two-layer linear neural networks. We trained both classes of learners on two sequential tasks (task A and task B) and then retested their knowledge of the first task (task A). First, we studied the effects of task similarity by varying the relationship between the two task rules across three different groups of subjects (Same, Near and Far rule conditions). For both humans and ANNs, more similar tasks led to faster learning of task B (transfer), while more dissimilar tasks resulted in lower interference from task B when retested on task A. In ANNs, by analysing the hidden layer representations, we were able to identify the precise computational principles that govern this effect. Consistent with previous work10, we found that networks encode similar tasks in shared subspaces, which leads to interference; when tasks are sufficiently different, however, networks encode them in separate, non-overlapping subspaces, which eliminates catastrophic interference.
Alongside these phenomena, we observed substantial individual differences consistent with a computational trade-off between the benefits of transfer and the avoidance of interference. Writing to a friend, the naturalist Charles Darwin described two groups of taxonomists: those who preferred to divide the botanical world into as many different species as captured their unique properties, and those who focused on commonalities, preferring to merge across the differences. Reflecting on these groups, he wrote, ‘It is good to have hair-splitters & lumpers’31. In our study, we found a similar divergence in how people structured new information. Some participants reused the same rule across all stimuli (‘lumpers’), which allowed them to learn faster in the second task, while incurring more interference when retested on the original task. These participants were also better at generalizing to unseen stimuli within a task, by applying their knowledge of the shared task rule. Meanwhile, other participants were able to avoid interference, but at the cost of worse transfer to new tasks and poor generalization within a task (‘splitters’). Intriguingly, this group was better at recalling unique properties of the stimuli. These findings suggest that a tendency to focus on generalization of shared features versus individuation of unique features may reflect a meaningful axis of variation in human learning, although further work is needed to determine the stability of these tendencies across contexts.
We sought to understand these individual differences using our modelling framework. We drew upon recent work in machine learning revealing that networks can solve the same task using fundamentally different representations. In so-called rich networks, inputs are encoded in representations that reflect the low-dimensional structure of the task. By contrast, so-called lazy networks rely on high-dimensional, discriminable projections of the inputs, which form a basis for flexible downstream computations but often generalize poorly32,33,34,35,36,37. The transition from the ‘rich’ to the ‘lazy’ regime can be driven by the scale of the initial weights34,35,37. We found that we could fully account for the individual differences in human learning by assuming a mixture of rich and lazy task solutions that favour generalization or individuation, respectively.
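To make this concrete, the following is a minimal sketch of how initialization scale alone can place the same architecture in either regime; the dimensions and scales are illustrative assumptions, not the values used in our simulations.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 12, 48, 4  # illustrative dimensions

def init_weights(sigma):
    """Draw both weight matrices from a zero-mean Gaussian with s.d. sigma.
    A small sigma tends to produce 'rich' (feature-learning) dynamics;
    a large sigma tends to produce 'lazy' (kernel-like) dynamics."""
    W1 = rng.normal(0.0, sigma, (n_hidden, n_inputs))
    W2 = rng.normal(0.0, sigma, (n_outputs, n_hidden))
    return W1, W2

W1_rich, W2_rich = init_weights(sigma=1e-4)  # rich regime: weights start near zero
W1_lazy, W2_lazy = init_weights(sigma=1.0)   # lazy regime: large random features
```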
Together, these results point to key parallels between the trade-offs governing transfer and interference in humans and ANNs in continual learning settings. In both learning systems, learners who benefit most from generalizing shared structure also demonstrate the highest costs of interference. This balance is influenced both by external variables such as task similarity and by differences in the initial learning strategies.
Results
Humans and twinned linear ANNs (collectively, ‘learners’) learned two successive tasks (task A followed by task B) and were then retested on task A (Fig. 1). Each task required learners to map six discrete inputs (plants) onto positions on a ring (locations on a planet) in two distinct contexts (the seasons of summer and winter; Fig. 1a). Within each of the two tasks, a consistent angle referred to as the ‘task rule’ defined the relationship between summer and winter locations for any plant. For example, within task A, each plant’s winter location might always be 120° clockwise from its summer location (rule A = 120°; Fig. 1b). Learners were always probed on a plant’s summer location first, and then on its winter location after viewing feedback, allowing inference about the rule that linked the seasons. Notably, for one of the six stimuli, learners never received feedback on the winter location, allowing us to measure generalization of the rule within a task.
Fig. 1: Task design.
a, The task consisted of mapping plant stimuli to their locations on a circular dial, across two contexts (summer and winter). Participants always responded with the probed plant’s summer location first, and then its winter location, receiving feedback after each response during training. b, Within task A, the relationship between each plant’s location in summer (white circle) and winter (black circle) corresponded to a fixed angular rule (for example, 120° clockwise) that was randomized across participants. c, In task B, all participants learned to map a new set of stimuli to their respective summer and winter locations. However, the rule defining the relationship between seasons differed across groups of participants. In the Same condition, the seasons for task B were related by the same rule previously learned in task A; in the Near condition, the rule shifted by 30°; in the Far condition, the rule shifted by 180°. d, All learners were trained on task A (120 trials), then task B (120 trials), and then retested on task A without feedback for winter. Transfer is defined as the change in winter accuracy from the final block (that is, one full stimulus cycle) of task A to the first block of task B. If participants learn the rule, transfer should be better when the task B rule is more similar. Interference is defined as the probability of updating to the task B rule during retest of task A. For each participant, we trained a twinned neural network on the same stimulus sequence and task rules. e, Networks consisted of feed-forward two-layer ANNs trained to associate sets of unique inputs (one-hot vectors; separate sets for each task) with the Cartesian coordinates of the winter and summer locations. Interference and transfer icons from OnlineWebFonts under a Creative Commons license CC BY 4.0.
After completing task A, learners were trained on task B, where they learned to map a new set of six stimuli to their corresponding summer and winter locations on the ring. Participants received no indication that a new task had begun, aside from the fact that the task B stimuli were novel. Learners were divided into three groups, corresponding to three levels of similarity between the rule in task A (rule A) and the rule in task B (rule B; Fig. 1c). Depending on the condition, rule B was either identical to rule A (Same), shifted by 30° (Near), or shifted by 180° (Far). For example, if the relationship between the seasons in task A was 120°, it remained 120° for the new task B stimuli in the Same group, shifted to either 90° or 150° in the Near group, and changed to 300° in the Far group. The rules themselves were matched across conditions (Supplementary Fig. 1). After training on task B, learners were retested on the locations of task A stimuli—this time receiving feedback only about their summer responses. This allowed us to assess their retention of rule A at retest, in the absence of feedback, by analysing their winter responses.
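To make the design concrete, a minimal sketch of how the two tasks can be generated is shown below; the sampled locations and the example rule values are illustrative assumptions (in the experiment, rules and locations were randomized across participants).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(rule_deg, n_stimuli=6):
    """Assign each stimulus a summer location on the ring; its winter
    location is the summer location rotated by the task rule (degrees)."""
    summer = rng.uniform(0.0, 360.0, n_stimuli)
    winter = (summer + rule_deg) % 360.0
    return summer, winter

rule_a = 120.0                                   # example task A rule
summer_a, winter_a = make_task(rule_a)

# Task B rule by condition: Same (no shift), Near (+/-30 deg), Far (180 deg)
rule_b = {'Same': rule_a, 'Near': rule_a + 30.0, 'Far': rule_a + 180.0}['Near']
summer_b, winter_b = make_task(rule_b % 360.0)   # new stimuli, shifted rule
```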
Defining transfer and interference
In theory, learners could apply their knowledge of rule A to the novel stimuli in task B. This would manifest as using rule A to assign a winter location to a task B stimulus after receiving feedback about its summer location. Consequently, if learners apply prior knowledge, the more similar the task B rule, the better we expect initial performance on task B to be. Accordingly, we evaluate transfer in both humans and networks as the difference between the average winter accuracy for task A stimuli during their final presentation and the average winter accuracy for task B stimuli at their first presentation (Fig. 1d). Because transfer should decline as rule similarity decreases, we expect the lowest transfer in the Far group.
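In code, this measure reduces to a difference of block-wise mean accuracies; the sketch below assumes arrays of per-stimulus correctness (one entry per stimulus) as input.

```python
import numpy as np

def transfer_score(winter_correct_a_final, winter_correct_b_first):
    """Transfer: change in mean winter accuracy from the final presentation
    of the task A stimuli to the first presentation of the task B stimuli.
    Values near zero indicate successful transfer; large negative values
    indicate a switch cost."""
    return np.mean(winter_correct_b_first) - np.mean(winter_correct_a_final)
```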
Conversely, we predicted that successful transfer would come at the cost of greater interference from the new task. If Near group participants benefit more from transfer, this interference should manifest as greater use of rule B when retested on task A stimuli. Because no new rule learning occurs during retest, we could formally quantify interference as the probability of using rule B on return to task A. To measure this, we fit a mixture of von Mises distributions38 centred on rule A and rule B to learners’ rule responses (the offset between the winter response given and the feedback for summer on the immediately preceding trial). Higher interference corresponds to a higher probability weight of responding with rule B during retest of task A (Fig. 1d). As such, we measure interference from rule B in the Near and Far conditions, where the rules change between tasks, but not in the Same condition, where by definition the rule remains constant throughout. Parameter recoveries and model validation can be found in Supplementary Section 4.
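The mixture fit can be sketched as a short expectation-maximization loop over the two fixed-mean components. This is a simplified illustration: it assumes a single shared, fixed concentration `kappa` (the full model also fits concentration parameters; see Supplementary Section 4), and angles are in radians.

```python
import numpy as np
from scipy.stats import vonmises

def fit_interference(rule_responses, rule_a, rule_b, kappa=5.0, n_iter=200):
    """Estimate p(rule B): the weight of the rule B component in a
    two-component von Mises mixture with means fixed at rule A and rule B."""
    p_a = vonmises.pdf(rule_responses - rule_a, kappa)  # rule A component
    p_b = vonmises.pdf(rule_responses - rule_b, kappa)  # rule B component
    pi_b = 0.5                                          # initial mixture weight
    for _ in range(n_iter):
        resp_b = pi_b * p_b / (pi_b * p_b + (1.0 - pi_b) * p_a)  # E-step
        pi_b = resp_b.mean()                                     # M-step
    return pi_b  # higher values indicate greater interference
```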
ANN studies
To enable direct comparisons between humans and models, each human participant was paired with a twinned neural network that followed the exact same trial schedule—receiving the same ordering of stimuli and feedback as that participant. In the ANN experiments, we used two-layer feed-forward linear networks (Fig. 1e), allowing us to study how the representations supporting rule learning emerge through gradient descent32,39. During the task A and task B learning phases, networks were trained to map one-hot vectors representing the discrete input stimuli onto Cartesian coordinates for the winter and summer locations on the ring. Crucially, network weights were not reset between tasks, allowing us to study continual learning. Similarity between task A and task B was manipulated exactly as for humans, by varying the rule relating the target coordinates in summer and winter. To mirror the continuous, fully informative feedback received by humans, we trained networks with trial-wise gradient updates, using a mean squared error (MSE) loss. During retest of task A, model weights were updated after summer trials but not winter trials, analogous to participants receiving feedback only for their summer responses. We chose two-layer linear networks as the simplest architecture capable of learning transferable shared structure in this task. By contrast, single-layer regression models trained on unique (one-hot) inputs cannot share weights across stimuli and are therefore incapable of transfer. Because the task is linearly solvable, linear networks are the most parsimonious choice for studying the representational dynamics supporting transfer and interference. However, we confirm in supplementary analyses (Supplementary Section 2.1 and Supplementary Fig. 5) that the key behavioural effects also hold in ReLU networks, supporting the robustness of our findings beyond linear networks.
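A minimal sketch of one such twinned network and its trial-wise update follows; the hidden width, learning rate and initialization scale are illustrative assumptions rather than the exact simulation settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden = 12, 48   # 6 one-hot stimuli per task; hidden width assumed
lr, sigma = 0.05, 1e-3        # illustrative learning rate and init scale

W1 = rng.normal(0.0, sigma, (n_hidden, n_inputs))
W2 = rng.normal(0.0, sigma, (4, n_hidden))  # outputs: (x, y) for summer and winter

def trial_update(x, y_target):
    """One trial-wise gradient step on the MSE loss of the linear network.
    Weights are shared across tasks and never reset, so later training on
    task B can overwrite task A knowledge."""
    global W1, W2
    h = W1 @ x                             # hidden representation (linear)
    err = W2 @ h - y_target                # prediction error
    grad_W1 = np.outer(W2.T @ err, x)      # gradient w.r.t. W1 (pre-update W2)
    W2 -= lr * np.outer(err, h)            # gradient w.r.t. W2
    W1 -= lr * grad_W1
    return 0.5 * np.sum(err ** 2)          # trial loss
```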
ANNs show higher transfer at the cost of greater interference when learning similar tasks
All ANNs achieved near-zero training loss on both task A and task B by the end of their respective training phases across all conditions (\(\mathcal{L} < 10^{-3}\); Fig. 2a–c).
Fig. 2: Transfer and Interference in ANNs.
ANNs were trained on participant-matched trial sequences (one network per participant). a–c, Learning curves for networks in the three conditions. Each network is trained sequentially on task A followed by task B (full supervision with mean-squared error loss), and then retested on task A. During retest, model weights are not updated after winter trials (analogous to participants receiving feedback only for summer but not winter stimuli). a shows networks trained in the Same condition (tasks with identical rules), b shows networks trained in the Near condition (tasks with similar rules) and c shows networks trained in the Far condition (tasks with opposite rules). Dashed lines mark task change points: the introduction of task B stimuli and the return to task A stimuli, respectively. d, The number of principal components needed to capture 99% of the variance of the activity at the network’s hidden layer when exposed to all inputs. This is shown split by condition, both after training on only task A (purple) and after training on task B as well (green). e, Visualization of the two-dimensional representation of task stimuli at the network’s hidden layer, after training on task A stimuli. PCA (with two components) was performed on the network’s hidden layer activity when exposed to all inputs. f, Visualization of hidden layer stimulus representations after training on task B in the Same condition. g, The same as f after training the network to perform task B in the Near condition. h, The same as f after training the network to perform task B in the Far condition (see Supplementary Fig. 2 for additional visualizations of subspaces). i, Principal angles between task subspaces in the Same, Near and Far conditions. PCA (n = 2 components) was performed on ANN hidden layer activity for stimuli from task A versus task B, and the angle between subspaces computed. Larger angles indicate greater orthogonality between subspaces.
However, we observed that transfer and interference differed across levels of task similarity (Same, Near and Far rule conditions). First, we focus on transfer. In Fig. 3e, we show that initial accuracy for winter responses in task B is unimpaired in the Same condition, declines moderately in the Near condition and drops the most in the Far condition (Same > Far: t(202) = 605.79, P < 0.001, d = 85.25, 95% confidence interval (CI) 0.94–0.94; Near > Far: t(200) = 444.95, P < 0.001, d = 62.93, 95% CI 0.80–0.81; Same > Near: t(202) = 79.43, P < 0.001, d = 11.18, 95% CI 0.13–0.14). This shows that, despite the novel inputs during task B (one-hot vectors that were not seen during task A), networks can capitalize on their prior training, showing greater transfer when the task rules are more similar.
Fig. 3: Patterns of transfer and interference.
a–e, To study transfer, we examine the point when participants switch from task A to task B, encountering new stimuli. a, Histograms of the mean winter error across participants during the first block of task B in the Same condition (where the rule that links winter to summer stays the same). Purple and green notches mark expected error if applying the task A or task B rule, respectively. b, The same as a for the Near condition, where the rule shifts by 30°. While Near participants were randomly allocated to a task B rule +30° or −30° from the task A rule, here we flip the sign of errors for −30° participants for consistency, to visualize the biasing influence of the task A rule. c, The same as a for Far participants, who experience a new task B rule that is 180° from their previous task A rule. d, Transfer is defined as the cost of switching to the new task—the change in accuracy between the final winter responses for the six task A stimuli and the first winter responses for the six task B stimuli. Circles indicate mean, error bars show s.e.m. across the participant sample (Same N = 103, Near N = 101, Far N = 101), and colours correspond to condition. P values correspond to results of one-sided t-tests (Same > Far: t(202) = 9.48, P < 0.001, d = 1.33, 95% CI 0.19–0.29; Near > Far: t(200) = 6.63, P < 0.001, d = 0.93, 95% CI 0.13–0.23). ***P < 0.001. e, Transfer in ANNs trained on participant-matched schedules. f–j, To study interference, we examine how participants perform when returning to task A after completing task B. f, Histograms of retest error on task A (coloured) overlaid on task A training error (grey) in the Same condition. Little change suggests minimal interference. g, The same as f for the Near condition. A shift towards rule B indicates interference from the recently learned task. h, The same as f for the Far condition. In this case, very few participants respond using rule B during retest. i, Interference is quantified as the probability of using rule B when retested on task A, modelled using a von Mises mixture where 1 indicates full use of rule B and 0 indicates use of rule A. Circles indicate mean, error bars show s.e.m. across participants (Near N = 80, Far N = 94; participants who failed to learn task B were excluded), and colours correspond to condition. P values correspond to results of one-sided t-test (Near > Far: t(172) = 3.44, P < 0.001, d = 0.53, 95% CI 0.08–0.31). ***P < 0.001. j, Interference in ANNs trained on participant-matched schedules. Colours correspond to training condition. Note that human data are aggregated across the discovery and replication samples (for data plotted by sample, see Supplementary Fig. 6). Interference and transfer icons from OnlineWebFonts under a Creative Commons license CC BY 4.0.
Next, we examined interference, measured as the probability of incorrectly applying rule B upon return to task A. After training on both tasks, networks in the Near condition applied rule B to task A stimuli, showing truly catastrophic forgetting of the initial rule (Fig. 3j). By contrast, networks trained in the Far condition showed no interference: they were able to successfully return to using rule A.
Why would networks trained in different similarity conditions show such different patterns of interference? An initial clue comes from the learning curves as the networks were trained on task B. In the Same and Near groups, task B training produced a rapid, exponentially decreasing loss (Fig. 2a,b), as learning unfolds along an already established subspace. By contrast, loss curves in the Far group exhibited an initial plateau (Fig. 2c), qualitatively similar to the curves observed when the networks first learned task A. One possibility is that this plateau reflects the weight modifications that allow learning to unfold in a new subspace.
To test this hypothesis, we examined the dimensionality of hidden representations over the course of learning, using principal component analysis (PCA). The dimensionality did not change as a result of learning task B in the Same and Near conditions (that is, the same number of components could explain 99% of variance in the hidden layer representation of all inputs). By contrast, the dimensionality doubled after learning task B in the Far condition, supporting the idea that a new subspace was formed (Fig. 2d). Indeed, visualization of the hidden representations in these networks implied that in the Same or Near condition the network reused the same subspace across the two tasks (Fig. 2e–g). However, networks trained on highly dissimilar rules in the Far condition learned the new task in a separate, orthogonal subspace (Fig. 2h).
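This dimensionality measure can be computed directly from the matrix of hidden activations, as in the sketch below; `H` is assumed to be a stimuli-by-hidden-units activity matrix.

```python
import numpy as np

def effective_dim(H, var_threshold=0.99):
    """Number of principal components needed to explain 99% of the variance
    in hidden-layer activity H (rows: stimuli, columns: hidden units)."""
    H_centred = H - H.mean(axis=0)
    s = np.linalg.svd(H_centred, compute_uv=False)    # singular values
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)    # cumulative variance ratio
    return int(np.searchsorted(explained, var_threshold) + 1)
```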
Finally, we formally quantified the relationship between the subspaces each network used to represent the two tasks. We measured the principal angle between the two-dimensional subspaces encoding task A stimuli and task B stimuli after each network was fully trained40,41,42 (Fig. 2i). In networks learning the Same or Near tasks, this angle was 0°, indicating use of the same subspace across tasks. By contrast, the principal angle was 90° in networks trained in the Far condition, indicating use of an orthogonal subspace for the new task. This explains both the slower learning of task B and the preserved performance on task A at retest.
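A sketch of this computation using `scipy.linalg.subspace_angles` is given below; `H_a` and `H_b` are assumed stimuli-by-hidden-units activity matrices for the two tasks.

```python
import numpy as np
from scipy.linalg import subspace_angles

def task_subspace_angle(H_a, H_b, k=2):
    """Largest principal angle (degrees) between the k-dimensional PCA
    subspaces of hidden activity for task A and task B stimuli.
    0 deg = shared subspace; 90 deg = orthogonal subspaces."""
    def pca_basis(H):
        H_centred = H - H.mean(axis=0)
        _, _, Vt = np.linalg.svd(H_centred, full_matrices=False)
        return Vt[:k].T                    # hidden-units x k orthonormal basis
    return np.degrees(subspace_angles(pca_basis(H_a), pca_basis(H_b)).max())
```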
Human studies
Next, we looked at whether humans showed similar patterns of transfer and interference as a function of task similarity. For the human experiments, we recruited separate groups of healthy online participants for each condition, across independent discovery and replication studies (discovery sample: Near, N = 50, Far, N = 50, Same, N = 52; replication sample: Near, N = 51, Far, N = 51, Same, N = 52).
Humans also show higher transfer but more interference when learning similar tasks
Human participants were able to attain high accuracy in this study across all conditions (see Supplementary Fig. 8 for winter and summer accuracy over the course of learning; average winter accuracy in final block of task A; Same: mean (M) = 0.81, s.e.m. = 0.12, Near: M = 0.85, s.e.m. = 0.10, Far: M = 0.82, s.e.m. = 0.11; average winter accuracy in final block of task B; Same: M = 0.84, s.e.m. = 0.13, Near: M = 0.87, s.e.m. = 0.14, Far: M = 0.82, s.e.m. = 0.16). However, patterns of transfer and interference followed the same trends observed in neural networks, depending on task similarity.
First, we examined transfer among human participants. When introduced to the new stimuli in task B, participants in the Same condition were able to infer the correct winter locations by reapplying the previously learned rule, shown by their response errors clustering around zero (Fig. 3a). A similar pattern is observed in the Near condition, although response errors are systematically biased towards the previously learned rule (Fig. 3b). By contrast, errors in the Far condition are widely distributed, indicating that participants were unable to reuse their previous rule, instead learning the new task from scratch (Fig. 3c). Similar to ANNs, human participants in the Same and Near groups therefore showed greater transfer to task B than those in the Far condition (Fig. 3d; one-way analysis of variance for effect of condition on transfer; discovery sample: F(2, 148) = 18.69, P < 0.001, η2 = 0.20; replication sample: F(2, 151) = 29.34, P < 0.001, η2 = 0.28. Δ accuracy in the Far condition was significantly lower than in the Near and Same conditions; Far < Same one-sided t-test: t(99) = 6.12, P < 0.001, d = 1.23, 95% CI 0.15–0.29 (discovery sample); t(101) = 7.23, P < 0.001, d = 1.43, 95% CI 0.19–0.33 (replication sample); Far < Near one-sided t-test: t(98) = 3.85, P < 0.001, d = 0.78, 95% CI 0.07–0.23 (discovery sample); t(100) = 5.52, P < 0.001, d = 1.10, 95% CI 0.13–0.28 (replication sample)). This shows that participants were able to successfully infer the task rules and benefit from transfer to task B when rules remained similar. Importantly, the pattern of switch costs that we observe is better explained by participants transferring their previous rule to the new task B stimuli than by alternative behavioural strategies such as responding randomly or repeating their summer location feedback (Supplementary Fig. 9).
Next, we measured interference from task B when participants were retested on task A. Our theory concerns interference arising from new learning, so participants who failed to learn task B were excluded from interference analyses (14% of participants excluded). Participants in the Same condition showed response errors tightly clustered around zero, reflecting consistent use of the original rule, which remained unchanged throughout task B (Fig. 3f). However, many participants in the Near condition shifted towards applying rule B at retest (Fig. 3g), while participants in the Far condition largely maintained rule A rather than shifting to rule B (Fig. 3h). Quantifying this formally, we found that Near group participants showed higher interference after learning task B compared with those in the Far condition—in other words, they were more likely to misapply rule B during retest (Fig. 3i; p(rule B) in Near > Far, one-sided t-test; t(86) = 2.56, P = 0.006, d = 0.55, 95% CI 0.04–0.37 (discovery sample); t(84) = 2.27, P = 0.013, d = 0.50, 95% CI 0.02–0.35 (replication sample); see Supplementary Section 4 for further detail on model validation and parameter recoveries, and Supplementary Fig. 7 for effects on retest accuracy).
Taken together, these results support the idea that humans and neural networks show similar patterns of transfer and interference, with the same systematic dependency on task similarity. In neural networks, we can see that learning tasks of intermediate similarity promotes shared representations, manifesting in higher transfer across tasks at the cost of greater interference. By contrast, learning highly dissimilar tasks leads to less transfer but lower interference between tasks. We find these patterns of trade-offs between the benefits of transfer and avoidance of interference are preserved across the two learning systems.
Individual differences in transfer and interference
Although participants learning similar tasks generally showed more interference than those learning dissimilar tasks, this pattern was not universal: many individuals in the Near group showed little to no interference. In fact, interference weights in this group were bimodally distributed (Fig. 4a), suggesting the presence of two distinct learning strategies. Some individuals appeared to overwrite rule A with rule B, while others returned to rule A, effectively avoiding interference (Fig. 4b). In the context of our theory, this naturally leads to the question of whether these individual differences are also characterized by a trade-off between benefitting from transfer and avoiding interference. We predicted this may reflect differences in how participants approached the task structure, with some participants merging across tasks based on shared structure (lumpers), and others focusing on the differences between stimuli (splitters). To study this phenomenon further, we used a model-based approach to classify lumpers and splitters. Participants whose responses during retest of task A were best fit by rule A were categorized as splitters, while those whose responses were best fit by rule B were categorized as lumpers. In our cohort, 47.5% of Near-group participants were lumpers.
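This classification can be sketched as a likelihood comparison between the two candidate rules. The version below is a simplified illustration assuming a single fixed concentration `kappa`; the analysis in the paper uses the full model-based fit described above.

```python
import numpy as np
from scipy.stats import vonmises

def classify_strategy(retest_rule_responses, rule_a, rule_b, kappa=5.0):
    """Label a Near-condition participant by which rule better accounts
    for their winter responses at retest of task A (angles in radians)."""
    ll_a = vonmises.logpdf(retest_rule_responses - rule_a, kappa).sum()
    ll_b = vonmises.logpdf(retest_rule_responses - rule_b, kappa).sum()
    return 'splitter' if ll_a >= ll_b else 'lumper'  # lumpers carry rule B over
```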
Fig. 4: Individual differences in transfer and interference.
a, In the Near condition, interference at retest is bimodally distributed. Participants in the Near group were classified into splitters (those with low interference from task B) and lumpers (those with high interference from task B). b, A histogram of all winter retest errors for splitters (light blue) and lumpers (dark blue). Lines show the posterior model fits, computed using the average concentration (κ) and mixture weight (π) parameters across participants in each group. c–h, On the left (in blue), we plot behavioural data from the splitters and lumpers. On the right (in grey), we plot data from ANNs trained under a lazy learning regime (forming unstructured, high-dimensional task solutions), versus trained under a rich learning regime (forming structured, low-dimensional task solutions). In each plot, circles show mean metrics in each group, dots show individual data points and error bars show s.e.m. (splitters: N = 42, lumpers: N = 38). P values correspond to results of two-sided t-tests. c, Interference among splitters and lumpers is plotted for illustrative purposes only (because this metric determines the classification), for comparison with interference in lazy and rich ANNs (ANNs: t(200) = 32.8, P < 0.001, d = 4.64, 95% CI 0.75–0.84). d, Transfer performance in the groups, defined throughout as the change in winter accuracy between the final exposure to task A stimuli and the first exposure to task B stimuli (humans: t(78) = 3.95, P < 0.001, d = 0.89, 95% CI 0.08–0.23; ANNs: t(200) = 20.97, P < 0.001, d = 2.97, 95% CI 0.24–0.29). e, Generalization accuracy is the average winter accuracy for the test stimulus in task A, for which feedback about winter is withheld throughout. Because participants only receive feedback about its summer location, they must infer the correct winter location by generalizing their knowledge of the task A rule (humans: t(78) = 2.74, P = 0.008, d = 0.62, 95% CI 0.03–0.21; ANNs: t(200) = 12.72, P < 0.001, d = 1.80, 95% CI 0.30–0.40). f, Average accuracy for summer responses, which must be remembered for each stimulus separately (in contrast to winter responses, which can be inferred by applying the rule to the summer feedback). This requires participants to discriminate the unique stimuli. ANN performance is shown for the first 120 trials of task A training, to match the length of human training. For full accuracy trajectories over time, including later stages of training, see Supplementary Fig. 15 (humans: t(78) = 3.40, P < 0.001, d = 0.76, 95% CI 0.03–0.11; ANNs: t(200) = 3.60, P < 0.001, d = 0.51, 95% CI 0.02–0.07). g, At the end of the study, participants were asked to recall when they saw each stimulus for the first time (at the beginning of the study, or halfway through). In other words, this reflects the ability to explicitly report the onset of unique task stimuli (humans only: t(78) = 3.69, P < 0.001, d = 0.81, 95% CI 6.5–22.8). h, Representational similarity between task A and task B stimuli in ANNs, quantified as the principal angle between their respective hidden layer subspaces after task B training. Rich networks collapse the representations onto the same subspace, while lazy networks retain greater distinction between representations (ANNs only: t(200) = 125.50, P < 0.001, d = 17.75, 95% CI 73.4–75.8). **P < 0.01, ***P < 0.001 (c–h). Credit: interference and transfer icons from OnlineWebFonts under a Creative Commons license CC BY 4.0.
If the increased interference observed in lumpers (Fig. 4c, left) arose from a focus on shared structure, we would expect lumpers to demonstrate better transfer compared with splitters. Indeed, we found that lumpers were better at switching to task B, benefitting from similarities between the two tasks (Fig. 4d, left; transfer: splitters: M = −0.21, s.e.m. = 0.03, lumpers: M = −0.06, s.e.m. = 0.03; two-sided t-test: t(78) = 3.95, P < 0.001, d = 0.89, 95% CI 0.08–0.23). In addition, if lumpers were good at capitalizing on shared task structure, we would expect lumpers to successfully generalize the rule to untrained stimuli within task A. To test this, we leveraged a feature of our experimental design: for one ‘test’ stimulus in task A, feedback was not provided for winter responses, allowing us to measure generalization. We found that lumpers indeed exhibited higher accuracy for the test stimulus, demonstrating greater within-task generalization (Fig. 4e, left; splitters: M = 0.69, s.e.m. = 0.04, lumpers: M = 0.81, s.e.m. = 0.03; two-sided t-test: t(78) = 2.74, P = 0.008, d = 0.62, 95% CI 0.03–0.21). In other words, individuals who experienced more interference were better at extending their knowledge to new situations—both when learning task B as well as when inferring untrained responses within a task. This is consistent with our theory that lumpers are relying more on shared representations during learning.
Could lumpers be performing better on these metrics simply as a result of higher task engagement? If this were the case, we would expect them to show generally higher accuracy across the board. To address this possibility, we assessed participants’ accuracy for the ‘summer’ response during task A—the initial phase of the experiment. Due to the sequential nature of each trial (participants are always probed on summer before winter), summer accuracy reflects the ability to recall the unique, memorized location of each stimulus, whereas winter accuracy can be inferred by applying the rule to the summer location. We found that lumpers—while achieving higher accuracy in transfer and generalization—were significantly worse than splitters at remembering the unique summer positions (Fig. 4f, left; splitters: M = 0.671, s.e.m. = 0.015; lumpers: M = 0.600, s.e.m. = 0.015; two-sided t-test: t(78) = 3.40, P = 0.001, d = 0.76, 95% CI 0.03–0.11).