Main
The medial temporal lobe (MTL), and particularly the hippocampus, is essential for declarative memories of items in context1,2. In humans, the hippocampus has also been implicated in generalization6,7. With ease, we recall multiple memories involving a particular memory item, for example, different dinners with our friend. A uniquely human single-neuron correlate of memory items is represented by concept neurons in the MTL, which selectively and invariantly respond to stimuli containing a preferred semantic concept8 (for example, our friend). Concept neurons are apparently not modulated by the context of an item, and thus seem suited for generalization5 (but see ref. 9). As humans can also recall items based on context, concept neurons may be complemented by neurons representing context. However, remembering contexts such as time, task or location together with memory items requires the combination of item and context information10. For example, we remember when we had dinner with our friend, whether it was work related or where it took place. Furthermore, we can recall a specific restaurant visit based on item and context information. Initial evidence has suggested that separate neurons in the MTL represent contexts11,12,13. However, it remains unclear how neural correlates of item and context memory are combined to form or retrieve integrated item-in-context memories at the single-neuron level in humans. Recording from 16 patients undergoing surgery to treat epilepsy, we investigated how human MTL neurons combine item and context representations. We devised a task in which pairs of pictures were presented sequentially following different contexts, namely, questions (for example, ‘Bigger?’) that specified how to compare the two pictures. This required participants to remember items (that is, two pictures) in a specific context. Here ‘context’ refers to interactive or task context rather than independent or background context14. Only a small fraction of neurons encoded individual item–context combinations as reported in rodents. Instead, separate populations of neurons represented either items or contexts in an orthogonal encoding scheme, reflecting the ability of humans to generalize memories along each dimension separately. Finally, we show how item and context information is combined via co-activation, synaptic modification or reinstatement, contributing to item-in-context memory and the retrieval of contextually relevant item memories.
Distinct neurons for content and context
During 49 experimental sessions, we recorded from 3,109 neurons in the amygdala, parahippocampal cortex, entorhinal cortex and hippocampus of 16 neurosurgical patients implanted with depth electrodes for invasive seizure monitoring. Pairs of pictures were presented on a laptop screen in different contexts, namely, five questions (‘Bigger?’, ‘Last seen in real life?’, ‘Older?’ or ‘More expensive?’, ‘Like better?’ and ‘Brighter?’) that specified how the pictures should be compared (Fig. 1a). For each session, four pictures eliciting selective neuronal responses in a previous screening were selected. Each trial began with a context-providing question (such as Bigger?), followed by a sequence of two of the four pictures (stimuli) that needed to be remembered in context and compared accordingly. Patients then chose the picture that best answered the question (for example, which depicted something bigger) and indicated whether it was shown first or second via the keyboard. Most answers were highly consistent and transitive, with performance greatly exceeding chance in all sessions but one, which was excluded, and not differing across questions (P = 0.083; Kruskal–Wallis test; Extended Data Fig. 1). Furthermore, answers strongly correlated with ground truths for Bigger? and More expensive? or Older? (mean ρ = 0.760, P = 6.47 × 10−10; Extended Data Fig. 2). To quantify the representation of stimulus and context during picture presentations, a repeated-measures two-way 4 × 5 analysis of variance (ANOVA) with factors picture and question (including their interaction) was computed for each neuron (α = 0.001; effect sizes in Fig. 2e).
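To illustrate this selection step, the following is a minimal Python sketch (not the authors' pipeline): a two-way picture × question ANOVA on simulated trial-wise firing rates, with partial eta squared computed per factor. statsmodels' standard two-way ANOVA stands in for the repeated-measures implementation, and all data values are hypothetical.

```python
# Minimal sketch of the per-neuron selection statistic: a two-way
# 4 (picture) x 5 (question) ANOVA on trial-wise firing rates,
# with simulated data standing in for recorded spike counts.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "rate": rng.poisson(5, 300).astype(float),   # spikes/s, 0-1,000 ms window
    "picture": rng.integers(0, 4, 300),          # 4 pre-selected pictures
    "question": rng.integers(0, 5, 300),         # 5 context questions
})

model = ols("rate ~ C(picture) * C(question)", data=df).fit()
table = sm.stats.anova_lm(model, typ=2)

alpha = 1e-3                                     # threshold used in the paper
effects = ["C(picture)", "C(question)", "C(picture):C(question)"]
significant = {e: table.loc[e, "PR(>F)"] < alpha for e in effects}

# Partial eta squared per factor, as reported in Fig. 2e:
ss_res = table.loc["Residual", "sum_sq"]
eta_p2 = {e: table.loc[e, "sum_sq"] / (table.loc[e, "sum_sq"] + ss_res)
          for e in effects}
print(significant, eta_p2)
```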
Fig. 1: Neurons in the MTL encode stimulus and context during picture presentations.
Although most stimulus neurons only encode picture identity, context is either represented alone, in conjunction with stimulus identity or as modulation of stimulus-specific firing rates. a, Paradigm with exemplary trial of the stimulus-comparison task (here for the context question Bigger?). During 300 trials, participants had to compare pictures according to a context question. Each trial contained one of five questions (Bigger?, Last seen (in real life)?, More expensive? or Older?, Like better? and Brighter?), a sequence of two pictures (out of four) and an answer prompt displaying ‘1 or 2?’. Participants indicated the sequential position of the picture that best answered the question by pressing keys 1 or 2. Question and answer screens were self-triggered. Event durations are printed underneath. b–e, Example neurons (spike density plots in the upper left) whose firing rates (Hz) during picture presentations (0–1,000 ms) are visualized by raster plots and histograms as a function of question and stimulus identity. b, A stimulus neuron that selectively increased firing whenever a biscuit was shown, irrespective of which question context was presented. The neuron did not respond to any other picture. c, This context neuron increased firing during picture presentations whenever the trial started with the question Older? (top left) as opposed to any other question (bottom left) regardless of stimulus identity (top row). d, Example of a stimulus–context interaction neuron whose firing increased when the question was Brighter image? and the stimulus depicted a train (top left) in contrast to other combinations of question and stimulus. e, This contextual stimulus neuron only responded to its preferred picture (hamburger, top left), but more strongly in the context of the question Last seen (in real life)?.
Fig. 2: Stimulus and context are mainly encoded separately and rarely conjunctively.
a, Venn diagram of significant neuron counts by type, defined by repeated-measures two-way ANOVA with factors stimulus (stimulus neurons in red, ochre, grey or pink), question (context neurons in green, ochre, grey or light blue) or their interaction (stimulus–context interaction neurons in grey, pink, blue or light blue) during picture presentations (α = 0.001) from 3,109 neurons. Intersections denote multiple effects. Most stimulus neurons were not modulated by context but overlapped with context neurons (contextual stimulus neurons) or stimulus–context interaction neurons. b, Probabilities of context or stimulus–context interaction neurons across brain regions (absolute numbers above the bars). The significance asterisks denote binomial tests versus chance (dotted line). Neurons in all MTL regions were strongly modulated by context and stimulus–context. A, amygdala; EC, entorhinal cortex; H, hippocampus; PHC, parahippocampal cortex. c, Analogous to panel b with probabilities of context conditioned on stimulus modulation (stimulus | no stimulus). The significance asterisks denote Fisher’s exact test results from regional contingency tables. Stimulus neurons were more likely to be modulated by context (contextual stimulus neurons) than non-stimulus neurons in all MTL regions (significant in amygdala and hippocampus), especially the hippocampus. d, Analogous to panel c, with probabilities of significant interactions (stimulus–context). Stimulus–context interaction neurons with conjunctive representations were most frequent in the hippocampus. NS, not significant. e, Boxplots of patient-averaged effect sizes (ηpartial²) from all 3,109 neurons for factors stimulus, context and interaction (stimulus–context). Data (purple) are compared with stratified-shuffled controls (grey; Wilcoxon signed-rank test). Effect sizes markedly exceeded chance for stimulus (ηp²(real) = 0.114, ηp²(control) = 0.034, Ppatient = 1.313 × 10−3, Psession = 3.330 × 10−9) and context (ηp²(real) = 0.059, ηp²(control) = 0.034, Ppatient = 1.313 × 10−3 and Psession = 4.827 × 10−9). The difference for stimulus–context was smaller but significant (ηp²(real) = 0.036, ηp²(control) = 0.034, Ppatient = 1.840 × 10−2 and Psession = 3.140 × 10−3). The insets depict scatter plots of ηp² of context (left) or stimulus–context (right) versus stimulus. The dashed lines indicate the smallest significant effect size (α = 0.001). ****P < 0.0001, **P < 0.01 and *P < 0.05 (Bonferroni corrected).
Although we expected to find many neurons with a significant main effect of stimulus due to our pre-selection of response-eliciting pictures, it was unclear how context would be encoded. Distinct neuronal populations were identified. Most neurons only exhibited a significant main effect of either stimulus (Fig. 1b) or context (Fig. 1c) and were termed (mere) stimulus and context neurons, respectively. Figure 1b shows an example of a (mere) stimulus neuron, selectively increasing firing to ‘biscuit’, irrespective of context (that is, comparison prompted by the question). Figure 1c shows a (mere) context neuron, whose firing distinguished only context during the late phase of picture presentations. Whenever the trial started with Older?, but not in any other context, this neuron responded to all pictures regardless of stimulus identity. Furthermore, we identified neurons with a significant interaction that encoded stimulus and context conjunctively (Fig. 1d). These neurons only responded when a specific picture (for example, train) was shown in a specific context (for example, Brighter image?), but not for any other picture–context combination. Finally, although mere stimulus (MS) neurons (only significant main effect of stimulus) were much more common (Fig. 1b), we also identified contextual stimulus neurons defined by significant main effects of both stimulus and context (irrespective of interaction), for example, selectively responding to the picture of a hamburger, yet particularly strongly when the context was Last seen in real life? (Fig. 1e).
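The population labels above follow directly from the three ANOVA p-values. The following sketch paraphrases the definitions given in the text (our reading, not the authors' code):

```python
# Sketch of the neuron-type labels derived from the three ANOVA p-values.
def classify_neuron(p_stim, p_ctx, p_int, alpha=1e-3):
    """Paraphrase of the text's definitions; not the authors' code."""
    stim, ctx, inter = p_stim < alpha, p_ctx < alpha, p_int < alpha
    labels = set()
    if stim:
        labels.add("stimulus")
    if ctx:
        labels.add("context")
    if inter:
        labels.add("stimulus-context interaction")   # conjunctive coding
    if stim and ctx:
        labels.add("contextual stimulus")            # both main effects
    if labels == {"stimulus"}:
        labels = {"mere stimulus (MS)"}              # single significant factor
    if labels == {"context"}:
        labels = {"mere context (MC)"}
    return sorted(labels) or ["not modulated"]

print(classify_neuron(2e-4, 0.4, 0.3))   # -> ['mere stimulus (MS)']
```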
Distribution of content and context
Context was represented during picture presentations in substantially more neurons than expected by chance. Out of 3,109 recorded units, 200 exhibited a main effect of question (P < 0.001) in a repeated-measures two-way ANOVA with factors stimulus and question (including interaction; Fig. 2a), largely exceeding the expected number of roughly 3 units (two-sided binomial test, P < 2.225 × 10−308). This amounted to 2.95% of neurons in the amygdala (31 of 1,051), 7.68% in the parahippocampal cortex (37 of 482), 5.68% in the entorhinal cortex (25 of 440) and 9.42% in the hippocampus (107 of 1,136), all far exceeding chance (Fig. 2b; two-sided binomial tests, all P < 2.225 × 10−308). Although most neurons that encoded stimulus did not encode context (524 of 597), the fraction that did or that exhibited an interaction was still highly significant in two-sided binomial tests: out of the 597 neurons with a main effect of stimulus, 73 also showed a main effect of question at P < 0.001 (12.23%; P < 2.225 × 10−308) and 31 an interaction at P < 0.001 (5.19%; P < 2.225 × 10−308). Although the populations were largely distinct, a significant main effect of context was more prevalent among neurons with than without an additional main effect of stimulus, particularly in the amygdala (7.83% versus 1.67%), the entorhinal cortex (10.29% versus 4.84%) and the hippocampus (21.26% versus 7.28%; Fig. 2c). Mere context (MC) neurons (only a significant main effect of context) were most frequent in the parahippocampal cortex (6.96%, 25 of 359) and rarest in the amygdala (1.67%, 14 of 836). Moreover, significant interactions (P < 0.001) capturing conjunctive representations strongly exceeded chance in two-sided binomial tests (1.61%, 50 of 3,109; P < 2.225 × 10−308), particularly in the amygdala (1.05%, 11 of 1,051; P = 1.588 × 10−8), the parahippocampal cortex (2.28%, 11 of 482; P = 4.733 × 10−12) and the hippocampus (2.29%, 26 of 1,136; P < 2.225 × 10−308; Fig. 2b), especially among hippocampal stimulus neurons (10.34%, 18 of 174; Fig. 2d). To account for potential statistical dependencies between factors or neurons, ANOVA effect sizes of all neurons were compared with stratified label-shuffling controls, irrespective of significance (Fig. 2e; see Methods). Confirming previous results, patient-averaged effect sizes of stimulus (ηp²(real) = 0.114, ηp²(control) = 0.034, Ppatient = 1.313 × 10−3 and Psession = 3.330 × 10−9), context (ηp²(real) = 0.059, ηp²(control) = 0.034, Ppatient = 1.313 × 10−3 and Psession = 4.827 × 10−9) and their interaction (ηp²(real) = 0.036, ηp²(control) = 0.034, Ppatient = 1.840 × 10−2 and Psession = 3.140 × 10−3) all significantly differed from controls (two-sided Wilcoxon signed-rank tests, Bonferroni corrected). Linear mixed-effects models with a fixed intercept (capturing grand population effect size differences) and random intercepts for patients, fitted across all recorded neurons, confirmed that real effect sizes significantly exceeded permutation-based baselines for all three ANOVA factors: stimulus (β = 0.080, 95% CI 0.065–0.095, P = 2.177 × 10−24), context (β = 0.025, 95% CI 0.019–0.032, P = 4.186 × 10−14) and stimulus–context interaction (β = 0.002, 95% CI 0.001–0.003, P = 2.269 × 10−6).
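The chance comparisons above reduce to binomial tests against the false-positive rate implied by α = 0.001; a brief sketch with the reported context-neuron count:

```python
# Sketch of the chance comparison: at alpha = 0.001, roughly 3 of 3,109
# neurons should show a main effect of question by chance alone; a binomial
# test evaluates the observed count of 200 against that rate.
from scipy.stats import binomtest

n_neurons, alpha = 3109, 1e-3
print("expected by chance:", n_neurons * alpha)            # ~3.1 neurons
res = binomtest(k=200, n=n_neurons, p=alpha, alternative="two-sided")
print("P =", res.pvalue)                                   # vanishingly small
```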
Finally, we examined whether stimulus–context neurons were more prevalent in areas with higher proportions of stimulus or context neurons (Extended Data Fig. 3). Across all session–site combinations (n = 168), stimulus–context neuron proportions correlated significantly with both stimulus (Spearman’s ρ = 0.177, P = 0.022) and context neuron proportions (ρ = 0.487, P < 0.001), particularly in the hippocampus and amygdala, where stimulus–context proportions correlated with both stimulus (ρ = 0.274, P = 0.032 for the hippocampus; and ρ = 0.329, P = 0.012 for the amygdala) and context neuron proportions (ρ = 0.471, P < 0.001 for the hippocampus; and ρ = 0.487, P = 0.003 for the amygdala). By contrast, the parahippocampal and entorhinal cortices showed significant correlations exclusively with context (ρ = 0.569, P < 0.001 for the parahippocampal cortex; and ρ = 0.377, P = 0.023 for the entorhinal cortex) but not with stimulus neuron proportions.
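These relationships are plain Spearman correlations across session–site pairs; a sketch with a hypothetical proportions table (the column names and values are illustrative only):

```python
# Sketch of the proportion correlations: Spearman's rho between per
# session-site proportions of interaction neurons and of stimulus or
# context neurons. `props` is hypothetical, purely for illustration.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
props = pd.DataFrame({
    "stimulus": rng.uniform(0, 0.3, 168),   # fraction of stimulus neurons
    "context": rng.uniform(0, 0.1, 168),    # fraction of context neurons
    "stim_ctx": rng.uniform(0, 0.05, 168),  # fraction of interaction neurons
})
for col in ("stimulus", "context"):
    rho, p = spearmanr(props["stim_ctx"], props[col])
    print(f"stim_ctx vs {col}: rho = {rho:.3f}, P = {p:.3f}")
```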
Abstract representations reflect task
Our next analyses sought to address whether all contexts could be decoded, how well context representations generalized across different pictures or across time, and how they related to stimulus representations. For each session, we computed linear support vector machine (SVM) decoding accuracies of context during picture presentations (100–1,000 ms). Decoders were trained with either all neurons or all context neurons of each session (Fig. 3a). Confirming previous ANOVA results, decoding accuracies significantly exceeded the chance level of 0.2 (one out of five questions; two-sided Wilcoxon signed-rank test, uncorrected) for all contexts when all neurons were included (Psession = 1.981 × 10−3 and Ppatient = 4.094 × 10−3 for Bigger?; Psession = 9.142 × 10−7 and Ppatient = 6.430 × 10−4 for Last seen?; Psession = 5.187 × 10−3 and Ppatient = 3.861 × 10−2 for Older? or More expensive?; Psession = 1.456 × 10−3 and Ppatient = 6.624 × 10−3 for Like better?; Psession = 8.016 × 10−8 and Ppatient = 4.378 × 10−4 for Brighter?; and Psession = 4.378 × 10−4 and Ppatient = 4.378 × 10−4 for all contexts). Similar results were obtained when restricting these analyses to context neurons (Psession = 1.713 × 10−2 and Ppatient = 4.938 × 10−2 for Bigger?; Psession = 5.888 × 10−5 and Ppatient = 4.378 × 10−4 for Last seen?; Psession = 3.875 × 10−2 and Ppatient = 4.08 × 10−1 for Older? or More expensive?; Psession = 8.126 × 10−3 and Ppatient = 4.373 × 10−2 for Like better?; Psession = 9.826 × 10−7 and Ppatient = 4.368 × 10−4 for Brighter?; and Psession = 4.368 × 10−4 and Ppatient = 4.368 × 10−4 for all contexts). Although session-wise decoding accuracies were highest for Last seen? (0.292) and Brighter image? (0.303), they significantly exceeded chance for all contexts (0.229 for Bigger?, 0.234 for Older? or More expensive? and 0.257 for Like better?).
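A minimal sketch of one session-wise decoding step, assuming per-trial population spike counts from the 100–1,000-ms window; scikit-learn's LinearSVC stands in for the linear SVM and the data are simulated:

```python
# Sketch of session-wise context decoding from a trials x neurons matrix
# of spike counts, evaluated by cross-validation against chance (0.2).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.poisson(3, size=(300, 60)).astype(float)  # hypothetical session
y = rng.integers(0, 5, 300)                       # one of five questions

clf = make_pipeline(StandardScaler(), LinearSVC())
accuracy = cross_val_score(clf, X, y, cv=5).mean()
print(accuracy, "chance = 0.2")
```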
Fig. 3: Generality and temporal dynamics of context representations imply relevance for stimulus processing.
Abstract context representations invariant to picture identity or position combine with content representations across temporal gaps during picture presentations via context reactivation until context-dependent decisions. a, Population SVM-decoding accuracies of context (16 patients) during picture presentations comparing different questions (colours) using all versus only context neurons. b, Pooled context (green) or stimulus (red) decoding accuracies from context (left) and stimulus (right) neurons, for cross-validation or generalization across pictures (red), contexts (green) and serial picture positions (blue). c, A heat map of context decoding across time. The green diagonal line denotes identical training or testing times (see panel d). The dashed lines show event onsets (white) and offsets (yellow). The boxes indicate times analysed in panel e. d, Patient-wise context (top; green) or stimulus decoding (bottom; picture 1 or picture 2 in grey or lavender, respectively) and label-shuffled controls (red). Data are mean ± s.e.m. (solid lines and shaded areas, respectively), with dashed lines indicating chance. Significant differences (cluster permutation test, P < 0.01) from controls (top) or chance (below) are shown as solid lines below. Context decoding exceeded chance throughout, peaking after stimulus representations and remaining elevated. e, Patient-wise context-decoding accuracies trained with context neuron activity during questions (400 neurons per region) or baseline, and decoded during late first (red) or second (blue) picture presentations (boxes in panel c). The dashed line indicates chance. Question activity was reactivated during pictures, especially in the hippocampus. f, As in panel d, but before context-dependent decisions. Both context and content representations exceeded chance (dashed lines) until decisions (red line). CN, context neuron; SN, stimulus neuron. For the boxplots (a,b,e), quantile 1, median, quantile 3 and whiskers (points within ±1.5× the interquartile range) are shown, with the asterisks denoting differences from zero (Wilcoxon signed-rank test, uncorrected) or each other (Mann–Whitney U-test, uncorrected). ****P < 0.0001, ***P < 0.001, **P < 0.01 and *P < 0.05.
Next, we assessed decoding performances from neurons pooled across sessions. To estimate their variance, random subsamples were drawn (see Methods). Pooled analyses yielded highly significant context (0.866, P = 1.708 × 10−6; green) and stimulus (0.995, P = 1.627 × 10−6; red) decoding accuracies (two-sided Wilcoxon signed-rank test against 0.2 or 0.25, uncorrected; Fig. 3b). In addition, we quantified generalization across stimuli, contexts or serial picture positions. When training and decoding contexts across different stimuli (0.764, P = 1.727 × 10−6) or picture positions (0.791, P = 1.716 × 10−6), accuracies remained highly significant. The same was true when stimuli were decoded across different contexts (0.995, P = 1.538 × 10−6) or picture positions (0.994, P = 1.638 × 10−6).
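Generalization across serial picture positions (and analogously across stimuli or contexts) amounts to training on one condition and testing on the held-out condition; a sketch under the same simulated-data assumptions as above:

```python
# Sketch of cross-position generalization: train the context decoder on
# first-picture activity and test on second-picture activity, and reverse.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X1 = rng.poisson(3, (300, 60)).astype(float)   # picture 1 presentations
X2 = rng.poisson(3, (300, 60)).astype(float)   # picture 2 presentations
y1, y2 = rng.integers(0, 5, 300), rng.integers(0, 5, 300)

clf = make_pipeline(StandardScaler(), LinearSVC())
acc_12 = clf.fit(X1, y1).score(X2, y2)   # train on picture 1, test on 2
acc_21 = clf.fit(X2, y2).score(X1, y1)   # train on picture 2, test on 1
print((acc_12 + acc_21) / 2)
```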
Subsequently, we computed context-decoding accuracies from binned population activity across different training and decoding periods during the trial (Fig. 3c; see Methods). Starting approximately 400 ms after question onset, context-decoding accuracies markedly exceeded chance (more than 20%; light blue or warmer colours) for most of the trial, peaking during question and late picture presentations. Consistent with previous analyses (Fig. 3b), training during first-picture presentations resulted in high decoding accuracies during second-picture presentations and vice versa, confirming generalized context representations. Figure 3d depicts patient-wise context-decoding accuracies of context neurons (top; green) for identical training and decoding times (green diagonal line in Fig. 3c) together with stimulus-decoding accuracies of pictures 1 and 2 by stimulus neurons (bottom; grey and lavender). Context-decoding accuracies (Fig. 3d, top) exceeded chance for the entire duration of the trial (solid green line; two-sided cluster permutation test, P < 0.01), even compared with label-shuffled controls (red line; two-sided cluster permutation test, P < 0.01). In parallel, stimulus neurons significantly encoded currently depicted pictures with high accuracies (Fig. 3d, bottom), as well as previous or even upcoming pictures (solid lines in grey and lavender). The latter was possible as second pictures could be inferred stochastically by design (probability of one-third instead of one-quarter due to no repetition). Of note, neural representations of first pictures were reactivated during second-picture presentations in all contexts (Extended Data Fig. 4), confirming that stimulus neurons encoded contents rather than question-related features. Furthermore, visual stimuli and contexts could be decoded simultaneously during picture presentations (first stimuli, then contexts) and with high accuracy even within single sessions (Extended Data Fig. 5). Context decoding across brain regions is shown in Extended Data Fig. 6. During question presentations, context decoding significantly exceeded chance (two-sided cluster permutation test, P < 0.01) in the parahippocampal cortex (turquoise), the entorhinal cortex (violet), the hippocampus (light blue) and even in the amygdala (mustard; P < 0.05). Context decoding was most accurate in the hippocampus (blue), particularly during picture presentations, exceeding chance in all four MTL regions.
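The map in Fig. 3c is a temporal-generalization analysis: one decoder fitted per training time bin and scored at every test bin. A compact sketch (simulated binned rates; a simple split-half substitutes for full cross-validation):

```python
# Sketch of a train-time x test-time decoding matrix from binned
# population activity; the diagonal is identical training/testing times.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
n_trials, n_bins, n_neurons = 200, 20, 40
X = rng.poisson(2, (n_trials, n_bins, n_neurons)).astype(float)
y = rng.integers(0, 5, n_trials)
train, test = np.arange(0, n_trials, 2), np.arange(1, n_trials, 2)

acc = np.zeros((n_bins, n_bins))
for t_tr in range(n_bins):
    clf = LinearSVC().fit(X[train, t_tr], y[train])
    for t_te in range(n_bins):
        acc[t_tr, t_te] = clf.score(X[test, t_te], y[test])
# Off-diagonal entries quantify generalization across trial periods.
```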
Next, we determined whether activity patterns during question presentations were recapitulated during picture presentations (Fig. 3e). Session-wise SVM decoders were trained with regional context neuron activity either during question presentations (600–1,000 ms) or preceding baseline intervals (−500 to 100 ms), whereas contexts were decoded in the late phases of first (red; 600–1,000 ms) or second (blue; 600–1,000 ms) picture presentations. Training and decoding intervals correspond to the coloured boxes in Fig. 3c. To account for regional differences in the number of recorded neurons (1,051 in the amygdala, 482 in the parahippocampal cortex, 440 in the entorhinal cortex and 1,136 in the hippocampus), 400 neurons were randomly sampled per region. Decoders trained during questions, but not during preceding baselines (Fig. 3e, far left), significantly distinguished contexts during both subsequent picture presentations (Fig. 3e, right side; Wilcoxon signed-rank test, uncorrected) in the amygdala (Apicture 1: P = 1.953 × 10−2 and Apicture 2: P = 4.883 × 10−2), the entorhinal cortex (ECpicture 1: P = 3.906 × 10−3 and ECpicture 2: P = 5.469 × 10−2) and the hippocampus (Hpicture 1: P = 1.953 × 10−3 and Hpicture 2: P = 6.601 × 10−4), but not in the baseline control condition or in the parahippocampal cortex (BLpicture 1: P = 7.057 × 10−1 and BLpicture 2: P = 1; and PHCpicture 1: P = 4.316 × 10−1 and PHCpicture 2: P = 4.922 × 10−1).
Finally, we tested whether content and context representations of stimulus and context neurons persisted until the end of the trial when patients made context-dependent picture choices (Fig. 3f). Both context and picture contents could be decoded above chance until a decision was indicated (two-sided cluster permutation test, P < 0.01), with significantly enhanced decoding when choices matched ground truth (P = 0.0128 for context and P = 0.0351 for stimulus, one-sided Wilcoxon signed-rank test; Extended Data Fig. 7).
Sequential activation arises during task
To assess whether pairing stimuli and contexts induced neuronal interactions or synaptic modifications reflecting stimulus–context associations, we computed shift-corrected, trial-by-trial cross-correlograms (see Methods) from pairs of mere stimulus and MC neurons (each with a single main factor at P < 0.001 in the repeated-measures two-way ANOVA) during picture presentations from different MTL regions (Extended Data Figs. 8 and 9).
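For concreteness, a minimal sketch of a shift-corrected cross-correlogram for one MS–MC pair, assuming 1-ms binary spike bins; subtracting a shift predictor computed from trial-shuffled pairings removes stimulus-locked co-modulation (the circular shifts here are simplifications of the full method):

```python
# Sketch: shift-corrected trial-by-trial cross-correlogram for one
# MS-MC pair; negative lags mean MS spikes precede MC spikes.
import numpy as np

def xcorr(a, b, max_lag):
    """ccg[lag] = sum_t a[t] * b[t - lag] (circular roll for brevity)."""
    return np.array([np.sum(a * np.roll(b, lag))
                     for lag in range(-max_lag, max_lag + 1)])

def shift_corrected_ccg(ms, mc, max_lag=200):
    """ms, mc: (n_trials, n_bins) binary spike matrices of the pair."""
    n = len(ms)
    raw = np.mean([xcorr(ms[i], mc[i], max_lag) for i in range(n)], axis=0)
    shift = np.mean([xcorr(ms[i], mc[(i + 1) % n], max_lag)
                     for i in range(n)], axis=0)   # shift predictor
    return raw - shift

rng = np.random.default_rng(5)
ms = (rng.random((50, 1000)) < 0.01).astype(float)   # hypothetical trains
mc = np.roll(ms, 40, axis=1)                          # MC lags MS by 40 ms
ccg = shift_corrected_ccg(ms, mc)
print(np.arange(-200, 201)[np.argmax(ccg)])           # peak near -40 ms
```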
Only for pairs of neurons from the entorhinal cortex and the hippocampus of the same hemisphere (Fig. 4a and Extended Data Fig. 8) did firing of MS neurons (entorhinal MSs) predict firing of MC neurons (hippocampal MCs) after approximately 40 ms, as evidenced by asymmetric shift-corrected cross-correlograms (blue, peak on the left), but not that of other neurons (hippocampal ON; black, no prominent peaks). Cross-correlograms differed significantly between groups (that is, MS–MC versus MS–ON) on short timescales (−131 to −17 ms; two-sided cluster permutation test, P < 0.01 for neural pairs and P < 0.05 for sessions; Fig. 4a and Extended Data Fig. 8), but not for pairs from different regions (Extended Data Fig. 8) or across hemispheres (Extended Data Fig. 9). This is consistent with unidirectional synaptic changes reflecting stimulus–context associations. Of six sessions from four participants with simultaneous recordings of entorhinal stimulus and hippocampal context neurons, three (50%) from two participants (50%) exhibited significant cross-correlation peaks around −40 ms (cluster permutation test, P < 0.05).
Fig. 4: Emergence of sequential stimulus and context neuron activity suggests a storage mechanism.
a, Mean cross-correlograms (six sessions) during picture presentations between entorhinal MS and either hippocampal MC neurons (each with only one significant main effect, blue) or a matched number of other hippocampal neurons (non-significant, black; left). Data are mean ± s.e.m. (solid lines and shaded areas, respectively). The peak lag time is shown in the top left. The red horizontal lines indicate significant differences (P < 0.05, cluster permutation test). Stimulus–neuron firing in the entorhinal cortex predicted hippocampal context–neuron, but not other neuron firing after approximately 40 ms. The right panel is the same as the left panel, but with the region order reversed. Hippocampal stimulus neuron firing did not predict entorhinal context–neuron firing. b, Cross-correlations of all 40 entorhinal cortex stimulus and hippocampus context neuron pairs (MS–MC) before (pre, grey) and after (post, red) the experiment. Asymmetric correlations around −40 ms were absent before but emerged during the experiment and persisted afterwards (P < 0.01, cluster permutation test). c, Boxplots (quantile 1, median and quantile 3, with whiskers marking points within ±1.5× the interquartile range) of mean cross-correlations (−100 to −10 ms; see dashed lines in b) between the same neuron pairs as in panel b (left). Correlations did not differ from zero before the experiment (exp.; P = 0.29, pre in grey; Wilcoxon signed-rank test), but exceeded zero during its first (exp., violet) and second (exp., blue) halves and after its termination (post, red; all P < 0.001). Correlations of all later periods exceeded pre-experiment levels (P(pre, exp1) = 3.258 × 10−2; P(pre, exp2) = 6.625 × 10−3; and P(pre, post) = 3.106 × 10−3). Firing rates did not differ significantly before and after the experiment but were significantly lower (P < 0.01) than during each experimental half (right). ****P < 0.0001, ***P < 0.001, **P < 0.01 and *P < 0.05. All tests are two-sided and uncorrected.
Next, we analysed whether asymmetric cross-correlations between entorhinal MS and hippocampal MC neurons arise from pairings of stimuli with contexts and persist post-experiment. We computed cross-correlations during time windows before (Fig. 4b, grey) and after the experiment (Fig. 4b, red; see Methods). Strong asymmetric cross-correlation peaks were absent before the experiment, but present after its termination, differing significantly at −87 to −19 ms (P < 0.01, firing of MS predicts that of MC). Sequential firing therefore appeared to be associated with the experiment and to persist after its termination. We then quantified mean cross-correlation peaks from −100 to −10 ms throughout recordings (Fig. 4c, left). Mean cross-correlations did not differ significantly from zero (two-sided Wilcoxon signed-rank test) before (P = 2.889 × 10−1, pre in grey), but did both during the first (P = 2.036 × 10−5, exp. in violet) and second (P = 1.106 × 10−5, exp. in blue) half of the experiment and after its termination (P = 1.834 × 10−5, post in red). Cross-correlations in all subsequent time periods significantly exceeded pre-experiment levels (P(pre, exp1) = 3.258 × 10−2, P(pre, exp2) = 6.625 × 10−3 and P(pre, post) = 3.106 × 10−3, two-sided Wilcoxon signed-rank test). By contrast, mean firing rates (Fig. 4c, right) did not differ before and after the experiment (P(pre, post) = 4.103 × 10−1), but were significantly higher in its first and second half than before (P(pre, exp1) = 1.424 × 10−4 and P(pre, exp2) = 7.183 × 10−3) and after (P(post, exp1) = 3.244 × 10−3 and P(post, exp2) = 7.782 × 10−3).
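This quantification reduces to averaging each pair's correlogram over the MS-leads-MC lag window and comparing periods with Wilcoxon tests; a sketch with hypothetical correlogram arrays:

```python
# Sketch of the period comparison: mean shift-corrected correlogram per
# pair over the -100 to -10 ms window (MS precedes MC), tested against
# zero and against pre-experiment values. Arrays are hypothetical.
import numpy as np
from scipy.stats import wilcoxon

lags = np.arange(-200, 201)                       # ms, matching the CCG sketch
window = (lags >= -100) & (lags <= -10)

rng = np.random.default_rng(6)
ccg_pre = rng.normal(0.0, 1.0, (40, lags.size))   # 40 pairs, pre-experiment
ccg_post = rng.normal(0.2, 1.0, (40, lags.size))  # 40 pairs, post-experiment

pre = ccg_pre[:, window].mean(axis=1)
post = ccg_post[:, window].mean(axis=1)
print(wilcoxon(pre).pvalue)         # each period versus zero
print(wilcoxon(post, pre).pvalue)   # paired comparison versus pre-experiment
```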
Pre-activation predicts reactivation
We tested whether pre-activation of context neurons by preferred contexts (that is, questions) modulates their activity during subsequent picture presentations and their response likelihood following stimulus neuron activation.
First, we assessed whether normalized activity (baseline normalization; see Methods) of MC neurons during question presentations predicted their activity during subsequent picture presentations (Fig. 5b and baseline control in Fig. 5c). Participant means of question activity (r = 0.46, P = 2.07 × 10−5), but not of baseline activity (r = 0.03, P = 7.79 × 10−1), significantly predicted subsequent responses of MC neurons to picture presentations (P < 0.01, Pearson correlation; Fig. 5b,c). The same results were obtained for all MC neurons (Extended Data Fig. 10a,b). Similarly, question activity of MC neurons predicted subsequent picture activity significantly more strongly than baseline activity trial by trial (Extended Data Fig. 10c). Specifically, participant means of Pearson correlation effect sizes were significantly higher for correlations between question and picture activity (r̄ = 0.12, left in green) than between baseline and picture activity (r̄ = 0.03, right in grey; P = 6.13 × 10−3, two-sided Wilcoxon signed-rank test). Next, we determined whether cross-correlations between stimulus and context neurons were affected by whether a preferred context (question with the strongest context neuron response) or a preferred stimulus (picture with the strongest stimulus neuron response) was presented (permutation test; see Methods). Cross-correlations between entorhinal MS and hippocampal MC neurons were significantly stronger on short (−66 to 28 ms) timescales when a preferred versus non-preferred context of context neurons was displayed (Fig. 5d; P < 0.01, two-sided cluster permutation test), but not when a preferred versus non-preferred stimulus of stimulus neurons was depicted (Extended Data Fig. 10d,f). A schematic model relating these findings to stimulus–context association storage and retrieval is depicted in Fig. 5a.
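The pre-activation test is a Pearson correlation between baseline-normalized question responses and subsequent picture responses, pooled over the 5 questions × 16 patients; a sketch with simulated z-values that are correlated by construction, purely for illustration:

```python
# Sketch of the pre-activation analysis: correlate z-scored question
# responses with z-scored picture responses across questions and patients.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(7)
z_question = rng.normal(0, 1, (16, 5))                     # patient x question
z_picture = 0.5 * z_question + rng.normal(0, 1, (16, 5))   # built-in coupling

r, p = pearsonr(z_question.ravel(), z_picture.ravel())
print(f"r = {r:.2f}, P = {p:.1e}")   # baseline control computed analogously
```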
Fig. 5: Sequential activation after pre-activation could guide stimulus-driven pattern completion.
a, Schematic model of stimulus–context storage. Activation of entorhinal cortical (EC) MS (red) and hippocampal (H) MC neurons (green) during trial events. Specific context neurons in the hippocampus respond to particular questions (coloured circles) and are reactivated during picture presentations (top left). Picture presentations elicit responses in selected stimulus neurons (numbers), whose activity predicts context neuron firing after approximately 40 ms (Fig. 4), strengthening connections encoding stimulus–context associations. Question response strengths predict context neuron activity and excitability (yellow triangles) during subsequent picture presentations (panels b,d; Extended Data Fig. 10a–c; lower left). Pre-activation by preferred contexts guides stimulus-driven pattern completion in hippocampal context neurons (right; dashed yellow line; panels d–f). b, Scatter plot of mean z-values for the five contextual questions during question versus picture presentations, computed per MC neuron and patient (5 questions × 16 patients, green stars in panel b and grey stars in panel c). Pearson correlations are visualized by regression lines. Question-evoked responses predict picture responses (r = 0.46, P = 2.07 × 10−5). c, As in panel b, but baseline versus picture presentations showing no correlation (r = 0.03, P = 0.78). d, Shift-corrected cross-correlations of EC-MS and H-MC during picture presentations after a preferred (maximum response, green) versus a non-preferred context (grey). The red horizontal lines indicate significant differences (P < 0.01, two-sided cluster permutation test). Data are mean ± s.e.m. (solid lines and shaded areas, respectively) with the dashed line indicating zero. MS firing predicts MC firing more strongly in preferred contexts. n = 40 neurons. e, Context SVM-decoding accuracy of H-MC trained with question and decoded with picture activity time locked (0–100 ms) to EC-MS firing (6 sessions: P = 0.031; 40 neuron pairs: P = 5.794 × 10−4; two-sided Wilcoxon signed-rank test). f, Context SVM decoding of H-MC distinguishing whether preferred (red) versus non-preferred (blue) pictures of the corresponding EC-MS were presented (6 sessions: P = 0.0156; 40 neuron pairs: P = 0.0338; one-sided Wilcoxon signed-rank test). For the boxplots (e,f), quantile 1, median, quantile 3 and whiskers (points within ±1.5× the interquartile range) are shown; dashed lines indicate chance. *P < 0.05.
For further validation, we performed SVM-decoding analyses regarding the role of entorhinal stimulus and hippocampal context neuron interactions in stimulus-driven reinstatement of context. First, we computed session-wise context SVM-decoding accuracies of hippocampal MC neurons trained with question activity and decoded with subsequent picture activity (100–1,500 ms) time locked to entorhinal stimulus neurons firing (100-ms windows). Decoding accuracies slightly, but significantly, exceeded chance (0.21; Fig. 5e; 6 sessions: P = 0.03; 40 neuron pairs: P = 5.79 × 10−4; two-sided Wilcoxon signed-rank test). Second, we computed session-wise context SVM-decoding accuracies of hippocampal MC neurons for trials with preferred versus non-preferred pictures of the corresponding entorhinal stimulus neuron. Hippocampal MC neurons represented context more strongly when preferred (0.241) versus non-preferred (0.226) pictures of entorhinal stimulus neurons were presented (Fig. 5f; 6 sessions: P = 0.016; 40 neuron pairs: P = 0.034; one-sided Wilcoxon signed-rank test).
Discussion
Contexts constrain which memories are relevant for future decisions15. Single neurons in the MTL represent both semantic content and context, that is, aspects of an episode1, such as tasks11, episodes13 or rules12. It remains unclear, however, how content and context are combined to form or retrieve integrated item-in-context memories in humans. In a context-dependent stimulus-comparison task, pairs of pictures were presented following a question that specified how the pictures were to be compared (Fig. 1a). Sixteen of seventeen participants remembered picture contents in their respective question context, showing consistent and transitive choices (Extended Data Fig. 1) closely matching objective ground truths (Extended Data Fig. 2). Analysing 3,109 neurons recorded from 16 neurosurgical patients, we found that co-activation of mostly separate populations of 597 stimulus-modulated neurons and 200 context-modulated neurons combined question and picture representations across temporal gaps. During picture presentations, when both representations became task-relevant, context neurons encoded question context, and stimulus neurons encoded picture contents (Fig. 3).
Context and content representations persisted until the trial ended, when participants selected one picture according to context (Fig. 3f), and were enhanced during behaviourally correct trials (Extended Data Fig. 7). During and after, but not before the experiment, firing of entorhinal stimulus neurons predicted hippocampal context neuron activity (Fig. 4a–c), consistent with storage of stimulus–context associations via spike-timing-dependent plasticity. Increased excitability of context neurons after pre-activation by their preferred context suggested a gating mechanism of stimulus-driven pattern completion in the hippocampus (Fig. 5a,d) that enhanced context representations (Fig. 5f). This implies that stimulus and context neurons contribute to context-dependent stimulus processing via reinstatement of context (Figs. 3e and 5e,f). Their co-activation could support memory of stimuli within their context and refine content processing according to context. The generality of the representations of both populations (Fig. 3b) appears well suited to support flexible decision-making by broadening or constraining memories through reinstatement or co-activation. Conjunctive representations of stimulus–context, conversely, reflected limited pattern separation in humans and may specify item-in-context or their attributes.
Operationalization of context
Contexts include independent elements (encoded alongside contents without altering processing) and interactive elements that influence content processing14. For example, evaluating the price of a melon in a supermarket versus its taste at home illustrates interactive context effects. Our task operationalizes interactive contexts, as questions define comparison rules that shape stimulus processing16. Behaviourally, we successfully captured these effects as pictures were ranked differently across questions (median Kendall’s W = 0.232) yet consistently and transitively within questions, tracking objective ground truths (mean ρ = 0.760; Extended Data Fig. 2), which is not expected for fixed stimulus-value judgements. Neural data reflected both interactive contexts and the contents that they acted on. First, stimulus neurons represented picture contents rather than isolated features, as evidenced by consistent encoding across contexts (Fig. 3b).