Introduction
Prevailing theories on the functional organization of primate neocortex are dominated by large-scale areal parcellations, ranging from approximately 200 areas in humans to around 130 in macaques1,2,3. Early sensory-motor areas of primate cortex, however, exhibit a more fine-grained organization, with clusters of neurons sharing common functional properties and precise patterns of connectivity within and between neighboring areas4,5. These groups of neighboring neurons with shared response properties can be referred to as mesoscale functional units (MFUs), which represent functional sub-compartments within larger brain regions. While traditional cortical columns are one example of MFUs, MFUs can vary significantly in size, distribution, and frequency of occurrence within a given area, distinguishing them from traditional cortical columns.
While the concept of MFUs is not new, their existence and characteristics in higher-order extrastriate visual cortex, particularly within specialized category-selective areas, remain poorly understood. These category-selective regions are known to encode complex information beyond their preferred category, with voxels in face-selective areas, such as the fusiform face area (FFA)6, also encoding significant information about non-face categories7,8,9,10,11,12,13,14,15. Notably, substantial functional heterogeneity has been observed within these areas, with individual voxels exhibiting different response properties16,17. However, studies showing that this functional heterogeneity is organized in a spatially coherent manner at small scales, as would be expected of MFUs, remain contested and limited18,19.
Previous work by Gallant and colleagues revealed that voxels with similar response profiles are grouped within FFA10 and scene-selective regions17, but the median cluster size was approximately 0.5 cm3 (voxels were 17.6 mm3), likely reflecting a macro-level rather than mesoscale functional organization. The work of Tanaka and colleagues, which revealed a columnar-like organization in anterior inferotemporal cortex for simple shape features20,21,22, is a notable exception. However, spatial clustering of neurons at mesoscale level in category-selective cortex of primates has yet to be shown.
In this study, we used sub-millimeter whole-brain fMRI (0.6 mm isotropic, or 0.22 mm3 voxels)23 to investigate the existence and characteristics of MFUs in high-level category-selective areas. We demonstrate that face-, body-, and object-selective areas can be reliably subdivided into anatomically clustered MFUs. Single-cell recordings in a body-selective area confirm a similar functional clustering as observed with fMRI. Additionally, resting-state fMRI-based functional connectivity reveals distinct interhemispheric connectivity patterns for MFUs of the same type. Moreover, our analyses revealed remarkable similarities in MFU properties across subjects, sharp boundaries between MFUs, and similar spatial fall-off patterns of response similarities as previously observed in single-unit recordings. These findings indicate the presence of a mesoscale functional organization in high-level category-selective areas and suggest that these MFUs form large-distance mesoscale functional networks.
Results
Functional and anatomical clustering of voxels in middle lateral face area ML
In a block-design fMRI experiment, monkeys viewed 200 images belonging to 10 different visual categories including human and monkey faces and bodies, two types of objects, animals, birds, fruits and sculptures (see Fig. 1, “Methods”, and ref. 24). Whole-brain contrast-agent enhanced25 fMRI data (0.6 mm isotropic voxels) were acquired with phased-array receive coils embedded in the headset of the animals23,26,27. We first identified face area ML, using the conjunction of three different contrasts (monkey faces versus monkey objects, monkey faces versus fruits, and monkey faces versus monkey bodies). Only voxels reaching a threshold of p < 0.05 for each contrast on both even and odd scan days were selected—a conservative approach yielding voxels exclusively belonging to ML. On the fMRI tuning curves of these 0.216 mm3 voxels, we performed a hierarchical cluster analysis. This revealed 3 different clusters within ML, indicating functional grouping (Fig. 2A–C; see Supplementary Fig. S7 for visualization of these clusters in low-dimensional space). These clusters exhibited a significant decrease in the Trace Cov W index (Fig. 2E), and the largest change in cumulative distribution function (CDF) area from consensus clustering (Supplementary Fig. S3A). Although all 3 clusters were face-selective, exactly as predicted based on previous research12,28,29 and our selection criteria, each cluster showed a different functional profile (Fig. 2B, C). Voxels of the first cluster were most specific for face stimuli compared to the two other clusters. The second and third clusters were also activated by animals and birds, and the third cluster was also characterized by its stronger sensitivity for mammals and weaker responses for non-animate objects.
Fig. 1: Stimuli and experimental design. A block design was conducted and each block lasted 30 s. Each category contained 20 different images, each image was shown for 750 ms and was repeated twice in one block. The color of the blocks matches the color of the outline of the example stimuli indicated in the upper panel.
Fig. 2: ML contains three segregated mesoscale functional units with different categorical selectivity profiles and inter-hemispheric functional connections in subject M1. A Unsorted normalized activation profiles in each 0.216 mm3 voxel. B Normalized activation profiles in each voxel sorted by the hierarchical cluster analysis. Left panel: dendrogram of the hierarchical cluster tree. C Mean normalized activation profiles in each MFU of ML. Each stimulus class is indicated by an example image. D Functionally clustered voxels (three clusters in B) are back-projected to anatomical images in coronal (upper panel) and sagittal (lower panel) planes. Different functionally clustered voxels are also anatomically clustered as indicated by the same color-code as (B). The insets in the right upper corner correspond to the black rectangles on the slices. E Trace Cov W index indicated that the optimal number of clusters = 3 (red dot). Blue curve: Trace Cov W index calculated from different numbers of clusters. Orange curve: first difference of the Trace Cov W indices calculated from the real data. Gray dashed line and shading: mean and 99% confidence interval of the first differences of the Trace Cov W indices calculated from the 10,000 permutations. F Violin plot of the per-run FC analysis. Each dot represents FC calculated from a single run. Box plots (grey box in the center of the violin) show the interquartile range (IQR; 25th–75th percentiles), with a white dot indicating the median within each box. Whiskers extend to the most extreme data points within 1.5 × IQR of the box edges, or to the minimum/maximum values. FC between MFUs of the same type (i.e., belonging to the same functional cluster) is significantly higher than FC between MFUs of different types across the two hemispheres (n = 28, p = 0.0014, uncorrected, two-sided paired t-test). G A permutation test shows that the veridical “same vs. different type MFUs FC strength” was significantly stronger compared to virtual MFUs with randomly assigned voxels (p = 10⁻⁴, uncorrected). The median value for veridical “same vs. different type MFUs FC strength” is indicated by the red vertical line and the median value for the same comparison but with randomly shuffled voxels across MFUs (10,000 permutations) is indicated by the black vertical line. Source data are provided on GitLab (https://gitlab.com/lzgitlab/share/mfus_of_face_body).
Next, we mapped the voxels from these three functionally defined clusters onto the brain. Interestingly, these voxels were not randomly interspersed in a salt and pepper-like pattern but grouped in spatially segregated units (Fig. 2D). Moreover, voxels belonging to the same functional cluster were retrieved in approximately similar spatial locations along the medio-lateral and posterior-anterior axes of ML in both hemispheres (Fig. 2D) and monkeys (Fig. 2 (M1) and Supplementary Fig. S1 (M2)). Thus, functionally clustered voxels are orderly organized in MFUs within face area ML, indicating spatial clustering.
To quantify the similarity in response patterns between clusters across animals, we calculated Pearson correlation coefficients on the normalized response patterns of the three clusters between M1 and M2. After subtracting the mean response pattern within each monkey, matching clusters exhibited high correlation coefficients ranging from 0.84 to 0.95 in ML (Supplementary Fig. S4A, left panel), while non-matching clusters exhibited low correlations ranging from −0.79 to −0.04.
To statistically validate these observations, we performed a permutation test (10,000 iterations) where voxels from M2 were randomly assigned to one of the 3 clusters while preserving the original cluster sizes. For each iteration, we computed the correlation matrix between the normalized mean response patterns of the 3 clusters across animals, and compared the average diagonal (matching clusters) correlations to off-diagonal (non-matching clusters) correlations. The result demonstrated that the real data exhibited significantly higher matching-versus-nonmatching cluster correlation than the permuted data (p = 0.00039; Supplementary Fig. S4A, right panel). Hence, the 3 types of MFUs in ML are reliably conserved across animals, beyond what would be expected by chance.
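The logic of this label-shuffling test can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, the "mean response pattern within each monkey" is assumed to be the mean across the cluster profiles, and the test statistic is assumed to be the difference between the mean diagonal and mean off-diagonal correlations.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cluster_means(profiles, labels, k):
    """Mean profile per cluster, minus the mean across clusters (assumed normalization)."""
    n_cat = len(profiles[0])
    means = []
    for c in range(k):
        members = [p for p, lab in zip(profiles, labels) if lab == c]
        means.append([sum(p[j] for p in members) / len(members) for j in range(n_cat)])
    grand = [sum(m[j] for m in means) / k for j in range(n_cat)]
    return [[m[j] - grand[j] for j in range(n_cat)] for m in means]

def match_score(means_a, means_b):
    """Mean matching (diagonal) minus mean non-matching (off-diagonal) correlation."""
    k = len(means_a)
    diag = [pearson(means_a[i], means_b[i]) for i in range(k)]
    off = [pearson(means_a[i], means_b[j])
           for i in range(k) for j in range(k) if i != j]
    return sum(diag) / len(diag) - sum(off) / len(off)

def permutation_p(means_a, profiles_b, labels_b, k, n_perm=1000, seed=0):
    """One-sided p-value for the observed score against shuffled cluster labels."""
    rng = random.Random(seed)
    observed = match_score(means_a, cluster_means(profiles_b, labels_b, k))
    shuffled = list(labels_b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffling labels preserves the cluster sizes
        if match_score(means_a, cluster_means(profiles_b, shuffled, k)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

def synthetic_animal(n_per_cluster=20, seed=1):
    """Synthetic voxels (illustration only): three prototypes plus Gaussian noise."""
    rng = random.Random(seed)
    protos = [[1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0]]
    profiles, labels = [], []
    for c, proto in enumerate(protos):
        for _ in range(n_per_cluster):
            profiles.append([v + rng.gauss(0, 0.1) for v in proto])
            labels.append(c)
    return profiles, labels
```

Shuffling the label vector, rather than drawing labels independently, is what keeps the cluster sizes fixed in every permutation, as described in the text.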
Mesoscale functional units of ML form segregated mesoscale functional networks
Previous research has shown that face areas are interconnected across hemispheres30,31. Here, we aim to explore whether there is a more detailed functional connectivity pattern, not at the level of areas but at MFU level. Specifically, we investigated whether different-type MFUs within ML exhibit distinct connectivity profiles. To this end, we conducted a functional connectivity (FC) analysis on independent high-resolution resting-state fMRI data from the same subjects (0.6 mm isotropic voxels) using the functionally defined MFUs as starting point. We tested whether MFUs belonging to the same cluster in both hemispheres (same-type MFUs) are preferentially connected with each other. We found that across runs, FC strength between same-type MFUs in the left and right hemispheres is significantly stronger than between different-type MFUs (p-values ≤ 0.002) (Fig. 2F (M1), and Supplementary Fig. S1 (M2)). To control for any biases introduced by different sizes of the MFUs, we conducted a permutation test in which we randomly shuffled the ML voxels and compared the veridical FC strength of the “same vs. different type MFUs” against virtual MFUs consisting of permuted voxels. Again, we found that there was significantly higher FC across same-type MFUs in the two hemispheres of both subjects compared to MFUs with permuted voxels (p-values < 10⁻⁴) (Fig. 2G (M1), and Supplementary Fig. S1 (M2)).
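The per-run comparison of same-type and different-type interhemispheric FC can be sketched as follows. This is a hedged illustration under several assumptions not stated in the text: FC is taken as the Pearson correlation between MFU-averaged time series, no Fisher transform is applied, and the function names and the synthetic data generator are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def run_fc(left_ts, right_ts):
    """Same-type and different-type interhemispheric FC for one run.

    left_ts / right_ts: one MFU-averaged time series per MFU type,
    listed in the same type order for both hemispheres.
    """
    k = len(left_ts)
    same = [pearson(left_ts[i], right_ts[i]) for i in range(k)]
    diff = [pearson(left_ts[i], right_ts[j])
            for i in range(k) for j in range(k) if i != j]
    return mean(same), mean(diff)

def paired_t(a, b):
    """Paired t statistic (df = n - 1) for per-run values a versus b."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

def simulate_runs(n_runs=10, n_vol=100, k=3, seed=0):
    """Synthetic runs (illustration only): same-type MFUs share a latent signal."""
    rng = random.Random(seed)
    same, diff = [], []
    for _ in range(n_runs):
        latent = [[rng.gauss(0, 1) for _ in range(n_vol)] for _ in range(k)]
        left = [[s + rng.gauss(0, 1) for s in latent[i]] for i in range(k)]
        right = [[s + rng.gauss(0, 1) for s in latent[i]] for i in range(k)]
        s, d = run_fc(left, right)
        same.append(s)
        diff.append(d)
    return same, diff
```

Each run contributes one same-type and one different-type value, so the paired test treats runs as the unit of observation, matching the per-run dots in Fig. 2F.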
Middle body-selective area MSB also contains mesoscale functional units, which form interhemispheric mesoscale functional networks
To generalize our findings, we performed the same analysis on another stringently defined category-selective area but belonging to the body-processing network, i.e., body selective area MSB (Fig. 3). MSB could also be subdivided into three MFUs, with functionally clustered voxels based on different functional responses for the different object categories.
Fig. 3: Three types of mesoscale functional units in body area MSB in M2. Body area MSB in M2 contains three different clusters of voxels with different categorical selectivity profiles (functional clustering) (A, B), which are also anatomically clustered (C). Hence, MSB also consists of mesoscale functional units. D Trace Cov W index indicated that the optimal number of clusters = 3 (red dot). Gray shading indicates the 99% confidence interval of the first differences of the Trace Cov W indices from 10,000 permutations (cf. Fig. 2E). E, F MFUs of the same type across the two hemispheres (belonging to the same functional cluster) show stronger functional connections compared to different type MFUs (E, two-sided paired t-test, p = 0.005, uncorrected, n = 37 runs; F, permutation test, p = 0.017, uncorrected, 10,000 permutations). Box plots (grey box in the center of the violin) in (E) show the interquartile range (IQR; 25th–75th percentiles), with a white dot indicating the median within each box. Whiskers extend to the most extreme data points within 1.5 × IQR of the box edges, or to the minimum/maximum values. Data are from subject M2. Results are computed and presented as in Fig. 2. Source data are provided on GitLab (https://gitlab.com/lzgitlab/share/mfus_of_face_body).
MFU1 and MFU2 in M2 show the strongest responses to monkey and human bodies, as well as mammals and birds. While faces do not elicit responses in MFU1 of M2’s MSB, face-evoked activity gradually increases in MFU2 and MFU3. The latter also responds less to human bodies compared to MFU1 and MFU2 (Fig. 3). MFU1 in M1’s MSB (Supplementary Fig. S2) is particularly responsive to monkey bodies, with minimal or no response to human bodies, animals, or birds. In contrast, MFU2 also responds to birds and to a lesser extent to mammals, a pattern that is reversed in MFU3. Interestingly, MFU3 also responds to faces but does not react to either monkey or human objects, unlike MFU1.
To quantify the similarity in response patterns between the two animals in the three MSB clusters, we performed the same correlation analysis as in ML (Supplementary Fig. S4B). The responses in matching MFUs show higher correlations (ranging from 0.48 to 0.73; diagonal) across animals than those in non-matching ones (ranging from −0.81 to 0.25; off-diagonal) (left panel). This finding was further validated using a permutation test (right panel), as described above for ML. The actual, non-permuted data revealed a significantly higher (p = 0.028) matching-versus-nonmatching cluster correlation than the permuted data (Supplementary Fig. S4B, right panel).
Finally, these functionally clustered voxels appear to be anatomically clustered within body area MSB, hence, they constitute MFUs within area MSB. Moreover, same-type MFUs of MSB also show higher interhemispheric functional connectivity, compared to different-type MFUs. These results were again confirmed in both monkeys (Fig. 3E, F (M2), and Supplementary Fig. S2E, F (M1)).
Correspondence between fMRI and single-unit defined mesoscale functional units in MSB
The discovery of MFUs in category-selective areas is based on hemodynamic signals. Therefore, to investigate the neuronal basis of this finding, we reanalyzed single-unit recordings throughout the entire extent of fMRI-defined MSB of another animal (M3). MSB was first identified by low-resolution (1.25 mm instead of 0.6 mm isotropic voxels) fMRI maps and fMRI-guided single-unit recordings were performed using half of the stimuli that were also used in the high-resolution fMRI experiment (for details see “Methods”, and refs. 15,24). Hierarchical cluster analysis on the population of single-unit responses showed that body selective cells (n = 98), selected from the entire visually driven population using identical criteria as the fMRI voxels (i.e., the conjunction of 3 body-selective contrasts: monkey bodies versus monkey objects, monkey bodies versus fruits, and monkey bodies versus monkey faces), could be clustered into three subdivisions (Fig. 4A). These clusters exhibited a significant decrease in the Trace Cov W index (Fig. 4C), and the largest change in CDF area from consensus clustering, with or without averaging responses within each category (Supplementary Fig. S3D).
Fig. 4: Comparison between MFUs based on fMRI and functional clusters based on single-cell recordings in body-selective area MSB. A Three functional clusters of category-selective cells recorded throughout the entire extent of MSB (M3), obtained using single-unit recordings (98 neurons). Right, spiking activity matrix where each row represents the normalized responses of a neuron across 100 stimuli (10 per category) in MSB. Left, dendrogram of the hierarchical cluster tree, after clustering. B Hierarchical clustering conducted on averaged fMRI and single-cell cluster profiles. Right, averaged fMRI and single-cell cluster profiles. Each row of the average single-cell cluster profile represents the mean normalized response of a functional cluster in (A) across the 10 stimulus categories. Each stimulus class is indicated by an example image. Left, dendrogram of the hierarchical cluster tree, after clustering. C Trace Cov W index (calculated from averaged single-cell responses at category level) indicated that the optimal number of clusters = 3 (red dot). Gray shading indicates the 99% confidence interval of the first differences of the Trace Cov W indices from 10,000 permutations (cf. Fig. 2E). D Permutation test results. Red line represents the within-versus-between-cluster correlation coefficient of activation profiles between fMRI MFUs and single-cell functional clusters. Histogram represents the distribution of the same correlation coefficients calculated from the 10,000 permutations. The results demonstrate a significantly higher (p = 0.0002, uncorrected) matching-versus-nonmatching cluster correlation of activation patterns between fMRI and single neuron clusters when neurons with similar response profiles are grouped together, as opposed to being arranged in a salt-and-pepper configuration. Source data are provided on GitLab (https://gitlab.com/lzgitlab/share/mfus_of_face_body).
The category-selective tuning profiles of these neuronal clusters were not simply face-, body-, or animal-selective. Instead, the profiles are surprisingly similar to those observed in fMRI-defined MFUs in different animals. To quantify their correspondence, we calculated nonparametric Spearman correlation coefficients between the average across-voxel activation profiles in three fMRI-defined MFUs (using data of both subjects) and the average across-neuron spiking activity profiles of the three functionally defined clusters obtained in the electrophysiology experiment. A hierarchical cluster analysis showed that, instead of a separation in distinct fMRI and single-cell clusters, each fMRI cluster corresponded specifically to a single-cell cluster (Fig. 4B). This suggests that the fMRI-defined category-selective activity patterns, defining the three MFUs within MSB, are also reflected by the single-unit responses recorded in the same area. Note, however, that the single-unit data could not reveal evidence for anatomical clustering, unlike the fMRI data.
One might argue that the clusters observed in fMRI data are artificially induced by spatial smoothing, even if single neurons are organized in a salt-and-pepper pattern. To rule out this possibility, we conducted a permutation test by randomly assigning each single neuron to one of the 3 clusters 10,000 times, while preserving the original cluster sizes identified through hierarchical clustering analysis. For each permutation, we computed the mean activation pattern of each cluster and correlated them with those of the fMRI clusters. We then compared the matching-versus-nonmatching cluster correlation coefficient from the permuted data (represented by the histogram) to that obtained from our original clustering analysis (indicated by the dashed red vertical line in Fig. 4D). The observed correlation was significantly higher (p = 0.0002) for neurons grouped by similar response profiles as compared to when they were arranged in a salt-and-pepper configuration. This finding strongly supports the existence of spatially clustered neurons.
Mesoscale functional units in object-responsive cortex
Finally, we addressed the question whether clustering in MFUs is restricted to face and body areas. Specifically, we performed the same analysis on the object-responsive region adjacent to ML and MSB, and also revealed the presence of 3 MFUs (Supplementary Fig. S5). These MFUs predominantly preferred inanimate stimulus categories, with a notable emphasis on human and monkey objects. In both animals, MFU3 was slightly activated by human bodies, and sculptures, unlike monkey faces and bodies. The main distinction between MFUs 1 and 2 was that MFU1 showed a slightly stronger preference for human faces and bodies compared to MFU2. The correlation in response profiles between the three clusters demonstrated high inter-subject reproducibility (see Supplementary Fig. S4C).
Additionally, as observed in the face (ML) and body-selective patches (MSB), interhemispheric functional connectivity was higher for same-type MFUs compared to different-type MFUs within the object-selective patch in the inferotemporal cortex of both animals (Supplementary Fig. S5E, F, K, L).
These findings suggest that the fine-grained organization, whereby large areas are subdivided into MFUs which form mesoscale functional networks, might be a general feature of the IT cortex.
Differences among these mesoscale functional units cannot be explained by gradual variations in eccentricity bias or face/body-selectivity
We observed stronger interhemispheric functional connectivity between MFUs of the same type than between those of different types within ML, MSB, and the object-selective patches. One might argue that this pattern reflects more robust functional connections between matching eccentricity representations than between non-matching ones, considering that these patches are retinotopically organized32.
To determine whether the MFUs can be distinguished based on eccentricity differences, we conducted a separate high-resolution (0.6 mm isotropic voxels) retinotopic mapping experiment using the same subjects27. We then computed correlation between distance matrices derived from eccentricities and those from full object response profiles to assess how much variance among the 3 clusters could be attributed to eccentricity. To further evaluate the reliability of these distance measurements, we split data from each experiment into independent datasets and correlated distances calculated using the same response properties across these splits. The results showed that while test-retest correlations were high for both types of distance measures (eccentricity and full object response profiles), the cross-correlations between these two measurements were low (see Fig. 5A). This indicates that, although eccentricity might account for some variance among the 3 clusters, its contribution is minimal. Indeed, eccentricity explained only 4.03% and 8.97% of the variance in ML, and 1.21% and 8.46% in MSB in the two subjects, respectively. Therefore, eccentricity does not drive the clustering or the observed interhemispheric functional connectivity between same-type MFUs.
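The comparison of distance matrices can be sketched as below. This is a minimal illustration under assumptions that the text does not confirm: distances are Euclidean, the two sets of pairwise distances are related by a Pearson correlation, and "variance explained" is taken as the squared correlation. All function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pairwise_distances(features):
    """Euclidean distances for all voxel pairs (upper triangle, row-major order)."""
    out = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            out.append(sqrt(sum((a - b) ** 2
                                for a, b in zip(features[i], features[j]))))
    return out

def variance_explained(feature, profiles):
    """Squared correlation between feature-based and profile-based distances.

    feature:  one scalar per voxel (e.g. eccentricity or a selectivity index)
    profiles: one response profile (list of category responses) per voxel
    """
    d_feature = pairwise_distances([[f] for f in feature])
    d_profile = pairwise_distances(profiles)
    r = pearson(d_feature, d_profile)
    return r * r
```

The same function applied to two independent halves of the data for a single measure would give the test-retest reliability against which the cross-correlation is judged.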
Fig. 5: MFU organization is independent of eccentricity and face/body selectivity, and exhibits sharp boundaries. A, B Correlation analyses demonstrating the independence of MFU organization from eccentricity and face/body selectivity. Test-retest reliability (lines) for distance matrices derived from eccentricities (yellow in A) and face/body selectivity indices (yellow in B), alongside reliability for full object response profiles (orange). Cross-correlation between distance measurements based on categorical responses and those based on eccentricity (blue bars in A) or face/body selectivity (blue bars in B) is shown. Data are presented for both monkeys. C, D Sharp transitions at MFU boundaries. Comparison of Euclidean distances between object response profiles of neighboring voxels along the borders of MFUs in subject 1 (M1, C) and subject 2 (M2, D). Distances were calculated for voxel pairs within the same MFU (adjacent voxels within the same MFU along the boundary) and across different MFUs (adjacent voxels across a boundary). Box plots (grey box in the center of the violin) show the interquartile range (IQR; 25th–75th percentiles), with a white dot indicating the median within each box. Whiskers extend to the most extreme data points within 1.5 × IQR of the box edges, or to the minimum/maximum values. For each subject, two-sided Wilcoxon rank-sum tests revealed significant differences: M1: ML (across/within pairs, n = 201/452, p = 9.9 × 10⁻²⁸); MSB (n = 147/362, p = 5.8 × 10⁻²⁸). M2: ML (n = 280/612, p = 1.3 × 10⁻¹⁸); MSB (n = 123/253, p = 6.5 × 10⁻⁸). All p-values are uncorrected. Source data are provided on GitLab (https://gitlab.com/lzgitlab/share/mfus_of_face_body).
We performed a similar analysis to assess the contribution of face/body-selective responses to differences among the three MFU clusters. Specifically, we computed correlations between distance matrices derived from face (ML) and body selectivity (MSB) indices, calculated as t-values from the “faces versus bodies” contrast or vice versa, and those derived from full object response profiles. Again, test-retest correlations were high for both types of distance measures, but cross-correlations between them were low (see Fig. 5B). Face/body selectivity explained only 0.87% and 1.47% of the variance in ML, and 1.36% and 18.28% in MSB in the two subjects, respectively.
Smooth or sharp functional boundaries between adjacent mesoscale functional units
To determine whether object response profiles change gradually across neighboring MFUs or exhibit sharp boundaries, we compared the Euclidean distance of object response profiles between neighboring voxels along MFU borders. Specifically, we calculated functional distances: (1) between adjacent voxels at the border of the MFUs, but belonging to the same MFU (“within” condition), and (2) between adjacent voxels at the border of the MFUs but belonging to different MFUs (“across” condition), and compared them using a Wilcoxon rank-sum test. In each region and subject (Fig. 5C, D and Supplementary Fig. S6), we observed significantly higher Euclidean distances between the response profiles of two neighboring voxels across MFU borders compared to those within the same MFU (all p-values < 10⁻⁷). Hence, there is a sharp functional boundary between adjacent MFUs.
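One way to operationalize the "within" and "across" conditions is sketched below. Several choices here are assumptions rather than details given in the text: voxels count as adjacent when they share a face (6-connectivity), a border voxel is any voxel with at least one neighbor in a different MFU, and the rank-sum statistic uses the normal approximation without tie correction. The names and the synthetic slab are hypothetical.

```python
import random
from math import sqrt

FACE_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
FORWARD_OFFSETS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # visits each adjacent pair once

def euclid(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def border_pair_distances(labels, profiles):
    """labels: {(x, y, z): MFU id}; profiles: {(x, y, z): response profile}.

    Returns (within, across) distance lists for adjacent border-voxel pairs.
    """
    def shifted(c, off):
        return (c[0] + off[0], c[1] + off[1], c[2] + off[2])

    border = {c for c in labels
              if any(shifted(c, o) in labels and labels[shifted(c, o)] != labels[c]
                     for o in FACE_OFFSETS)}
    within, across = [], []
    for c in border:
        for o in FORWARD_OFFSETS:
            n = shifted(c, o)
            if n in border:
                d = euclid(profiles[c], profiles[n])
                (within if labels[c] == labels[n] else across).append(d)
    return within, across

def rank_sum_z(a, b):
    """Wilcoxon rank-sum z for a versus b (normal approximation, midranks for ties)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for t in pooled[i:j] if t[1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (rank_sum_a - mu) / sigma

def synthetic_slab(seed=0):
    """Synthetic 6 x 6 x 1 slab (illustration only) split into two MFUs at x = 3."""
    rng = random.Random(seed)
    protos = {0: [1, 0, 0, 0], 1: [0, 1, 0, 0]}
    labels, profiles = {}, {}
    for x in range(6):
        for y in range(6):
            lab = 0 if x < 3 else 1
            labels[(x, y, 0)] = lab
            profiles[(x, y, 0)] = [v + rng.gauss(0, 0.05) for v in protos[lab]]
    return labels, profiles
```

Restricting the "within" pairs to border voxels keeps the two conditions matched in cortical location, so a difference in functional distance reflects the boundary itself rather than position within the area.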
Comparing spatial falloff patterns of response similarities between fMRI and single-unit recordings
Previous single-unit recordings in monkey IT cortex have shown that nearby cells exhibit similar stimulus and category selectivity, with response similarities (correlations of response profiles) decreasing as the distance between cells increases13,33. This spatial profile of response similarity can be approximated by a simple rational function 1/(1 + x)33. We assessed whether our high-resolution fMRI data demonstrate a comparable spatial falloff pattern in IT. We used 1 − Euclidean distance as an index of response similarity and pooled voxels from ML, MSB, and object-responsive regions to match the broader IT sampling of previous single-unit studies13,33. Center-to-center geometrical distance was used to measure cortical distances between voxel pairs. The mean pairwise similarity exhibited a similar falloff pattern (black line in Fig. 6) as observed in previous single-unit data33. Importantly, fitting a rational function [a/(1 + bx) + c] with three free parameters to the raw data points yielded a slope parameter b of 0.9532 (95% CI: 0.88–1.03), which is not significantly different from the slope fitted to the single-unit recording data (slope = 1). A likelihood ratio F-test confirmed that this three-parameter model did not provide a significantly better fit than an alternative model with b fixed to 1 (F(1, 223501) = 1.38, p = 0.24). Thus, the change of response properties along the cortical surface as a function of distance, measured with high-resolution fMRI, corresponds remarkably well with data obtained from dense electrophysiological recordings in IT cortex.
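The nested-model comparison can be illustrated with a small standard-library sketch. The fitting strategy is an assumption chosen for simplicity and is not necessarily the optimizer used in the study: for any fixed b the model a/(1 + bx) + c is linear in a and c, so those two are solved in closed form while b is found by a grid scan followed by a golden-section refinement. The function names, search range, and synthetic data are hypothetical; only the F statistic is computed, with degrees of freedom (1, n − 3).

```python
import random
from math import sqrt

def fit_linear_part(x, y, b):
    """Closed-form least squares for a and c at fixed b; returns (a, c, rss)."""
    g = [1.0 / (1.0 + b * xi) for xi in x]
    n = len(x)
    mg, my = sum(g) / n, sum(y) / n
    sgg = sum((gi - mg) ** 2 for gi in g)
    sgy = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y))
    a = sgy / sgg
    c = my - a * mg
    rss = sum((yi - (a * gi + c)) ** 2 for gi, yi in zip(g, y))
    return a, c, rss

def fit_rational(x, y, b_lo=0.01, b_hi=10.0, n_grid=400):
    """Fit a/(1 + b*x) + c with b free; returns (a, b, c, rss)."""
    step = (b_hi - b_lo) / (n_grid - 1)
    grid = [b_lo + i * step for i in range(n_grid)]
    b0 = min(grid, key=lambda b: fit_linear_part(x, y, b)[2])
    lo, hi = max(b_lo, b0 - step), min(b_hi, b0 + step)
    phi = (sqrt(5) - 1) / 2
    for _ in range(60):  # golden-section refinement around the best grid point
        m1 = hi - phi * (hi - lo)
        m2 = lo + phi * (hi - lo)
        if fit_linear_part(x, y, m1)[2] < fit_linear_part(x, y, m2)[2]:
            hi = m2
        else:
            lo = m1
    b = (lo + hi) / 2
    a, c, rss = fit_linear_part(x, y, b)
    return a, b, c, rss

def f_test_b_fixed(x, y, b_fixed=1.0):
    """F statistic comparing the free-b model with the model where b = b_fixed."""
    _, b, _, rss_free = fit_rational(x, y)
    rss_fixed = fit_linear_part(x, y, b_fixed)[2]
    f = (rss_fixed - rss_free) / (rss_free / (len(x) - 3))
    return b, f

def synthetic_falloff(n=600, seed=0):
    """Synthetic similarity-versus-distance data (illustration only), true b = 1."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 5.0) for _ in range(n)]
    y = [0.8 / (1.0 + xi) + 0.1 + rng.gauss(0, 0.01) for xi in x]
    return x, y
```

A small F value relative to the F(1, n − 3) distribution indicates that freeing b does not improve the fit meaningfully, which is the sense in which the fitted slope is compatible with the single-unit value of 1.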
Fig. 6: fMRI response similarity spatial profile mirrors single-unit recordings in IT cortex. Mean pairwise voxel similarity plotted against cortical distance for pooled fMRI data from ML, MSB, and object-responsive regions (data of M1 and M2 combined). The black line shows the observed decay pattern, with grey shading indicating the standard deviation at each pair-wise distance. The green line represents the best-fit rational function with a slope parameter (b = 0.9532) not significantly different from that obtained from the single-unit recordings. Source data are provided on GitLab (https://gitlab.com/lzgitlab/share/mfus_of_face_body).
Discussion
Compared to early visual cortex, the mesoscale functional organization of higher-level visual cortex, especially inferotemporal cortex, is still poorly understood. This is mainly due to a lack of high-spatial resolution tools with a sufficiently large field-of-view covering difficult-to-reach cortex. Exquisite optical imaging and electrophysiological recordings revealed a columnar-like organization of accessible parts of anterior inferotemporal cortex for simple shape features20,21,22. Our sub-millimeter fMRI approach, however, mitigated accessibility limitations and showed orderly organized and anatomically segregated functional units within fMRI-defined category-selective areas of primate inferotemporal cortex hidden in a sulcus. The combination of functionally and anatomically segregated clusters in face-selective area ML and body-selective area MSB shows that specialized category-selective areas in inferotemporal cortex can also be subdivided into MFUs, resembling the functional architecture found in early visual areas such as V1, V234,35,36,37, MT38,39,40, and V434,41,42,43.
Moreover, an independent high-resolution resting-state fMRI dataset revealed dissociable functional connections among MFUs of the same type (i.e., those belonging to the same functional cluster) across hemispheres. This finding greatly complements and refines previous results from studies showing that face areas across hemispheres are interconnected30,31. Furthermore, our analyses demonstrated that the stronger functional connectivity between same-type MFUs compared to different-type MFUs cannot be attributed to differences in eccentricity representations (Fig. 5). Although it remains to be investigated whether these functional connectivity patterns also correspond to differences in anatomical connections, our results indicate that MFUs of the same type form mesoscale functional networks spanning large distances (i.e., in this case even across hemispheres, possibly spanning multiple synapses). The functional connectivity results also resemble those observed in early visual areas where MFUs of the same type, located in different areas, are interconnected with each other. For example, color-biased blobs in V1 are connected with color-biased thin stripes in area V2, and vice versa36,44,45.
Due to the challenges in systematically parametrizing stimuli that activate different category-selective regions in the inferotemporal cortex, we are limited to making qualitative comparisons of the functions of different MFUs. In ML, MFU2 and MFU3 exhibit more complex tuning curves compared to MFU1. Voxels in both MFU2 and MFU3 display a progressively higher sensitivity to animals and birds, which include bodies as well as heads and face-like features. Furthermore, MFU3 shows enhanced responses to monkey bodies and reduced responses to inanimate objects. Given these increased responses to non-face stimuli in MFU2 and MFU3, it is plausible to speculate that these MFUs might play a role in contextualizing lower-level face information. For instance, they could integrate facial information with that from other body parts, thereby providing a more comprehensive representation of the entire body.
Strikingly, single-cell data recorded from the same MSB area revealed three functional clusters with an average population response pattern that was surprisingly similar to those obtained using the high-resolution fMRI data. Notably, the single-unit data were guided by low-resolution fMRI and were acquired with single-contact electrodes across mult