The analysis schema for study 1. Credit: Chen et al. (Nature Human Behaviour, 2025).
Neuroscientists have been trying to understand how the brain processes visual information for over a century. The development of computational models inspired by the brain’s layered organization, known as deep neural networks (DNNs), has recently opened exciting new possibilities for research in this area.
By comparing how DNNs and the human brain process information, researchers at Peking University, Beijing Normal University and other institutes in China have shed new light on the underpinnings of visual processing. Their paper, published in Nature Human Behaviour, suggests that language actively shapes how both the brain and multi-modal DNNs process visual information.
Comparing AI and human vision
Over the past decade or so, computer scientists have developed new models that can process both images and text. These computational techniques include contrastive language–image pretraining (CLIP), a model that learns to link pictures with associated text descriptions; ResNet, a computer vision model trained on labeled image data; and MoCo, a computer vision model trained on unlabeled data.
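As a rough illustration of what CLIP's image–text linking looks like in practice, the sketch below scores a few candidate captions against a single image using publicly released CLIP weights exposed through the Hugging Face transformers library. The image path and captions are hypothetical placeholders, not material from the study.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a publicly released CLIP checkpoint (paired vision and text encoders).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical inputs: one local image and a few candidate descriptions.
image = Image.open("example.jpg")
captions = ["a photo of a dog", "a photo of a cat", "a photo of a car"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# CLIP scores each caption against the image; softmax gives relative match strength.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{caption}: {p:.3f}")
```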
As part of their study, the researchers tested these three models on visual information processing tasks. They then compared these models' predictions with those of human participants on the same tasks. The team also tested 33 patients who had suffered a stroke that damaged connections between the brain regions responsible for visual processing and those responsible for language.
Study 2 analysis workflow linking WM integrity and model–brain correspondence in patients with chronic stroke. Credit: Chen et al. (Nature Human Behaviour, 2025).
"Recent research has shown better alignment of vision–language DNN models, such as CLIP, with the activity of the human occipitotemporal cortex (VOTC) than earlier vision models, supporting the idea that language modulates human visual perception," wrote Haoyang Chen, Bo Liu and their colleagues in their paper.
"However, interpreting the results from such comparisons is inherently limited owing to the ‘black box’ nature of DNNs. We combine model–brain fitness analyses with human brain lesion data to examine how disrupting the communication pathway between the visual and language systems causally affects the ability of vision–language DNNs to explain the activity of the VOTC to address this."
The researchers tested the models across four publicly available datasets. Interestingly, they observed that the CLIP model reflected the activity of the VOTC, a brain region known to play a role in the recognition of objects and categories, better than the other models did.
"Across four diverse datasets, CLIP consistently captured unique variance in VOTC neural representations, relative to both label-supervised (ResNet) and unsupervised (MoCo) models," wrote the authors. "This advantage tended to be left-lateralized at the group level, aligning with the human language network."
When they analyzed the brains of patients who had suffered strokes, Chen, Liu and their colleagues found that damage to connections between language and vision processing regions decreased the similarity between their brain activity and the CLIP model. In contrast, it appeared to increase the similarity to the MoCo model, which relies only on the processing of visual stimuli to complete tasks.
"Analyses of 33 patients who experienced a stroke revealed that reduced white matter integrity between the VOTC and the language region in the left angular gyrus was correlated with decreased CLIP–brain correspondence and increased MoCo–brain correspondence, indicating a dynamic influence of language processing on the activity of the VOTC," wrote Chen, Liu and their colleagues.
"These findings support the integration of language modulation in neurocognitive models of human vision, reinforcing concepts from vision–language DNN models."
Informing neuroscience and AI research
Overall, the results gathered by this research team suggest that language plays a central role in how the human brain processes visual information. This observation could pave the way for further neuroscience studies, while also potentially inspiring the development of new multi-modal AI models.
In the future, the recent work by Chen, Liu and their colleagues could inspire other research groups to conduct studies that examine both human brain observations and AI model predictions. Such work could help researchers gain an even deeper understanding of the intricate processes through which humans tackle everyday tasks, as well as of the disruptions in these processes that can arise from specific medical conditions.
"The sensitivity of model–brain similarity to specific brain lesions demonstrates that leveraging the manipulation of the human brain is a promising framework for evaluating and developing brain-like computer models," wrote the authors.
More information: Haoyang Chen et al, Combined evidence from artificial neural networks and human brain-lesion models reveals that language modulates vision in human perception, Nature Human Behaviour (2025). DOI: 10.1038/s41562-025-02357-5.
© 2026 Science X Network
Citation: Language shapes visual processing in both human brains and AI models, study finds (2026, January 7) retrieved 7 January 2026 from https://phys.org/news/2026-01-language-visual-human-brains-ai.html