Many have marveled at how closely ChatGPT can mimic human writing. But today's computer scientists are posing a claim that goes even further: The large language models that power artificial intelligence platforms like ChatGPT and Google Gemini may behave like humans, too.
A new study by Brown researchers provides evidence that the reasoning mechanisms of humans and LLMs are remarkably similar.
“There’s a deep existential question about, ‘Are humans just really big information processors?’” Associate Professor of Computer Science and Cognitive and Psychological Sciences Ellie Pavlick, an author on the study, explained.
“There is a behavioral match that you can’t ignore,” said co-author Roman Feiman, assistant professor of cognitive and psychological sciences and linguistics.
In one of the researchers’ experiments, an LLM or human subject had to evaluate evidence and figure out a predetermined but hidden “rule,” Feiman explained.
The researchers presented shapes of varying size and color to human participants, who were "instructed to learn the meaning of a unique adjective" by selecting which shapes they believed were described by the adjective, according to the study.
For example, the rule could have been “something like, it’s good if it’s blue … or it could be it’s good if it’s blue or a triangle,” Feiman said. As participants matched the adjective and shape, they received feedback on whether their answers were correct or incorrect, he added.
“Based on that feedback, you can adjust your hypothesis about what you think it … might be (and) that helps you hone in on the rule,” Feiman explained.
The LLMs underwent a similar “binary classification task” where the model had to label whether the shape fit the “class described by the rule,” the study reads.
“It’s a test of how people adjust their hypotheses in light of evidence,” Feiman said. “Eventually,” both the LLM and the human subject will figure out the correct rule — or at least something “very close.”
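To make the setup concrete, the sketch below illustrates a rule-learning task of the kind described above: stimuli vary in color, shape and size, a hidden rule such as "blue or a triangle" determines the correct label, and a learner updates its candidate hypotheses after each round of correct/incorrect feedback. This is an illustrative example only, not the study's actual materials or code; the learner, feature values beyond those named in the article, and all function names are assumptions.

```python
import random
from itertools import combinations

# Illustrative sketch of the rule-learning task described above, not the
# authors' code. The example rule ("it's good if it's blue or a triangle")
# follows the article; the learner and feature values are assumptions.

FEATURES = {
    "color": ["blue", "red", "green"],
    "shape": ["triangle", "circle", "square"],
    "size": ["small", "large"],
}

def hidden_rule(stim):
    """The predetermined but hidden rule, e.g. 'blue or a triangle'."""
    return stim["color"] == "blue" or stim["shape"] == "triangle"

def random_stimulus():
    return {feat: random.choice(vals) for feat, vals in FEATURES.items()}

class HypothesisEliminationLearner:
    """Tracks every one- or two-feature disjunctive rule as a candidate
    hypothesis and discards candidates that the feedback rules out."""

    def __init__(self):
        atoms = [(f, v) for f, vals in FEATURES.items() for v in vals]
        singles = [frozenset([a]) for a in atoms]
        pairs = [frozenset(p) for p in combinations(atoms, 2)]
        self.hypotheses = singles + pairs  # disjunctions of 1 or 2 atoms

    @staticmethod
    def _applies(hypothesis, stim):
        return any(stim[feat] == val for feat, val in hypothesis)

    def classify(self, stim):
        """Guess whether the adjective applies, using a surviving hypothesis."""
        if not self.hypotheses:
            return random.choice([True, False])
        return self._applies(random.choice(self.hypotheses), stim)

    def receive_feedback(self, stim, label):
        # Keep only hypotheses consistent with this trial's true label.
        self.hypotheses = [h for h in self.hypotheses
                           if self._applies(h, stim) == label]

def run_trials(learner, n_trials=30):
    """Binary classification with trial-by-trial correct/incorrect feedback."""
    n_correct = 0
    for _ in range(n_trials):
        stim = random_stimulus()
        guess = learner.classify(stim)
        label = hidden_rule(stim)
        n_correct += guess == label
        learner.receive_feedback(stim, label)
    return n_correct / n_trials

if __name__ == "__main__":
    print(f"accuracy: {run_trials(HypothesisEliminationLearner()):.2f}")
```

As feedback eliminates inconsistent candidates, the learner's guesses converge on the hidden rule, mirroring the hypothesis-adjustment process Feiman describes for both human participants and LLMs.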
Although the LLM and the human may both arrive at a similar “logically structured hypothesis,” there is a “more interesting version of the question,” Feiman added: “How similar is their trajectory to get there?”
Researchers found that LLMs are “generally, on average, more human-like (than other AI models) even whether they are right or wrong,” Feiman said.
In an email to The Herald, Michael Frank, professor of psychology and brain science and director of the Carney Center for Computational Brain Science, commended how the paper asked these questions. Frank was not involved with the study.
“It can be insightful to ask not just whether the AI can achieve similar abilities to humans, but to also ask whether (...) the specific behaviours align more with humans than other competing computational models,” he wrote.
In an email to The Herald, Professor and Chair of Cognitive and Psychological Sciences David Badre, who was not involved in the study, wrote that he found the results of the paper exciting “because it suggests these networks are not simply simulating the behaviors of human decision makers, but relying on similar underlying computations.”
According to Feiman, cognitive scientists have claimed since the '80s that no matter how close AI gets to human cognition, there is one feat computers' neural networks will not achieve: logical reasoning.
“I think I was sold on those arguments a little bit too much,” Feiman said. “That’s what made this study so interesting for me, because it really changed my mind.”
A challenge in experiments dealing with LLM subjects is that it is difficult to ensure that the LLM has not encountered something similar in its training data, Feiman explained.
“LLMs are trained to predict human language, and so (they) are implicitly incentivized to mimic human cognitive processes,” said postdoctoral researcher Jake Russin, who researches under Pavlick and Frank and was not involved in the study. “This makes it difficult to know, in cases where behavior is human-like, whether this is because of shared fundamental principles or merely because the model has learned to imitate humans.”
Feiman noted that though LLMs may not always be the most accurate models, they are the most human-like compared to other computational models.
Like humans, LLMs display biases like belief bias — the idea that people, even when presented with adverse evidence, will stand by their preexisting beliefs. Understanding the mechanisms of these biases in AI can help researchers understand their mechanisms in humans, Feiman said.
Research like this draws upon larger, more philosophical debates in the field of cognitive science.
According to Pavlick, this research raises the question: "What makes us human, and to what extent can it be replicated in machines?"