One of my regular hobbyhorses is that computer science/machine learning as a field failed to learn the correct lessons from psychology/cognitive science. The relationship between the two fields is fraught. In the late 1950s it was much less so; the early AI researchers felt that they had much to gain from talking to cognitive scientists, and the very premise of cognitive science is that we can better understand the brain by thinking of it as an information processing system like a computer. The AI researchers went to the cognitive scientists and asked, in approximately so many words, how the brain worked, so that they could implement that in computers. The cognitive scientists gave a prompt and confident answer, based on their most sophisticated models of cognition, which the AI researchers promptly implemented. That answer was quite comprehensively wrong. It was so wrong that it led to something of a decade-long deep freeze where interaction between the disciplines of machine learning and psychology was minimal and, at best, deeply fraught.
From my perspective the relationship was soured by an inability, on both sides, to understand what psychology was really good at. The model-building efforts of cognitive scientists are interesting, and they have been useful in moving forward our understanding of the brain. But those models are, universally and often quite profoundly, wrong. They do not describe what is actually happening in the brain in anything like as accurate or coherent a fashion as would be necessary to use that understanding as the spec for building a computer system. Models like Biederman's geons or Simon's General Problem Solver are tremendously interesting ways to think about what the brain might be doing, but they fall apart at higher levels of specificity and offer no realistic path to practical systems that can see or reason about the world.
The AI researchers' response to this, quite reasonably, was to look elsewhere, and the whole lineage of (massively) data-driven, bitter-lesson-informed statistical machine learning was the result. All along, though, the psychologists knew something vitally important that the AI researchers did not, something that only grows in importance as statistical learning approaches become more sophisticated: they knew, and know, how to measure behavior.
That's important, because measuring behavior is hard. The meat of the work done from the founding of experimental psychology in the late 19th century until today was figuring out how to carefully and accurately measure the correlation between stimulus (input) and response (output) in such a way that you can say something meaningful about the processing happening between those two steps. The reason this is hard, much harder than measuring, say, the inputs and outputs of a plane's autopilot, is that human behavior is non-deterministic and decidedly nonlinear. If you ask a person to make a judgment and respond when given a prompt, a priori you have no control over how they make that judgment.

This will make more sense with an example. Consider an elementary multiplication test. The input, or prompts, for that test are a set of numbers to be multiplied, and the output, the behavioral measurement, is those numbers, multiplied, written down. That test measures a child's ability to do multiplication. What it does not measure is how they do multiplication. The answers could be arrived at in any number of ways. The child could memorize their multiplication tables. They could draw pictures of apples. They could haphazardly write down numbers and get lucky enough of the time. They could use some internal method for chunking and combining the numbers that no other child uses. To understand how multiplication works in the brain, what sorts of representations and manipulations actually underlie the child's path to the correct answer, psychologists have had to design experiments that measure specific parts of that underlying system. There are tests of a child's intuitive number sense that reveal the precise developmental moment when that child can tell that one numeric quantity is larger than another, for instance. These kinds of measurement instruments are much more challenging to develop, because you want to know not only whether the child can succeed, but how to make failure, or success, broadly informative.
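To make the point concrete in code, here is an illustrative sketch (mine, not the author's; the strategies and numbers are invented for the example) of why scoring the answers alone cannot distinguish how a child multiplies:

```python
import random

# Three hypothetical "children", each using a different internal strategy.
TIMES_TABLE = {(i, j): i * j for i in range(10) for j in range(10)}

def memorizer(a, b):
    """Recalls the answer from a memorized times table."""
    return TIMES_TABLE[(a, b)]

def apple_counter(a, b):
    """Computes the product as repeated addition (drawing apples)."""
    total = 0
    for _ in range(b):
        total += a
    return total

def lucky_guesser(a, b):
    """Guesses from plausible answers, and is often right."""
    return random.choice([a * b, a * b, a + b])

items = [(random.randint(2, 9), random.randint(2, 9)) for _ in range(20)]
for child in (memorizer, apple_counter, lucky_guesser):
    score = sum(child(a, b) == a * b for a, b in items)
    print(f"{child.__name__}: {score}/{len(items)} correct")
```

All three can post respectable scores on an easy test; only an instrument designed to probe the mechanism, like the number-sense tasks mentioned above, can tell them apart.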
The great successes in the history of psychology have all revolved around learning how to measure something. The field of psychometrics, whence the techniques behind much of modern statistical data analysis emerged, is about how to quantify behavioral measurements. The field of psychophysics is about grounding careful behavioral measurements in physical quantities. Neuropsychological testing is an entire subfield of medicine devoted to the question of how to empirically measure and quantify cognitive dysfunction. Behaviorism, which comprised the influential mainstream of experimental psychology for much of the 20th century, was about carefully and quantifiably linking measurements of behavior to changes in behavior. The great "cognitive revolution" in psychology, which began in the 1950s and 1960s, was an attempt to broaden psychology to consider models of the architecture of cognition, breaking the field free of the ruthlessly single-minded focus on measurement that had defined its first century. The heart of psychological expertise, though, continues to lie in understanding the importance, and phenomenal difficulty, of behavioral measurement.
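As one concrete instance of what grounding behavioral measurement in physical quantities looks like, here is a minimal sketch (my illustration, not from the post; the observer model and all parameters are invented) of a classic psychophysical staircase procedure, which homes in on the stimulus intensity an observer detects about 70.7% of the time:

```python
import random

def simulated_observer(intensity, threshold=0.5, noise=0.1):
    """Hypothetical observer: more likely to detect as intensity rises."""
    return random.gauss(intensity, noise) > threshold

intensity, step = 1.0, 0.05
hits, last_move, reversals = 0, None, []

while len(reversals) < 12:
    if simulated_observer(intensity):
        hits += 1
        if hits < 2:
            continue                 # 2-down rule: two hits in a row to step down
        hits, move = 0, "down"
        intensity -= step
    else:
        hits, move = 0, "up"         # 1-up rule: any miss steps up
        intensity += step
    if last_move and move != last_move:
        reversals.append(intensity)  # direction change: record a reversal
    last_move = move

print(f"estimated threshold ~= {sum(reversals) / len(reversals):.3f}")
```

The output is a physical quantity, a stimulus intensity, recovered purely from a sequence of yes/no behavioral responses.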
It is not a historical accident that early AI researchers engaged with psychology at precisely the moment that cognitive science was most eager to move the field past the seemingly limited hyperfocus on behavioral measurement. The impetus behind both the cognitive revolution in psychology and the establishment of AI as a field of study was the hypothesis that both the brain and digital computers could be described as information processing devices, using the methodological tools and concepts of cybernetics and information theory. To computer scientists, thinking of the brain as a modular information processing mechanism with specific architectural constraints and functions is natural. If you are going to design a computer system to solve a given problem, step one is very often to draw a block diagram laying out that architecture, those functions, and those constraints. The first step in replicating the functions of human cognition could naturally be thought to be the same. It didn't work out, because the block diagrams that cognitive scientists were able to draw to describe the architecture of cognition were, however useful as theoretical tools for understanding, fundamentally wrong at a level that made them worse than useless as ideas for computer architecture.
It's not just that careful measurement is what the field of psychology is best at. A lack of good tools for measurement is behind a great deal of the current disconcertment over AI. The problem with AI systems as such is that nobody can precisely define what they are able to do, where they fail, and, except in limited cases, what questions they should be able to answer. This is true for LLMs (the question of whether they're intelligent and the question of whether they can ever be trusted to be accurate are both questions about what you can measure), but it's also true for image generators, and it's true for "embodied" AI systems like those in self-driving cars.
People are starting to realize, and write about, the importance of measurement for AI. The lack of good tools for evaluating models is increasingly being recognized, and is of increasing concern. Too many machine learning models have been evaluated on baseline datasets that do a poor job, at best, of revealing the actual performance of the model. These baselines give an inaccurate sense of model performance in itself, and they are also a very poor match to the question of how to train models that are useful for real-world applications. The state of the art in model evaluation, though, remains far behind where experimental psychology was sixty years ago.
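As a cartoon of the failure mode (entirely fabricated model and data, purely to illustrate): a system that learns a spurious cue can ace a benchmark where the cue and the label happen to be confounded, then collapse the moment that confound is broken.

```python
def sentiment_model(text):
    # Learned shortcut: reviews mentioning "director" skewed negative
    # in the (hypothetical) training set.
    return "neg" if "director" in text else "pos"

benchmark = [  # cue and label are confounded, as in many scraped datasets
    ("great fun from start to finish", "pos"),
    ("charming and warm", "pos"),
    ("the director has no idea what he is doing", "neg"),
    ("a mess; the director should be embarrassed", "neg"),
]
held_out = [  # same task, confound broken
    ("the director delivers a masterpiece", "pos"),
    ("tedious and completely unwatchable", "neg"),
]

for name, data in [("benchmark", benchmark), ("held_out", held_out)]:
    acc = sum(sentiment_model(t) == y for t, y in data) / len(data)
    print(f"{name}: {acc:.0%}")  # benchmark: 100%, held_out: 0%
```

A psychologist would say the benchmark has no construct validity: the score measures the confound, not the ability.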
There is also still no widespread appreciation that tools of measurement are essential on the model training side as well. I've written many times over the years about the ways that everybody who is doing supervised learning in ML, that is, the kind of learning that relies on labeled data sets, is actually running psychophysics experiments on the labelers. It's just that most of them don't realize they're doing that, and thus do a poor job of it.
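A sketch of the kind of check this implies (my example; the annotations are invented): treat the labelers as experimental subjects and quantify their agreement, here with a small self-contained Cohen's kappa, before treating their output as ground truth.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

annotator_1 = ["cat", "cat", "dog", "dog", "cat", "dog", "cat", "dog"]
annotator_2 = ["cat", "dog", "dog", "dog", "cat", "cat", "cat", "dog"]
print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.2f}")
# Low kappa means the "ground truth" is partly measuring the labelers,
# not the stimuli: exactly the situation a psychophysicist would flag.
```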
This has been true for a while, but now more than ever the fields of ML and AI need to determine, with some urgency, how to take the skills and techniques that psychology has been developing for a century and adapt them to the evaluation and training of computer systems. If this doesn't happen soon, AI will have us catapulting into an uncertain future with no clarity on what the capabilities are, or could be, of the systems doing the catapulting. No regulatory response or industry agreement is likely to be useful or constructive if it is not issued in a context of empirically understanding what these systems are and are not, and right now ML and AI don't have the tools to answer that question.