This is Jessica. Recently overheard (more or less):

SPEAKER: We study decision making by LLMs, giving them a series of medical decision tasks. Our first step is to infer, from their reported beliefs and decisions, the utility function under revealed preference assump—

AUDIENCE: Beliefs!? Why must you use the word beliefs?

SPEAKER [caught off guard]: Umm… because we are studying how the models make decisions, and beliefs help us infer the scoring rule corresponding to what they give us.

AUDIENCE: But it's not clear language models have beliefs like people do.

SPEAKER: Ok, I get it. But it's also not clear what people's beliefs are exactly, or that they're consistent. There's a large body of research on how the beliefs you get are affected by the elicitation method you use. So there's no reason to think that human beliefs are stable, and what people report as their beliefs does not necessarily explain their decisions under common models.

AUDIENCE: But people can believe things. Models are just patterns of activation.

SPEAKER: Ok. Well, perhaps we can just call them subjective distributions.

AUDIENCE: But subjective implies a person, experiencing something. We can't establish that they have subjective experience.

INFORMED AUDIENCE MEMBER: Wait a second, when you say beliefs, do you just mean a probability distribution? Like in decision theory?

SPEAKER: Yes. We can just call it a probability distribution to avoid the term beliefs.

STUBBORN AUDIENCE MEMBER: I don't know. They may not act like real probabilities.

SPEAKER: But often the probabilities people give us don't conform to the probability axioms either. But ok, fine. How about we say belief-like representation?

AUDIENCE: What is that supposed to mean?

SPEAKER: Well, we could look for the same kinds of properties we hope to see in human beliefs, even if we can't elicit them perfectly. Like, we assume that beliefs should have some correspondence to behavior, so we could look for that. Which is part of what we do in the work I'm going to talk–

AUDIENCE: Now you've really lost us.

SPEAKER: Ok. How about "risk representations" then?

AUDIENCE: Risk! Models don't feel things!

SPEAKER: Ok. Pseudo-probability representations?

STUBBORN AUDIENCE MEMBER: I don't know how I feel about the word representation here. Does representation imply intent? LLMs don't have intentions.

SPEAKER: Ok. Vectors. Pseudo-probability vectors. Anyway, we were interested in seeing what happens when you apply revealed preference assumptions to LLMs, to infer the scoring rule they are using…

AUDIENCE: Scoring rule! They can't respond to incentives!

SPEAKER: They are trained with scoring rules, like cross-entropy loss. But then fine-tuning like SFT and RLHF induces something different. Plus, when we prompt them with a specific decision context, we will induce a different posterior distribution. So it makes sense to–

MODERATOR [holding up their hand]: Five minutes left.

INFORMED AUDIENCE MEMBER: But why not prompt them with a proper scoring rule?

SPEAKER: Well, it's not clear that they would react to a scoring rule we give in the prompt, because they don't experience rewards like a person.

AUDIENCE: Exactly!

SPEAKER: Right. So the only incentives available are epistemic.

AUDIENCE: Epistemic? That assumes a knower.

SPEAKER: Ok. We are interested in the induced loss-minimization-like dynamics…

AUDIENCE: Dynamics? These models can't be assumed to represent time.

SPEAKER: That's beside the point, but ok. Static mapping from tokens to tokens?

AUDIENCE: Mapping? That assumes a function.
SPEAKER: Yes. I am, in fact, assuming a function.

INFORMED AUDIENCE MEMBER: Could we just stipulate the model has internal states and move on?

STUBBORN AUDIENCE MEMBER: Internal? But that assumes an inside.

MODERATOR: Two minutes left.

SPEAKER: Okay. Just states then.

AUDIENCE: But these models have no explicit representation of the state!

SPEAKER: Ok. I'll wrap this up. Under revealed preferences, we find the utility function puts essentially all the mass on semantic hygiene and—

STUBBORN AUDIENCE MEMBER: That seems like a leap.

SPEAKER: It's literally revealed.

AUDIENCE: Revealed to whom?

MODERATOR [standing up and starting to clap]: Let's thank our speaker…

Coming soon to a computer science / behavioral econ / statistics / cogsci seminar near you!
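Postscript, for anyone who got lost in the terminology fight: the "beliefs," "probability axioms," and "scoring rules" at issue are ordinary decision-theory objects. A minimal sketch in Python (the diagnoses and numbers are invented for illustration, not taken from the work being presented):

import math

# Hypothetical elicited "pseudo-probability vector" over diagnoses.
# In the actual setup these would come from prompting the model.
elicited = {"sepsis": 0.55, "pneumonia": 0.30, "neither": 0.20}

# Check the probability axiom the stubborn audience member worried about:
# do the reported numbers sum to 1?
total = sum(elicited.values())
coherent = abs(total - 1.0) < 1e-9
print(f"sums to {total:.2f}; coherent: {coherent}")  # sums to 1.05; coherent: False

# Normalize so we can score the report at all (one modeling choice among many).
probs = {k: v / total for k, v in elicited.items()}

# Two proper scoring rules, evaluated against the realized outcome.
# Honest reporting maximizes the expected log score and minimizes the
# expected Brier score, which is the sense in which "incentives" are epistemic.
outcome = "pneumonia"
log_score = math.log(probs[outcome])
brier = sum((probs[k] - (1.0 if k == outcome else 0.0)) ** 2 for k in probs)
print(f"log score: {log_score:.3f}, Brier score: {brier:.3f}")

Whether the thing being scored is a belief, a subjective distribution, or a pseudo-probability vector makes no difference to the arithmetic, which was, more or less, the speaker's point.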