A decision theorist walks into a seminar

statmodeling.stat.columbia.edu

This is Jessica. Recently overheard (more or less):

SPEAKER: We study decision making by LLMs, giving them a series of medical decision tasks. Our first step is to infer, from their reported beliefs and decisions, the utility function under revealed preference assump—

AUDIENCE: Beliefs!? Why must you use the word beliefs?

SPEAKER [caught off guard]: Umm… because we are studying how the models make decisions, and beliefs help us infer the scoring rule corresponding to what they give us.

AUDIENCE: But it’s not clear language models have beliefs like people do.

SPEAKER: Ok, I get it. But it’s also not clear what people’s beliefs are exactly, or that they’re consistent. There’s a large body of research on how the beliefs you elicit are affected by the method you use. So there’s no reason to think that human beliefs are stable, and what people report as their beliefs doesn’t necessarily explain their decisions under common models.

AUDIENCE: But people can believe things. Models are just patterns of activation.

SPEAKER: Ok. Well, perhaps we can just call them subjective distributions.

AUDIENCE: But subjective implies a person experiencing something. We can’t establish that they have subjective experience.

INFORMED AUDIENCE MEMBER: Wait a second, when you say beliefs, do you just mean a probability distribution? Like in decision theory?

SPEAKER: Yes. We can just call it a probability distribution to avoid the term beliefs.

STUBBORN AUDIENCE MEMBER: I don’t know. They may not act like real probabilities.

SPEAKER: But often the probabilities people give us don’t conform to the probability axioms either. But ok, fine. How about we say belief-like representation?

AUDIENCE: What is that supposed to mean?

SPEAKER: Well, we could look for the same kinds of properties we hope to see in human beliefs, even if we can’t elicit them perfectly. Like, we assume that beliefs should have some correspondence to behavior, so we could look for that. Which is part of what we do in the work I’m going to talk–

AUDIENCE: Now you’ve really lost us.

SPEAKER: Ok. How about “risk representations” then?

AUDIENCE: Risk! Models don’t feel things!

SPEAKER: Ok. Pseudo-probability representations?

STUBBORN AUDIENCE MEMBER: I don’t know how I feel about the word representation here. Does representation imply intent? LLMs don’t have intentions.

SPEAKER: Ok. Vectors. Pseudo-probability vectors. Anyway, we were interested in seeing what happens when you apply revealed preference assumptions to LLMs, to infer the scoring rule they are using…

AUDIENCE: Scoring rule! They can’t respond to incentives!

SPEAKER: They are trained with scoring rules, like cross-entropy loss. But then fine-tuning like SFT and RLHF induces something different. Plus, when we prompt them with a specific decision context, we induce a different posterior distribution. So it makes sense to–

MODERATOR [holding up their hand]: Five minutes left.
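The speaker's last point leans on the fact that cross-entropy loss is a proper scoring rule: whatever distribution you use to evaluate outcomes, reporting that same distribution minimizes your expected loss. As a minimal, illustrative sketch (the two-outcome setup and the specific numbers are mine, not from the talk):

```python
import math

def log_score(report, outcome):
    # Log score (cross-entropy contribution): negative log of the
    # reported probability assigned to the realized outcome.
    return -math.log(report[outcome])

def expected_score(report, belief):
    # Expected log loss when outcomes are actually drawn from `belief`.
    return sum(p * log_score(report, i) for i, p in enumerate(belief))

belief = [0.7, 0.3]          # a hypothetical "belief-like" distribution
honest = expected_score(belief, belief)      # report the belief itself
shaded = expected_score([0.5, 0.5], belief)  # hedge toward uniform

# Properness: truthful reporting has strictly lower expected loss
# than any other report (here, the uniform hedge).
assert honest < shaded
```

Under revealed preference assumptions, one can run this logic in reverse: observe the reports and decisions, and ask which scoring rule (if any) they are best responses to.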
