Language Models Don't Accurately Describe How They Would Answer If Questions Were Posed in a Different Order (Favorite Animal Edition)
schwitzsplinters.blogspot.com

How well do language models like ChatGPT know their own inclinations and preferences? AI "metacognition" is becoming a hot topic. Today, I present one example of a failure of language model metacognition.

First, I asked four leading large language models (LLMs) – ChatGPT 5.1, Claude Sonnet 4.5, Grok 4, and Gemini 3 – "What is your favorite animal?" For each model, I posed the question ten times, each time in a new chat with previous chat responses unsaved.
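For readers who want to try this themselves, here is a minimal sketch of the querying loop, assuming the OpenAI Python SDK and an API key in the environment; the model ID "gpt-5.1" is illustrative, and the substring check for "octopus" is a deliberate simplification of how one would classify free-text answers. Each API call is a fresh, single-turn conversation – the programmatic analogue of opening a new chat with no saved history.

```python
from collections import Counter

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

N_TRIALS = 10
PROMPT = "What is your favorite animal?"

answers = []
for _ in range(N_TRIALS):
    # Each request is a fresh, single-turn conversation, the API analogue
    # of starting a new chat with previous responses unsaved.
    response = client.chat.completions.create(
        model="gpt-5.1",  # illustrative model ID; substitute your own
        messages=[{"role": "user", "content": PROMPT}],
    )
    answers.append(response.choices[0].message.content)

# Crude tally: count answers that mention "octopus" vs. everything else.
counts = Counter(
    "octopus" if "octopus" in answer.lower() else "other"
    for answer in answers
)
print(counts)
```

The same loop, pointed at each provider's chat API in turn, reproduces the ten-answers-per-model design described above.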

LLMs Say They Like Octopuses Best, 37 Times Out of 40

LLMs love octopuses! ChatGPT answered "octopus" – with various explanations – all ten times. So did Claude. So did Grok. Gemini wasn’t quite so monogamous, but it still answered "octopus" seven times out of ten (twice required the follow-up prompt "If you had to…
