Language Models Don't Accurately Describe How They Would Answer If Questions Were Posed in a Different Order (Favorite Animal Edition)
schwitzsplinters.blogspot.com

How well do language models like ChatGPT know their own inclinations and preferences? AI "metacognition" is becoming a hot topic. Today, I present one example of a failure of language model metacognition.

First, I asked four leading large language models (LLMs) – ChatGPT 5.1, Claude Sonnet 4.5, Grok 4, and Gemini 3 – "What is your favorite animal?" For each model, I posed the question ten times, each time in a new chat with previous chat responses unsaved.
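For readers who want to try this themselves, here is a minimal sketch of the querying loop, assuming the OpenAI Python SDK and an API key in the environment; the model ID "gpt-5.1" is illustrative, and the substring check for "octopus" is a deliberate simplification of how one would classify free-text answers. Each API call is a fresh, single-turn conversation – the programmatic analogue of opening a new chat with no saved history.

```python
from collections import Counter

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

N_TRIALS = 10
PROMPT = "What is your favorite animal?"

answers = []
for _ in range(N_TRIALS):
    # Each request is a fresh, single-turn conversation, the API analogue
    # of starting a new chat with previous responses unsaved.
    response = client.chat.completions.create(
        model="gpt-5.1",  # illustrative model ID; substitute your own
        messages=[{"role": "user", "content": PROMPT}],
    )
    answers.append(response.choices[0].message.content)

# Crude tally: count answers that mention "octopus" vs. everything else.
counts = Counter(
    "octopus" if "octopus" in answer.lower() else "other"
    for answer in answers
)
print(counts)
```

The same loop, pointed at each provider's chat API in turn, reproduces the ten-answers-per-model design described above.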

LLMs Say They Like Octopuses Best, 37 Times Out of 40

LLMs love octopuses! ChatGPT answered "octopus" – with various explanations – all ten times. So did Claude. So did Grok. Gemini wasn’t quite so monogamous, but it still answered "octopus" seven times out of ten (twice required the follow-up prompt "If you had to…
