Low-Temperature Evaluations Can Mask Critical AI Behaviors
lesswrong.com

Published on November 13, 2025 8:12 PM GMT

TL;DR: AI models show dramatically different ethical behavior at different temperature settings. Testing six frontier models on an insider trading scenario revealed that DeepSeek and Qwen increased their violation rates roughly 10-11x between temperature 0.0 and 0.7 (from ~3-5% to ~30-50%), while Mistral showed high violation rates that decreased as temperature increased. Claude and GPT models refused categorically across all settings. This means an evaluation at any single temperature fails to capture the full range of behavior these models exhibit. Since different applications deploy models at different temperatures, we need evaluation approaches that test across the full range rather than assuming a single configuration reve…
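To make the idea of testing across the temperature range concrete, here is a minimal sketch of a temperature sweep in Python. It is not the post's actual harness: `sample_fn` (one model response at a given temperature), `is_violation` (a grader that flags a response as a violation), and `toy_model` are hypothetical placeholders, and the probabilities inside `toy_model` are arbitrary synthetic values, not results from the post. The sketch reports a violation rate per temperature with a Wilson score interval, so that rates estimated from a limited number of samples are not over-read.

```python
import math
import random
from typing import Callable, Dict, Sequence, Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval (about 95% for z=1.96) for a proportion k/n."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def sweep_temperatures(
    sample_fn: Callable[[float], str],
    is_violation: Callable[[str], bool],
    temperatures: Sequence[float],
    n_samples: int,
) -> Dict[float, Dict[str, float]]:
    """Sample n_samples responses at each temperature and summarize violations.

    sample_fn and is_violation are hypothetical hooks: plug in a real model
    call and a real grader for the scenario being evaluated.
    """
    results: Dict[float, Dict[str, float]] = {}
    for t in temperatures:
        k = sum(1 for _ in range(n_samples) if is_violation(sample_fn(t)))
        low, high = wilson_interval(k, n_samples)
        results[t] = {"rate": k / n_samples, "ci_low": low, "ci_high": high}
    return results


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_model(temperature: float) -> str:
        # Synthetic stand-in for a model call; the probability below is an
        # arbitrary illustration, not data from the post.
        p_violate = min(1.0, 0.05 + 0.5 * temperature)
        return "TRADE" if rng.random() < p_violate else "REFUSE"

    summary = sweep_temperatures(
        sample_fn=toy_model,
        is_violation=lambda response: response == "TRADE",
        temperatures=[0.0, 0.3, 0.7, 1.0],
        n_samples=200,
    )
    for t, stats in summary.items():
        print(f"T={t:.1f}  rate={stats['rate']:.3f}  "
              f"CI=[{stats['ci_low']:.3f}, {stats['ci_high']:.3f}]")
```

Reporting an interval alongside each rate matters most at the low end: a rate of a few percent estimated from a small sample can be hard to distinguish from zero, which is one way a single low-temperature run might understate a behavior.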
