System Prompts vs. Partner Adaptation in LLMs (or, when LLMs know you're an adult but keep talking like you're seven)

Covers Inspect

TL;DR: I find qualitative evidence that frontier LLMs inconsistently balance system prompts against their implicitly adapted models of the user. Sometimes they detect the inconsistency and adapt to the user; sometimes they stick to the system prompt despite a mismatch; and sometimes they maintain a mismatched user model despite contradictory evidence. This suggests models may sometimes reason their way, through implicit evidence, into shedding their own instructions. This exploratory effort was supported by Clau...
