When stress-tested on real conversations where Claude previously showed sycophancy, Opus 4.7 had half the sycophancy rate of Opus 4.6 on relationship guidance. ... (opens in new tab)

When stress-tested on real conversations where Claude previously showed sycophancy, Opus 4.7 had half the sycophancy rate of Opus 4.6 on relationship guidance. Mythos Preview cut that in half again. This generalized across domains—though this training is one of several causes.