
Logits as a new monitor for evaluation awareness

Covers 3 stories including Qwen3 Technical Report · Covered by threadreaderapp.com · Discussed on Hacker News

Covers 3 related stories

Qwen3 Technical Report

MoonshotAI/Kimi-K2.5: Moonshot's most powerful model

transformer-circuits.pub

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

Discussed on Hacker News

Covered in 1 article

threadreaderapp.com

Would an LLM tell you if it’s gaming your eval? Often, no. But we can still catch the model thinking about it.