Alignment faking in large language models

Covered by 4 sources including lesswrong.com, data4sci.substack.com. Discussed on Hacker News.

Covered in 7 articles

lesswrong.com · Neglected Basics of AI Alignment

lesswrong.com · Rohin Shah on AGI Safety

lesswrong.com · What should go in a model spec?