
Alignment faking in large language models

Covered by 4 sources, including lesswrong.com and data4sci.substack.com. Discussed on Hacker News.

Covered in 7 articles

lesswrong.com

Neglected Basics of AI Alignment

lesswrong.com

Rohin Shah on AGI Safety

lesswrong.com

What should go in a model spec?

lesswrong.com

How much should we worry about secretly loyal AIs?

data4sci.substack.com

Building a Basic Agentic Harness

Discussed on Substack

multipliercg.substack.com

A personal letter on transformative AI

Discussed on Substack

the-substrate.net

How much should we worry about secretly loyal AIs?

Discussed on Hacker News