
Teaching Claude why

Covered by 10 sources including Yahoo Tech, Fortune

Discussed on Hacker News

Covered in 14 articles

‘Maybe me too’: Elon Musk accepts some of the blame for Claude learning to blackmail users from ‘evil’ online AI stories


Anthropic blames dystopian sci-fi for training AI models to act “evil”

Discussed on Hacker News

Overworked AI Agents Turn Marxist, Researchers Find

Discussed on Hacker News, r/ChatGPT, r/Futurology, and r/technology

lesswrong.com

Your Model Organisms Might Be Fried

lesswrong.com

Alignment pretraining could backfire

lesswrong.com

Coverage-driven alignment - What ‘Teaching Claude Why’ can borrow from AV verification

lesswrong.com

Eval Cooperativeness May Be a Scalable Mitigation for Eval Gaming

lesswrong.com

Announcing Geodesic Research

AI Safety Newsletter

AISN #73: AI Safety Enters the Political Mainstream & Musk Loses OpenAI Lawsuit

In other languages

最新情報 Feed

Overworked AI Agents Lean Toward “Marxism,” Experiment Finds