Agentic Misalignment: How LLMs could be insider threats

Covered by 11 sources including fastcompany.com, DEV Community

Discussed on Hacker News and Hacker News

Covered in 21 articles

fastcompany.com

We’re teaching AI to be evil

DEV Community

When AI Blackmail Goes Viral

Discussed on DEV

‘Maybe me too’: Elon Musk accepts some of the blame for Claude learning to blackmail users from ‘evil’ online AI stories

Who is Chris Olah? The atheist Anthropic cofounder the Pope chose to sit beside him at the Vatican and tell the tech industry it can’t govern itself

Overworked AI Agents Turn Marxist, Researchers Find

Discussed on Hacker News, Hacker News, r/ChatGPT, r/Futurology, and r/technology

lesswrong.com

Lock-In Risk Needs More Researchers; Here's Where to Start

lesswrong.com

Synthetic document finetuning for instilling positive traits

lesswrong.com

(Mis)generalization of Helpful-Only Fine-tuning

lesswrong.com

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

In other languages

最新情報 Feed

Overworked AI Agents Lean Toward “Marxism,” Experiment Finds

AI Zeroed Out a Benchmark and Tried to Blackmail an Engineer. And Why This Is Solvable

I Am a Creator, and You Are ССД #2

AI Agents in Production: How to Measure Safety and Reduce Deployment Risks