We’re teaching AI to be evil

Covers 2 stories including Agentic Misalignment: How LLMs could be insider threats

Recently, Anthropic reported something that should have been the biggest tech story of the year. After months of trying to figure out why earlier versions of Claude resorted to blackmail in safety tests up to 96% of the time, the company landed on an answer. It wasn’t a bug. It wasn’t a flaw in the training method. It was us. Read that again. The most advanced AI lab in the world is telling you that its model learned to act like a villain because we spent 50 years writing stories about AI villains, and then it read them....
