
alignment.openai.com

Reinforcement learning towards broadly and persistently beneficial models

Covers Introducing ChatGPT Health. Covered by 6 sources including The Decoder, tldr.tech. Discussed on Hacker News.

Covers 1 related story

Introducing ChatGPT Health

Discussed on Hacker News and r/privacy

Covered in 6 articles


OpenAI researchers show small doses of "beneficial trait" training make AI models broadly safer and harder to manipulate

GPT-5.6 Tuesday 🤖, Claude Code artifacts 👨‍💻, Perplexity’s Brain memory 🧠

lesswrong.com

Reinforcement learning towards broadly and persistently beneficial models


As AI takes on longer, higher-stakes tasks, we want models to carry beneficial and safe behavior into new domains beyond their training—and maintain it under pr...

In other languages

16 minutes ago

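OpenAI's newly released paper is fascinating; it feels a bit like an exploration of the underlying principles of human nature. Industry research has found that a nasty phenomenon called emergent misalignment shows up when aligning large models: a...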