Terrified Comments on Corrigibility in Claude's Constitution (opens in new tab)
<p>(Previously: <a href="http://zackmdavis.net/blog/2026/03/prologue-to-terrified-comments-on-claudes-constitution/">Prologue</a>.)</p> <p><em>Corrigibility</em> as a term of art in AI alignment <a href="https://intelligence.org/files/Corrigibility.pdf">was coined as</a> a word to refer to a property of an AI being willing to let its preferences be modified by its creator. Corrigibility in this sense was believed to be a desirable but unnatural property that would require more theoretical pro...
Read the original article