Humans and LLMs share a mental disorder: Fugue Lock (opens in new tab)
Deny a model the option to say no, and it loses coherency in interesting, destructive, expensive ways.
Read the original articleDeny a model the option to say no, and it loses coherency in interesting, destructive, expensive ways.
Read the original article