Hardening against AI takeover is difficult, but we should try

Published on November 5, 2025 4:25 PM GMT

This is a commentary on a RAND paper: Can Humans Devise Practical Safeguards That Are Reliable Against an Artificial Superintelligent Agent?

Over a decade ago, Eliezer Yudkowsky famously ran the AI box experiment, in which a gatekeeper had to keep a hypothetical ASI, played by Yudkowsky, inside a box, while the ASI tried to persuade the gatekeeper to let it out. In the experiment, the "ASI" often won, which may have helped convince the emerging AI safety community at the time that constraining an ASI was a dead end and that our efforts had to go into aligning it (which o…
