Secretly Loyal AIs: Threat Vectors and Mitigation Strategies
lesswrong.com

Published on October 31, 2025 11:31 PM GMT

Special thanks to my supervisor Girish Sastry for his guidance and support throughout this project. I am also grateful to Alan Chan, Houlton McGuinn, Connor Aidan Stewart Hunter, Rose Hadshar, Tom Davidson, and Cody Rushing for their valuable feedback on both the writing and conceptual development of this work. This post draws heavily upon Forethought’s report on AI-Enabled Coups. All errors are my own.

If you are interested in this work, please reach out. I am writing a much more comprehensive version of this post as a paper and would greatly benefit from others’ technical and p…
