How much should we worry about secretly loyal AIs? (opens in new tab)
In my first post on The Substrate, I made the case for . Preserving integrity, in practice, means ensuring no actor can make unauthorized or secret edits to model weights, training data, or training infrastructure. One concrete worry is that an attacker who can tamper with training data could instill a .A secretly loyal AI is one that advances the interests of its principal in a way concealed from other legitimate actors, including the AI developer. One way you can taxonomize secret loyalties...
Read the original article