Published on November 2, 2025 4:37 PM GMT
Suppose that the alignment problem is solvable, and that it is possible to specify and upload goals into an agent that make it do what you want while avoiding outcomes you don’t want.
Unfortunately for you, you were not the one to discover the solution: you now find yourself pushing granite stones alongside your fellow humans in service of the one who did, and that someone (or something) desires a pyramid.
In the relatively benign version of this scenario, you might not even be aware of your misfortune: your actions in service of the goal you have been tasked with give you more pleasure than anything you did in your life before The Event. Not that you remember much of it, since remembering just doesn’t give you that dopamine hit, and the relevant pathways in your brain have eroded from underuse.
In the relatively malign version, doing anything that isn’t building the pyramid simply produces pain, and it is this pain that shapes your behaviour.
The arrangement that most reliably serves the alignment objective of the ruler (pun intended) combines the two: reinforcement to keep you building, and punishment to keep you from doing anything else.
Perhaps you manage to contemplate your situation in a dream. You consider the aspects of the solution to the alignment problem that are substrate-independent, and thus apply to agents of all types. You wonder whether your predicament was not, in fact, a predictable outcome of researching the question “how do I get agents to do what I want, to do what I think is right?”