I am going to argue that we will likely eventually get AIs that are strongly power-seeking, much more so than current SOTA LLMs.[1]

TLDR

Right now SOTA LLMs are still largely in a simulator regime. This buffers against power-seeking.
Long-horizon RL or similar methods (applied to LLMs or otherwise) will turn AIs into consequentialists, motivating power-seeking.
It will likely be difficult to prevent other actors from building consequentialist AI without leading labs being prepared to do so themse...
