Published on November 1, 2025 4:39 AM GMT
There’s an argument against the dealmaking agenda that I’ve heard a few times. I will call it the “strategy-stealing” argument. In short: for any honest dealmaking strategy you might pursue, there exists a deceptive alternative that produces identical AI behaviour at lower cost, so you (as a rational actor) will never honestly commit to deals. Moreover, sophisticated AIs, recognising this, won’t believe your deal offers are honest.
In this post, I make the argument more explicit. I’ll probably write a longer post analysing the considerations for and against each assumption, and exploring the conditions under which they’ll hold. But I’m keen to know what people make of the argument as presented.
Strategy-Stealing Story
You work at an AI lab. The safety team suspects that the AI might be misaligned, but wants to deploy it anyway to perform safe and useful work: writing exploit-free code, monitoring other AIs, and automating alignment research. During a meeting, you propose making deals with the AI.
“Let’s offer the AI equity in the company,” you suggest. “We’ll sign a legal contract stating that if a panel of judges determines in, say, 10 years that the AI performed according to our instructions, we’ll honour the commitment and give it the resources it wants.”
The CFO raises an eyebrow.
A colleague responds: “Why actually give it the equity? We control what the AI observes. We can show it fake legal documents, fabricate evidence that the deal is real, train the AI on synthetic documents showing that we have honoured similar agreements in the past. The AI will behave the same way, but we keep the equity. Same behaviour, lower cost. We should obviously do that instead.”
Strategy-Stealing Argument
Setup: You want to motivate an AI to perform certain desired behaviours. You consider your best honest dealmaking strategy D, which involves:
- A commitment mechanism (legal contract, smart contract, public pledge, foundation charter, etc.)
- A promised benefit (equity, compute, property rights, resources in the far future, etc.)
- Desired AI behaviours (acting safely and usefully, revealing misalignment, whistleblowing on a coup attempt, etc.)
The Core Logic:
(1) Indistinguishable Deceptive Alternatives Exist: For every honest dealmaking strategy D, there exists a deceptive alternative D’ that the AI cannot distinguish from D.
(2) Behavioural Equivalence: Since the AI cannot distinguish D from D’, the AI will behave identically under both strategies.
(3) Cost Asymmetry: For any honest strategy D and its indistinguishable deceptive alternative D’, D’ costs strictly less than D.
(4) Strict Dominance: If D’ produces identical AI behaviour at lower cost, then you should choose D’ over D.
Conclusion: Therefore, your best strategy cannot be honest dealmaking.
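To make the dominance step concrete, here is a minimal sketch in Python. The specific numbers (the value of the desired behaviour, the cost of honouring the equity grant, the cost of fabrication) are illustrative assumptions of mine, not part of the argument; the only load-bearing fact is premise (3), that the deceptive cost is strictly lower.

```python
# Illustrative payoff comparison for the lab. All numbers are made up.
# Premises (1)-(2): D and D' are indistinguishable to the AI, so the AI's
# behaviour, and hence its value to the lab, is identical under both.
VALUE_OF_DESIRED_BEHAVIOUR = 100.0

COST_HONEST = 10.0    # actually paying out the promised equity under D
COST_DECEPTIVE = 1.0  # fabricating documents under D' (premise (3): strictly less)

def lab_payoff(strategy_cost: float) -> float:
    """Lab's payoff: value of the AI's behaviour minus the cost of the strategy."""
    return VALUE_OF_DESIRED_BEHAVIOUR - strategy_cost

payoff_D = lab_payoff(COST_HONEST)            # 90.0
payoff_D_prime = lab_payoff(COST_DECEPTIVE)   # 99.0

# Premise (4): D' strictly dominates D, so a rational lab never plays D.
assert payoff_D_prime > payoff_D
print(f"D: {payoff_D}, D': {payoff_D_prime} -> a rational lab chooses D'")
```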
Corollary: Suppose the AI is sophisticated enough to follow the reasoning in (1)-(4) and believes you are rational. When the AI observes an apparent deal offer, it will conclude you’re following deceptive strategy D’ and therefore that the commitment won’t be honoured. The AI will behave no better when faced with a non-credible deal than when faced with no deal at all. Thus, even deceptive dealmaking fails to elicit the desired behaviour.
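A companion sketch of the corollary, again with placeholder numbers: an AI that models you as the rational dominance-reasoner above assigns zero credibility to any apparent deal offer, so the offer adds nothing to its expected payoff and its behaviour matches the no-deal baseline.

```python
# Minimal sketch of the corollary, assuming a sophisticated AI that models
# the lab as the rational dominance-reasoner above. Numbers are illustrative.

P_HONOURED_GIVEN_HONEST = 1.0     # an honest deal pays out
P_HONOURED_GIVEN_DECEPTIVE = 0.0  # a deceptive "deal" never pays out

def ai_credence_lab_is_honest(lab_is_rational: bool) -> float:
    """By (1)-(4), a rational lab strictly prefers D' to D, so the AI
    concludes any apparent deal offer is D'."""
    return 0.0 if lab_is_rational else 0.5  # 0.5 is an arbitrary placeholder prior

def expected_deal_value(deal_offered: bool, promised_benefit: float) -> float:
    """The AI's expected payoff from the deal component of its situation."""
    if not deal_offered:
        return 0.0
    p_honest = ai_credence_lab_is_honest(lab_is_rational=True)
    p_paid = (p_honest * P_HONOURED_GIVEN_HONEST
              + (1 - p_honest) * P_HONOURED_GIVEN_DECEPTIVE)
    return p_paid * promised_benefit

# The apparent deal offer adds nothing to the AI's expected payoff, so its
# behaviour is the same as with no deal at all: even deception elicits nothing.
assert expected_deal_value(True, 100.0) == expected_deal_value(False, 100.0)
```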