Published on November 1, 2025 10:36 PM GMT
Newcomb’s problem is a famous paradox in decision theory. The simple version is as follows:
Two boxes are designated A and B. The player is given a choice between taking only box B or taking both boxes A and B. The player knows the following:
- Box A is transparent and always contains a visible $1,000.
- Box B is opaque and its content has already been set by a reliable predictor:
  - If the predictor has predicted that the player will take both boxes A and B, then box B contains nothing.
  - If the predictor has predicted that the player will take only box B, then box B contains $1,000,000.
The player does not know what the predictor predicted or what box B contains while making the choice.
The argument in favor of “one-boxing” is that one-boxers systematically get more money—only one-boxers get the $1,000,000.
The argument in favor of “two-boxing” is that, at the point in time when you’re faced with the choice, nothing you can do will change the amount of money in the boxes. The predictor has already put the money in. And so taking both must be strictly better.
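To make the two arguments concrete, here is a minimal sketch in Python (the predictor accuracy p and the credence q are numbers I’m introducing for illustration, not part of the original problem): the one-boxing argument conditions the contents of box B on your choice, while the two-boxing argument treats the contents as already fixed and notes that taking box A always adds $1,000.

```python
# Sketch of the two calculations behind one-boxing vs. two-boxing.
# p = assumed predictor accuracy; q = your credence that box B already holds $1,000,000.

def evidential_ev(choice, p):
    """Expected payoff if you treat the prediction as matching your choice with probability p."""
    if choice == "one-box":
        return p * 1_000_000 + (1 - p) * 0
    else:  # "two-box"
        return p * 1_000 + (1 - p) * (1_000_000 + 1_000)

def causal_ev(choice, q):
    """Expected payoff if you treat the contents of box B as already fixed."""
    box_b = q * 1_000_000
    return box_b + (1_000 if choice == "two-box" else 0)

p = 0.99
print(evidential_ev("one-box", p) > evidential_ev("two-box", p))  # True: one-boxers expect far more
print(causal_ev("two-box", 0.5) - causal_ev("one-box", 0.5))      # 1000.0: two-boxing always adds $1,000
```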
My claim is that the fundamental paradox here is that the existence of the predictor is incompatible with free will. If a predictor can reliably foresee your future actions and allocate money accordingly, you don’t have complete freedom of choice over those actions, and trying to reason about a free decision inside that setup is what produces the confusion.
Newcomb’s Problem only looks like a paradox because people are trying to insert the pretense of free choice into a setup that denies that freedom. Perfect (or even robustly better-than-random) prediction of your act means you don’t have freedom over that act. Therefore, asking “What should I choose in Newcomb’s Problem right now?” is confused in the same way “What should the output bit of this already-wired circuit choose at timestamp t?” is confused.
A more general claim is: if something can predict your action with better-than-random accuracy no matter how hard you try to prevent it, you don’t have free will over that action. (I’m not addressing the question of whether free will exists in general, only whether a particular action is chosen freely.)
Some may ask: what if someone can predict your actions a second before you take them, using a brain-scanning device? I think that’s irrelevant. For decision/game-theory problems, it makes sense to discretize time: each time step is a point at which at least one of the agents can make a decision. If an agent has information about another agent’s future decision no matter what that agent’s strategy is, the other agent is not acting freely. In the brain-scanner case, I’d say “the second before” just reflects the fact that taking an action in the physical world takes more than zero time, and once you’ve made the decision to start the action you are no longer free to reverse it. And again, for my purposes it doesn’t matter whether free will exists in general, only whether, in a particular situation, you’re freely choosing between some number of options.
A couple of examples of the principle:
- I predict you’ll get out of bed tomorrow morning, and you actually do.
- This doesn’t prove that you didn’t choose to get out of bed freely, since you were not trying to make my prediction wrong.
- I bet you $100,000 that I can predict the day before whether you’ll get out of the left or right side of the bed each day for the next month with >75% accuracy. I succeed.
- This does prove that you weren’t freely choosing which side of the bed to get out of, because even with $100,000 on the line, you weren’t able to randomize your actions enough that I couldn’t predict them.
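As a toy illustration of how little it takes to beat random guessing (the 80% habit rate and the simple majority-guess rule below are made-up assumptions, not anything from the bet itself): a predictor that just guesses your historically more frequent side already beats 50% against anyone whose choices are habitual rather than deliberately randomized.

```python
import random

# Toy simulation (assumed numbers): a person who habitually gets out on the left
# side ~80% of the time, vs. a predictor that guesses the side seen most often so far.

random.seed(0)

def habitual_person():
    return "left" if random.random() < 0.8 else "right"

def frequency_predictor(history):
    if not history:
        return "left"  # arbitrary first guess
    return max(set(history), key=history.count)

history, correct = [], 0
for _ in range(30):
    guess = frequency_predictor(history)
    actual = habitual_person()
    correct += guess == actual
    history.append(actual)

print(f"Predictor accuracy over 30 days: {correct / 30:.0%}")
```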
Or consider the following problem:
You’ve been sorted into a group of people based on your personality, which was determined by an algorithm that observed your behavior for many years. The algorithm either identified you as a “cooperator”, in which case it put you in a group with other cooperators, or as a “defector”, in which case it put you in a group with other defectors[1]. The algorithm is known to be very accurate. Inside the group, a Prisoner’s Dilemma is arranged. Without talking, you must decide to either “cooperate” or “defect”. If everyone in the group cooperates, you all win $100. If some but not all people defect, anyone who didn’t defect gets nothing, and the defectors each get $200. If everyone defects, no-one gets any money at all.
This problem is similar to Newcomb’s problem in that if you think only about the causal impact of your decision, you should always defect. Defecting gets you strictly more money—either you’ve been sorted into the defectors group and so you get no money either way (because everyone else will defect), or you’ve been sorted into the cooperators group, and so if you defect you’ll get $200 instead of $100.
The argument in favor of cooperating is that it’s clearly better to be a cooperator. The cooperator groups get $100 each, while the defector groups get nothing.
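To make the dominance argument explicit, here is a minimal sketch of the payoffs described above (encoding what the rest of the group does as one of three cases is just a convenience I’m introducing): holding everyone else fixed, defecting never pays less than cooperating.

```python
# Payoffs from the sorted Prisoner's Dilemma above.
# `others` describes the rest of your group: "all_cooperate", "mixed", or "all_defect".
def payoff(my_action, others):
    if others == "all_cooperate":
        return 100 if my_action == "cooperate" else 200
    if others == "mixed":
        return 0 if my_action == "cooperate" else 200
    if others == "all_defect":
        # If you cooperate, cooperators get nothing; if you also defect,
        # everyone defects and no one gets paid.
        return 0

# Defecting weakly dominates: it is never worse, whatever the others do.
for others in ("all_cooperate", "mixed", "all_defect"):
    print(others, payoff("defect", others) >= payoff("cooperate", others))  # True each time
```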
But the issue here is that the whole setup presumes that you can’t systematically fool the algorithm. If you had free will, your choice to defect or cooperate in the scenario could be completely decoupled from your prior behavior. Thus, in the moment, the possibilities would be as follows:
- Algorithm predicted you will cooperate → you can freely “just decide” to defect even though your past behavior displayed you as a cooperator
- Algorithm predicted you will defect → your actions don’t matter anyway, you may as well defect
As soon as you say something like “the algorithm would have predicted you’d change your tendency in the moment”, you are positing a world where people cannot change what type of person they are in the moment; a world without free choice. For in a world with free choice (for these actions), you can be a perfect cooperator type until the last minute and then switch to become a defector.
Of course, in a world full of utility-maximizing freely-choosing agents, the algorithm would be best off predicting that everyone is a defector, which is not good for you. This leads us to the topic of pre-commitment.
Some people’s answer to Newcomb’s Problem is that:
- It’s rational to two-box if you’re just put in the scenario the normal way,
- But it’s best if you can pre-commit to one-boxing before the scenario, i.e. somehow limit your options to only one-boxing, such that in the moment you one-box and get the higher payoff.
In other words, it’s “just a standard time consistency problem”, as Basil Halperin writes. Quoting from Basil’s post:
> So to summarize, what’s the answer to, “Should you one-box or two-box?”?
> The answer is, it depends on from which point in time you are making your decision. In the moment: you should two-box. But if you’re deciding beforehand and able to commit, you should commit to one-boxing.
> How does this work out in real life? In real life, you should – right now, literally right now – commit to being the type of person who if ever placed in this situation would only take the 1-box. Impose a moral code on yourself, or something, to serve as a commitment device. So that if anyone ever comes to you with such a prediction machine, you can become a millionaire 😊.
> This is of course what’s known as the problem of “time consistency”: what you want to do in the moment of choice is different from what you-five-minutes-ago would have preferred your future self to do. Another example would be that I’d prefer future-me to only eat half a cookie, but if you were to put a cookie in front of me, sorry past-me but I’m going to eat the whole thing.
Thus my claim: Newcomb merely highlights the issue of time consistency.
So why does Newcomb’s problem produce so much confusion? When describing the problem, people typically conflate the two different points in time from which the problem can be considered. In the way the problem is often described, people are, implicitly and accidentally, jumping between the two points of view. You need to separate the two possibilities and consider them separately. I have some examples of this type of conflation in the appendix at the bottom.
The only problem with this line of thinking is that true pre-commitment is very difficult. I agree that one should pre-commit to one-boxing, if such an option is available, but how? Simply saying “I pre-commit” doesn’t work. Furthermore, if you successfully pre-committed to one-boxing, there’s no choice to make. And so it’s no longer a “decision problem” in the intuitive sense.
Some claim that the correct strategy is to consistently do things they would have pre-committed to, so that they are treated and modeled by others as cooperators/one-boxers. However, this only makes sense under two conditions:
- Their cooperative actions directly cause desirable outcomes by making observers think they are trustworthy/cooperative.
- Being deceptive is too costly, either because it’s literally difficult (requires too much planning/thought), or because it makes future deception impossible (e.g. because of reputation and repeated interactions).
Of course, whether or not we have some free will, we are not entirely free: some actions are outside of our capabilities, and being sufficiently good at deception may be one of them. That is why one might rationally decide to always be honest and cooperative: successfully pretending to be cooperative only when others are watching might be literally impossible, and messing up once might be very costly.
The notion of pre-commitment also highlights how free will is central to the decision to one- or two-box.
You could split the problem into “what happens if you have free will” vs. “what happens if you don’t have free will”:
- Free will
- You should two-box, because either the predictor thought you’d one-box, and you can just trick them and two-box anyway, or they thought you’d two-box, so you may as well two-box.
- No free will
- There’s no decision to make. Either you’re the type of guy who one-boxes (e.g. because you successfully pre-committed and cut off the other option), or you’re the type of guy who two-boxes. Of course you should be glad to find out that you successfully cut off your options and can only one-box (e.g. because you’re magically injured and can’t reach the second box).
Michael Huemer writes about why two-boxing is correct in his post “The Solution to Newcomb’s Paradox”.
> having a goal does not (intrinsically) give you a reason to give yourself evidence that the goal will be satisfied; it gives you a reason to cause the goal to be satisfied.
> Since you cannot change the past, the correct EU calculation is to treat the past as fixed, and calculate EU given each possible past state of the world. Then take an average of these EU values, weighted by your credence in each possible past state. This way of doing the calculation necessarily preserves the results of dominance reasoning.
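A minimal sketch of the calculation Huemer describes, with q standing for your credence that the predictor already put the $1,000,000 in box B (q is my notation, not his): whatever value q takes, two-boxing comes out exactly $1,000 ahead, which is the sense in which the calculation preserves dominance reasoning.

```python
# Treat the past as fixed: compute EU under each possible past state of box B,
# then weight by your credence q in each state.
def eu(choice, q):
    bonus = 1_000 if choice == "two-box" else 0          # box A, taken only by two-boxers
    return q * (1_000_000 + bonus) + (1 - q) * (0 + bonus)

for q in (0.25, 0.5, 0.75):
    print(q, eu("two-box", q) - eu("one-box", q))        # 1000.0 every time
```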
The only issue is that, as soon as we speak of “causing a goal to be satisfied”, we’re presuming freedom of choice, whereas Newcomb’s problem, as generally posed, assumes you are not free.
The stubborn Rationalist repeats: “But one-boxing will leave me richer, and so I will choose to one-box.” But you can’t change the payouts in the moment. Insofar as you can choose anything, you’re only choosing between X and X + $1,000. (I’m talking about the instantaneous Newcomb’s problem, i.e. you are deciding now, not “what would you decide in advance” or “what would you commit to”.) If you’re simply saying “Because of my nature, I have already successfully pre-committed to one-boxing”, then you are saying “I don’t have free will; I have to one-box, because I’ve cut off my other option”. And this is valid, but then you’re not “choosing”. (And please explain how you’ve committed.)
In conclusion, if you find yourself freely choosing between options, it’s rational to take a dominant strategy, like two-boxing in Newcomb’s problem or defecting in the sorted Prisoner’s Dilemma. However, if you get the opportunity to genuinely pre-commit to decisions that yield better outcomes because of that pre-commitment, you should take it. The most interesting thing about Newcomb’s problem is that it demonstrates that the capacity to make decisions is sometimes a disadvantage in future situations. You don’t have free will in Newcomb’s problem, so you’d better hope you’re destined to one-box. But if you do have free will (for example, because they lied to you about Omega, or because you’re the only guy in the universe with free will), you may as well choose to get an extra $1,000!
[1] The same scenario can be described as “you were put in a group with people who have a correlated personality type”.