Effective Online 3D Bin Packing with Lookahead Parcels Using Monte Carlo Tree Search

Abstract: Online 3D Bin Packing (3D-BP) with robotic arms is crucial for reducing transportation and labor costs in modern logistics. While Deep Reinforcement Learning (DRL) has shown strong performance, it often fails to adapt to real-world short-term distribution shifts, which arise as different batches of goods arrive sequentially and cause performance drops. We argue that the short-term lookahead information available in modern logistics systems is key to mitigating this issue, especially during distribution shifts. We formulate online 3D-BP with lookahead parcels as a Model Predictive Control (MPC) problem and adapt the Monte Carlo Tree Search (MCTS) framework to solve it. Our framework employs a dynamic exploration prior that automatically balances a learned RL policy and a robust random policy based on the lookahead characteristics. Additionally, we design an auxiliary reward to penalize long-term spatial waste from individual placements. Extensive experiments on real-world datasets show that our method consistently outperforms state-of-the-art baselines, achieving gains of over 10% under distributional shifts, a 4% average improvement in online deployment, and more than 8% in the best case, demonstrating the effectiveness of our framework.


Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2601.02649 [cs.RO]
	(or arXiv:2601.02649v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2601.02649 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Fang Jiangyi
[v1] Tue, 6 Jan 2026 01:51:11 UTC (35,799 KB)
