WAM-RL: World-Action Model Reinforcement Learning with Reconstruction Rewards and Online Video SFT

Recent World-Action (WA) models demonstrate strong generalization ability and data efficiency, but they typically rely on expert trajectories for training. This reliance limits their ability to acquire fine-grained manipulation skills beyond the demonstration distribution and prevents them from continuously improving through real-world interaction. To address these limitations, we propose WAM-RL, a reinforcement learning framework that enables j...
