Scalable Maximum Entropy Reinforcement Learning for Diffusion Policies via Adjoint Matching

Diffusion policies have recently emerged as a powerful paradigm for representing complex action distributions in reinforcement learning (RL). However, their application to online RL remains limited by the challenge of scalable training in the absence of ground-truth data, where standard optimization techniques such as score matching are not directly applicable. In this work, we introduce a highly efficient algorithm for optimizing diffusion poli...
