Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

arxiv.org · Covered by ai-brief.liziran.com

Expressive continuous control policies, such as diffusion and flow models, form the backbone of recent advances in scaling imitation learning for simulated and real robot control. While they are known to scale stably in the supervised imitation learning setting, incorporating them into reinforcement learning (RL) pipelines for policy improvement has proven more difficult. It often requires specialized training objectives or backpropagating thr...


