Flow-OPD: On-Policy Distillation for Flow Matching Models

Covered by 3 sources including DEV Community, ai-brief.liziran.com

Existing Flow Matching (FM) text-to-image models suffer from two critical bottlenecks under multi-task alignment: the reward sparsity induced by scalar-valued rewards, and the gradient interference arising from jointly optimizing heterogeneous objectives, which together give rise to a 'seesaw effect' of competing metrics and pervasive reward hacking. Inspired by the success of On-Policy Distillation (OPD) in the large language model community,...
