STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training
Existing RL post-training methods for text-to-image generation usually convert the final-image reward into a single scalar advantage and apply it with the same strength to the entire generative trajectory. However, text-to-image generation naturally has temporal and spatial structure: different denoising steps are responsible for different generation stages, and the content that truly determines text alignment often appears only in part of the i...
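As a rough illustration of the baseline behaviour described above (not of STAR itself), the sketch below turns each final-image reward into one scalar advantage and repeats it unchanged across every denoising step. The group-wise mean/std normalization and the names `uniform_advantages`, `rewards`, and `num_steps` are assumptions made for this example; the summary does not say how existing methods compute the scalar.

```python
from statistics import mean, pstdev


def uniform_advantages(rewards, num_steps, eps=1e-8):
    """Hypothetical baseline: one scalar advantage per image, copied to all steps.

    rewards   -- final-image rewards for a group of samples from the same prompt
    num_steps -- number of denoising steps in each trajectory
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Assumed normalization: centre and scale rewards within the group.
    scalars = [(r - mu) / (sigma + eps) for r in rewards]
    # The same value is applied at every step, ignoring temporal structure;
    # spatial structure is ignored too, since there is one number per image.
    return [[a] * num_steps for a in scalars]


adv = uniform_advantages([0.2, 0.8, 0.5], num_steps=4)
for row in adv:
    print(row)
```

Each row holds a single value repeated `num_steps` times, which is the uniform credit assignment that the abstract contrasts with the temporal and spatial structure of generation.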