Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention
arxiv.org·3d

Abstract: Recently, reinforcement learning (RL) has become a common choice in enhancing the reasoning capabilities of vision-language models (VLMs). Considering existing RL-based finetuning methods, entropy intervention turns out to be an effective way to benefit exploratory ability, thereby improving policy performance. Notably, most existing studies intervene in entropy by simply controlling the update of specific tokens during policy optimization of RL. They ignore the entropy intervention during the RL sampling that can boost the performance of GRPO by improving the diversity of responses. In this paper, we propose Selective-adversarial Entropy Intervention, namel…
