Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning
arxiv.org·2h

Abstract: Recent vision-language models (VLMs) achieve remarkable reasoning ability through reinforcement learning (RL), which offers a feasible path toward continuously self-evolving large vision-language models (LVLMs) in the era of experience. However, RL for VLMs requires abundant high-quality multimodal data, which is especially hard to obtain in specialized domains such as chemistry, earth sciences, and multimodal mathematics. Existing strategies such as synthetic data and self-rewarding mechanisms suffer from limited distributions and alignment difficulties, ultimately causing reward hacking: models exploit high-reward patterns, collapsing policy entropy and destabilizing training…
