Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning
arxiv.org·2d


Abstract: Reinforcement learning with verifiable rewards (RLVR) has demonstrated superior performance in enhancing the reasoning capability of large language models (LLMs). However, this accuracy-oriented learning paradigm often suffers from entropy collapse, which reduces policy exploration and limits reasoning capabilities. To address this challenge, we propose an efficient reinforcement learning framework that leverages entropy signals at both the semantic and token levels to improve reasoning. From the data perspective, we introduce semantic entropy-guided curriculum learning, organizing training data from low to high semantic entropy to guide progressive optimization from…
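
The curriculum idea in the abstract (ordering training data from low to high semantic entropy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's method: the function names, the sample data, and in particular the equivalence rule, which treats two sampled answers as the same meaning when their normalized strings match.

```python
import math
from collections import Counter


def semantic_entropy(answers):
    """Shannon entropy (in nats) over clusters of equivalent answers."""
    # Assumption: answers are semantically equivalent when their
    # normalized strings match; the paper may cluster differently.
    clusters = Counter(a.strip().lower() for a in answers)
    total = sum(clusters.values())
    return sum(-(n / total) * math.log(n / total) for n in clusters.values())


def order_by_semantic_entropy(samples):
    """Return prompt ids sorted from low to high semantic entropy."""
    return sorted(samples, key=lambda prompt: semantic_entropy(samples[prompt]))


# Hypothetical sampled answers per prompt (illustrative data only).
samples = {
    "q_mixed": ["4", "5", "4", "6"],
    "q_easy": ["12", "12", "12", "12"],
    "q_hard": ["1", "2", "3", "4"],
}
curriculum = order_by_semantic_entropy(samples)
```

In this toy setup, prompts whose sampled answers agree come first and prompts whose answers scatter across many meanings come last, which is one plausible reading of "low to high semantic entropy".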
