Rectify Evaluation Preference: Improving LLMs' Critique on Math Reasoning via Perplexity-aware Reinforcement Learning
arxiv.org·16h

Authors: Changyuan Tian, Zhicong Lu, Shuang Qian, Nayu Liu, Peiguang Li, Li Jin, Leiyi Hu, Zhizhao Zeng, Sirui Wang, Ke Zeng, [Zhi Guo](https://arxiv.org/search/cs?searchtype=auth…
