Reward hacking in reinforcement learning
A field guide to reward hacking in GRPO — why it happens, how it hides, and what actually fixes it