Why is GLM-5.2 So Gooood: The GRPO to PPO Switch
Lessons from long-horizon RL on robust credit assignment in LLM training