Rubric-Based Rewards for RL
Extending the benefits of large-scale RL training to non-verifiable domains...
Read the original article