Step Rejection Fine-Tuning: Squeezing More Signal from Noisy Agent Trajectories
If you want to dive straight into the technical details, you can read our full paper here. Imagine you are mentoring a junior developer. If they make a single logical error on line 42 of a 100-line script, do you throw away the entire file and tell them they learned nothing? Of course not. You […]