Show HN: RewardHackBench: Using sandboxes to stop agents from cheating
Benchmarking execution environments' ability to prevent reward hacking in agent evals. - islo-labs/reward-hack-bench