Introducing MonitoringBench
We are releasing MonitoringBench, a difficulty-graded benchmark of 2,644 successful attack trajectories for evaluating coding-agent monitors, alongside the semi-automated red-teaming pipeline we used to generate it. The pipeline decomposes attack construction into strategy generation, execution, and post-hoc refinement, and it produces substantially harder attacks than prompt-only elicitation of frontier attackers. If you would like to use MonitoringBench to evaluate your monit...