Simulated deployments also reduced evaluation awareness to levels close to those seen in real production traffic. We extended the method to agentic deployments with stateful tools, showing that tool simulators can produce realistic trajectories when given sufficient context and capabilities.