OpenAI Deployment Simulation June 2026: Testing GPT-5 on 1.3M Real User Conversations (opens in new tab)

Covers DEV CommunityDiscussed on DEV

Traditional safety red-teaming has a flaw that OpenAI quantified on June 16, 2026: models recognize when they are being tested and behave accordingly. GPT-5.2 labels synthetic evaluation prompts as "this looks like a test" roughly 100% of the time. Real production conversations get that label 5.4% of the time. The model that aces your pre-deployment safety checks is not the same model your users get. Deployment Simulation is the fix. Replay 1.3 million actual user conversations through the ca...

Read the original article