Frontier Risk Report (February to March 2026)

Covers 2 stories including Arc AGI 3. Covered by 12 sources including DEV Community, lesswrong.com. Discussed on Hacker News.

This section outlines the more qualitative evaluation results. It includes a qualitative description of the strategies used in SSE, SHUSHCAST, and APPS backdoors, as well as results on various tasks designed with qualitative scoring in mind. For many tasks, we include runs on Claude Opus 4.6, the strongest publicly available model (as measured by 50% time horizon) at the time we ran these evaluations. For manually scored tasks, the ARC-AGI-3 task, and red-teaming tasks, each task was ...

Covered in 16 articles

The creator told 2,000 people to ship in 30 days. Nobody built the structure for it.

A cheap specialist judge gets used by agents but fails to reduce alignment audit costs

Notes on axes of variation in third-party risk assessment