not much happened today | AINews (opens in new tab)
**FrontierCode** benchmark by **Cognition** highlights the challenge of coding tasks with the best model, **Opus 4.8**, scoring only about **13%** on the hardest subset, indicating coding is less solved than benchmarks suggest. The trend toward using **loops** as a control metaphor for coding agents is prominent, with emphasis on clear goals, verification, and iteration, though some experts caution about overreliance on loops. Agent ergonomics are improving with observability dashboards, sand...
Read the original article