Opus 4.8 vs Opus 4.7 vs GPT-5.5 vs Composer 2.5 - 50 Real PRs in Go and Rust

Covers DeepSWE Benchmark. Discussed on Hacker News.

I graded four frontier coding models on 50 real merged PRs in Go and Rust, scoring not just whether tests pass but also craft, equivalence, and cost. Opus 4.8 led on craft in both languages.
