Intelligence Per Dollar
Yesterday Microsoft added a new metric to a model release card, one that will likely become a standard:[1] average token usage. In the first row, the Microsoft model hits 71.6 on SWE-Bench Verified using about a third of the tokens Claude Haiku 4.5 burns. Benchmarks are now measured on two dimensions: overall performance & the cost of achieving that intelligence. This is yet another sign that, for many use cases, the era of subsidies,[2] tokenmaxxing,[3] & all-out performance is over. Eve...
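Here is a minimal sketch of scoring a benchmark on both dimensions at once. It assumes cost per task is simply average tokens times one blended per-token price. Real pricing usually separates input and output tokens, so treat that as a simplification. The model names, token counts, prices, and the second and third models' scores are all hypothetical. Only the 71.6 score and the roughly one-to-three token ratio mirror the release described in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """One model's benchmark result plus what it cost to get there (hypothetical fields)."""
    name: str
    score: float                   # benchmark score, e.g. percent of tasks resolved
    avg_tokens: float              # average tokens consumed per task
    usd_per_million_tokens: float  # assumption: a single blended price for all tokens

    @property
    def cost_per_task(self) -> float:
        # Dollars per task = tokens used * price per token
        return self.avg_tokens * self.usd_per_million_tokens / 1_000_000

    @property
    def score_per_dollar(self) -> float:
        # "Intelligence per dollar": benchmark points per dollar spent per task
        return self.score / self.cost_per_task


def pareto_front(runs: list[BenchmarkRun]) -> list[BenchmarkRun]:
    """Keep the runs that no other run beats on both score and cost."""
    front = []
    for r in runs:
        dominated = any(
            o.score >= r.score
            and o.cost_per_task <= r.cost_per_task
            and (o.score > r.score or o.cost_per_task < r.cost_per_task)
            for o in runs
        )
        if not dominated:
            front.append(r)
    return front


# Hypothetical runs: same price, so only token usage drives the cost gap.
efficient = BenchmarkRun("efficient-model", 71.6, 10_000, 1.0)
heavy = BenchmarkRun("heavy-model", 73.0, 30_000, 1.0)      # ~3x the tokens
wasteful = BenchmarkRun("wasteful-model", 70.0, 40_000, 1.0)  # worse on both axes

for run in (efficient, heavy, wasteful):
    print(f"{run.name}: ${run.cost_per_task:.2f}/task, "
          f"{run.score_per_dollar:,.0f} points per dollar")

print([r.name for r in pareto_front([efficient, heavy, wasteful])])
```

The Pareto check is the design choice worth noting. A model that scores slightly higher but costs three times as much per task still survives on the frontier; it is simply a different point on the performance/cost trade-off. Only a model that is worse on both axes drops out.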