KiloBench - Because Your Benchmark Score Doesn't Pay the Bill
Last month I was reviewing our model evaluation results for a new release, and I caught myself doing something absurd: comparing two models on a benchmark that neither of them would ever encounter in our actual product.