LLM Evaluation: The New Bottleneck in AI

Discussed on Substack

Language models are improving faster than we can reliably measure them — and that’s becoming a problem.