AI benchmarks are broken. Here’s what we need instead.
One-off tests don’t measure AI’s true impact. We’re better off shifting to more human-centered, context-specific methods.
Read the original article