Why most AI evals would miss the Linear sales email failure
Linear's sales agent emailed an existing customer six times with the wrong company name. An LLM judge would have scored that email highly. Here is why standard evaluation would never have caught it.