The open-source LLM eval frameworks I actually compared, and the question that sorts them

“Eval framework” covers three different kinds of tool: app-output graders, RAG-specific scorers, and academic benchmark harnesses. They are not substitutes for one another. Pick by what…