Building Blocks of GenAI Product Evaluation
The offline stack (rubric, guideline, judge, annotator, and benchmark, all resting on a well-built eval set) and the online A/B tests that keep it honest, illustrated with image generation.