Statistical Foundations of LLM-based A/B Testing: A Surrogacy Framework for Human Causal Inference

Organizations and researchers show increasing interest in using large language models (LLMs) in place of human participants in A/B tests, in the hope of experimenting faster and at lower cost. We study when a treatment effect estimated on LLM outcomes recovers the effect that would have been measured on the human population of interest. Distributional equivalence between LLM and human outcomes would make any standard estimator valid but is unr...
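
To make the question concrete, here is a minimal simulation sketch, not the paper's method. It assumes a difference-in-means estimator and Gaussian outcomes, and it takes the idealized case of distributional equivalence, where LLM outcomes are drawn from the same per-arm distributions as human outcomes. The names `diff_in_means` and `simulate_arm`, the true effect of 0.5, and the sample size are all hypothetical choices made for illustration.

```python
import random
from statistics import fmean


def diff_in_means(treated, control):
    """Difference-in-means estimate of the average treatment effect."""
    return fmean(treated) - fmean(control)


def simulate_arm(rng, mean, n):
    """Draw n outcomes from N(mean, 1). The Gaussian form is an assumption."""
    return [rng.gauss(mean, 1.0) for _ in range(n)]


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 20_000               # hypothetical per-arm sample size
true_effect = 0.5        # hypothetical human treatment effect

# Human outcomes: control centered at 0, treatment shifted by the true effect.
human_control = simulate_arm(rng, 0.0, n)
human_treated = simulate_arm(rng, true_effect, n)

# Idealized case (assumption): LLM outcomes follow the same per-arm
# distributions as human outcomes, i.e. distributional equivalence holds.
llm_control = simulate_arm(rng, 0.0, n)
llm_treated = simulate_arm(rng, true_effect, n)

human_estimate = diff_in_means(human_treated, human_control)
llm_estimate = diff_in_means(llm_treated, llm_control)

print(f"human estimate: {human_estimate:.3f}")
print(f"LLM estimate:   {llm_estimate:.3f}")
```

Under this assumption the two estimates differ only by sampling noise, which is the sense in which a standard estimator applied to LLM outcomes would recover the human effect. The sketch says nothing about what happens when the two outcome distributions differ.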
