GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents
arxiv.org·7h
Flag this post

View PDF HTML (experimental)

Abstract:With the software industry shifting toward a data-driven culture, online A/B testing is a key tool for evaluating new technologies. However, deploying such experiments requires substantial resources, may negatively impact users, and involves long data collection periods. To address this, \textit{off-policy evaluation (OPE)}, or offline A/B testing, uses logged data to assess technologies and is fundamental in Reinforcement Learning, making it crucial in domains where online testing is costly or risky, such as healthcare, recommender systems, education, dialog systems, and robotics. Despite advances in coding LLMs and agentic AI, little is known about leveraging them …

Similar Posts

Loading similar posts...