Title:Infrequent Exploration in Linear Bandits
Abstract:We study the problem of infrequent exploration in linear bandits, addressing a significant yet overlooked gap between fully adaptive exploratory methods (e.g., UCB and Thompson Sampling), which explore potentially at every time step, and purely greedy approaches, which require stringent diversity assumptions to succeed. Continuous exploration can be impractical or unethical in safety-critical or costly domains, while purely greedy strategies typically fail without adequate contextual diversity. To bridge these extremes, we introduce a simple and practical framework, INFEX, explicitly designed for infrequent exploratio…
Title:Infrequent Exploration in Linear Bandits
Abstract:We study the problem of infrequent exploration in linear bandits, addressing a significant yet overlooked gap between fully adaptive exploratory methods (e.g., UCB and Thompson Sampling), which explore potentially at every time step, and purely greedy approaches, which require stringent diversity assumptions to succeed. Continuous exploration can be impractical or unethical in safety-critical or costly domains, while purely greedy strategies typically fail without adequate contextual diversity. To bridge these extremes, we introduce a simple and practical framework, INFEX, explicitly designed for infrequent exploration. INFEX executes a base exploratory policy according to a given schedule while predominantly choosing greedy actions in between. Despite its simplicity, our theoretical analysis demonstrates that INFEX achieves instance-dependent regret matching standard provably efficient algorithms, provided the exploration frequency exceeds a logarithmic threshold. Additionally, INFEX is a general, modular framework that allows seamless integration of any fully adaptive exploration method, enabling wide applicability and ease of adoption. By restricting intensive exploratory computations to infrequent intervals, our approach can also enhance computational efficiency. Empirical evaluations confirm our theoretical findings, showing state-of-the-art regret performance and runtime improvements over existing methods.
| Comments: | NeurIPS 2025 camera-ready version |
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2510.26000 [cs.LG] |
| (or arXiv:2510.26000v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2510.26000 arXiv-issued DOI via DataCite |
Submission history
From: Harin Lee [view email] [v1] Wed, 29 Oct 2025 22:25:43 UTC (72 KB)