Abstract: In this work, we study offline convex optimization with smooth objectives, where the classical Nesterov’s Accelerated Gradient (NAG) method achieves the optimal accelerated convergence. Extensive research has aimed to understand NAG from various perspectives, and a recent line of work approaches it from the viewpoint of online learning and online-to-batch conversion, emphasizing the role of optimistic online algorithms for acceleration. We contribute to this perspective by proposing novel optimistic online-to-batch conversions that incorporate optimism theoretically into the analysis, thereby significantly simplifying the online algorithm design while preserving the optimal convergence rates. Specifically, we demonstrate the effectiveness of our conversions through the following results: (i) when combined with simple online gradient descent, our optimistic conversion achieves the optimal accelerated convergence; (ii) our conversion also applies to strongly convex objectives, and by leveraging both optimistic online-to-batch conversion and optimistic online algorithms, we achieve the optimal accelerated convergence rate for strongly convex and smooth objectives, for the first time through the lens of online-to-batch conversion; (iii) our optimistic conversion can achieve universality to smoothness (it applies to both smooth and non-smooth objectives without requiring knowledge of the smoothness coefficient) and remains as efficient as non-universal methods by using only one gradient query in each iteration. Finally, we highlight the effectiveness of our optimistic online-to-batch conversions through a precise correspondence with NAG.
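For readers unfamiliar with the accelerated baseline that the abstract refers to, the following is a minimal sketch of classical NAG next to plain gradient descent on a smooth convex quadratic. It does not implement the paper's optimistic online-to-batch conversion. The test problem, the function names (`make_quadratic`, `gradient_descent`, `nesterov_accelerated_gradient`), the step size `1/L`, and the momentum weight `(t - 1) / (t + 2)` are assumptions chosen for illustration; they follow one common textbook form of NAG and are not taken from the paper.

```python
def make_quadratic(dim=50):
    """Return f, grad, L, x_star for f(x) = 0.5 * sum_i lam_i * (x_i - 1)^2.

    Hypothetical test problem: a diagonal quadratic whose curvatures span
    three orders of magnitude, so it is smooth but ill-conditioned.
    """
    lams = [10 ** (-3 * i / (dim - 1)) for i in range(dim)]
    x_star = [1.0] * dim
    L = max(lams)  # smoothness constant (largest curvature)

    def f(x):
        return 0.5 * sum(l * (xi - si) ** 2 for l, xi, si in zip(lams, x, x_star))

    def grad(x):
        return [l * (xi - si) for l, xi, si in zip(lams, x, x_star)]

    return f, grad, L, x_star


def gradient_descent(grad, x0, L, steps):
    """Plain gradient descent with step size 1/L."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - gi / L for xi, gi in zip(x, g)]
    return x


def nesterov_accelerated_gradient(grad, x0, L, steps):
    """One common form of NAG: gradient step at y, then extrapolation."""
    x = list(x0)
    y = list(x0)
    for t in range(1, steps + 1):
        g = grad(y)
        x_next = [yi - gi / L for yi, gi in zip(y, g)]
        beta = (t - 1) / (t + 2)  # momentum weight, assumed schedule
        y = [xn + beta * (xn - xi) for xn, xi in zip(x_next, x)]
        x = x_next
    return x


f, grad, L, x_star = make_quadratic()
x0 = [0.0] * len(x_star)
T = 200
gap_gd = f(gradient_descent(grad, x0, L, T))   # f(x_star) = 0, so f is the gap
gap_nag = f(nesterov_accelerated_gradient(grad, x0, L, T))
print("GD gap:", gap_gd, "NAG gap:", gap_nag)
```

On this problem the optimal value is zero, so the printed numbers are the suboptimality gaps after `T` iterations. Both methods use one gradient query per iteration; the gap of gradient descent is known to decay at rate O(L/T), while NAG attains the accelerated O(L/T²) rate.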
| Comments: | NeurIPS 2025 |
| Subjects: | Machine Learning (cs.LG); Optimization and Control (math.OC) |
| Cite as: | arXiv:2511.06597 [cs.LG] |
| (or arXiv:2511.06597v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2511.06597 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Peng Zhao [v1] Mon, 10 Nov 2025 01:07:51 UTC (472 KB)