Optimization
Gram Newton-Schulz: A Fast, Hardware-Aware Newton-Schulz Algorithm for Muon
⚡CUDA · Content type: Blog
Optimal Rates for Generalization of Gradient Descent Methods with Deep Neural Networks
🔬Deep Learning · Content type: Academic
Capacity-Constrained Online Convex Optimization with Delayed Feedback
🎮Reinforcement Learning · Content type: Academic