📈 Optimization - jhcha.oyo · Scour

Multilevel Stochastic Gradient Descent for Risk-Averse PDE-Constrained Optimization

🎲Probability Academic

From SGD to Muon: An Incremental Tutorial (Fable-5)

🧠Neural Networks Blog

sankalp.bearblog.dev

Adaptive Learning Rates with Surrogate Probability for Follow-the-Perturbed-Leader

🎯RLHF Academic

A Theory on Flow Matching with Neural Networks

🤖AI Academic

Forward-Only Convolutional Neural Networks with Learnable Channel-Class Assignment

🧠Deep Learning Academic

An Ensembled Latent Factor Model via Differential Evolution and Gradient Descent Optimization

🤖Machine Learning Academic

Uniform Stability and Generalization Error of GD and SGD on Fixed-Point Parameters

🤖Machine Learning Academic

Optimal Rates for Generalization of Gradient Descent Methods with Deep Neural Networks

🧠Deep Learning Academic

Pseudospectral Bounds for Transient Amplification in Coupled Gradient Descent

🤖Machine Learning Academic

Flatland: The Adventures of Gradient Descent with Large Step Sizes

🧠Deep Learning Academic

Second-Order Path Kernel Interpolation Formulas in Machine Learning

🤖Machine Learning Academic

Revisiting Privacy Amplification by Subsampling in Selective Release DPSGD

🤖Machine Learning Academic

Predictive Coding with Bayesian Priors via Proximal Gradients

🎲Probability Academic

When Both Layers Learn: Training Dynamics of Representing Linear Models via ReLU Networks

🧠Neural Networks Academic

Projected Inverse Iteration: An Eigenvalue Approach to Ground-State Computation with Neural Quantum States

🧠Deep Learning Academic

Noise-Adaptive High-Probability Regret Bounds for Online Convex Optimization

🎲Stochastic Processes Academic

Gradient descent at the Edge of Stability: free energy model and kinetic description of the two-layer network

🧠Neural Networks Academic

Characterizing Learning Dynamics under Relative Reparameterization of Singular Models

🤖AI Academic

Fourier fractal dimension to predict the generalization of deep neural networks

🤖AI Academic

When Do Fewer Coordinates Suffice in DP-SGD?

🤖Machine Learning Academic
