Date Topic Lecture Notes Resources
Sep 6 Logistics, vignette: diffraction limit and learning theory instructor notes, slides
- Airy disks and mixture models: [Chen-Moitra '20] (lecture covered Sections 4.1 and 4.2)
- Learning mixtures of exponentials: [Moitra '15, extremal functions and Vandermonde matrices], see also [Liao-Fannjiang '14, "MUSIC" algorithm] and [Candes-Fernandez-Granda '12, theory of super-resolution]
Sep 11 Tensors I: Jennrich's algorithm (a toy implementation is sketched after the readings below), applications (super-resolution, Gaussian mixtures, independent component analysis) instructor notes, slides, scribe notes
- Book chapter on tensor decomposition: [Vijayaraghavan '20]
- Applications: learning mixtures of spherical Gaussians with linearly independent means: [Hsu-Kakade '13], mixtures of exponentials via tensor methods: [Huang-Kakade '15], independent component analysis: [Comon '94], tensor methods for latent variable models: [Anandkumar et al. '12]
- History of Jennrich's algorithm: [blog post]
- Karl Pearson, crabs, and method of moments: [blog post]
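The heart of Jennrich's algorithm is simultaneous diagonalization: two random contractions of the tensor are diagonalized by the same factors, so an eigendecomposition recovers them. A minimal numpy sketch, not taken from the course materials, assuming an exactly rank-r tensor with linearly independent factor matrices:

```python
import numpy as np

def jennrich(T, r):
    """Sketch of Jennrich's algorithm: for T = sum_i a_i (x) b_i (x) c_i with
    linearly independent {a_i}, {b_i} and generic {c_i}, the eigenvectors of
    T_x T_y^+ for two random slices T_x, T_y recover the a_i's up to scaling."""
    d3 = T.shape[2]
    x, y = np.random.randn(d3), np.random.randn(d3)
    Tx = np.einsum('ijk,k->ij', T, x)          # T_x = sum_i <c_i, x> a_i b_i^T
    Ty = np.einsum('ijk,k->ij', T, y)
    eigvals, eigvecs = np.linalg.eig(Tx @ np.linalg.pinv(Ty))
    top = np.argsort(-np.abs(eigvals))[:r]     # nonzero eigenvalues <c_i,x>/<c_i,y>
    return np.real(eigvecs[:, top])            # columns ~ a_i; recover b_i, c_i
                                               # analogously or by least squares

# toy check: recovered columns should match the columns of a up to scaling
a, b, c = (np.random.randn(6, 3) for _ in range(3))
T = np.einsum('ir,jr,kr->ijk', a, b, c)
A_hat = jennrich(T, r=3)
```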
Sep 13 Tensors II: Iterative methods (gradient descent, tensor power method, alternating least squares; a power-iteration sketch follows the readings below) instructor notes, slides, scribe notes 1, scribe notes 2
- Tensor power method from random initialization, experiments comparing different methods: [Sharan-Valiant '17]
- Practicalities of tensor decomposition: [Janzamin et al. '19] (chapters 3 and 6)
- Additional readings: tensor power method with whitening: [Anandkumar et al. '12], gradient descent without deflation: [Ge et al. '15], surprising lower bounds for tensor power method: [Wu-Zhou '22], optimization landscape for random overcomplete tensors: [Ge-Ma '17], evidence for computational phase transition for overcomplete tensor decomposition: [Wein '23]
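For an orthogonally decomposable symmetric 3-tensor, one step of the tensor power method maps x to T(I, x, x), normalized, and from a random start the iterate converges to one of the components. A minimal sketch (single run, no deflation or whitening; illustrative only, not course code):

```python
import numpy as np

def tensor_power_method(T, n_iters=100):
    """Tensor power iteration x <- T(I, x, x) / ||T(I, x, x)|| on a symmetric
    3-tensor. For T = sum_i lam_i v_i^(x3) with orthonormal v_i, the iterate
    converges (locally quadratically) to some component v_i."""
    x = np.random.randn(T.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(n_iters):
        x = np.einsum('ijk,j,k->i', T, x, x)    # the multilinear map T(I, x, x)
        x /= np.linalg.norm(x)
    lam = np.einsum('ijk,i,j,k->', T, x, x, x)  # T(x, x, x) estimates lam_i
    return lam, x
```

In practice one runs many random restarts and deflates (subtracts the recovered rank-one term) after each component; whitening reduces the non-orthogonal undercomplete case to this one.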
Sep 18 Tensors III: overcomplete tensor decomposition via smoothed analysis instructor notes, slides, scribe notes 1, scribe notes 2
- Original paper proposing smoothed analysis for tensor decomposition: [Bhaskara et al. '13]
- Simpler proof, presented in class, of the condition number bound in the smoothed setting: [Anari et al. '18], see also Section 4 of [Vijayaraghavan '20]
- Handling the case where the smoothing is the same across modes, via decoupling: [Bhaskara et al. '18]
- Smoothed analysis in algorithm design: simplex method [Spielman-Teng '01], condition number [Sankar-Spielman-Teng '03], CACM survey [Spielman-Teng '09]
Sep 20 Sum-of-squares I: pseudo-distributions, application to robust regression instructor notes, slides, scribe notes
- Sum-of-squares resources: graduate course 1: [Barak-Steurer], graduate course 2: [Schramm '21], graduate course 3: [Kunisky '22], graduate course 4 (youtube videos): [Kothari '20], SoS and estimation survey: [Raghavendra-Schramm-Steurer '19]
- Robust regression via sum-of-squares: result covered in class: [Klivans-Kothari-Meka '18], better error guarantee: [Bakshi-Prasad '20], robust regression without distributional assumptions: [Chen-Koehler-Moitra-Yau '20], heavy-tailed regression: [Cherapanamjeri et al. '19]
Sep 25 Sum-of-squares II: mixtures of spherical Gaussians, certifiable hypercontractivity instructor notes, slides, scribe notes
- Older papers on learning mixtures of Gaussians: [Dasgupta '99], [Arora-Kannan '05], [Dasgupta-Schulman '07], [Vempala-Wang '04], [Moitra-Valiant '10]
- Information-theoretic threshold at separation sqrt(log k): [Regev-Vijayaraghavan '17]
- Notes on clustering spherical Gaussian mixtures with SoS: [Hopkins '18]
- Original papers on clustering spherical Gaussian mixtures at the optimal threshold: [Hopkins-Li '17], [Kothari-Steinhardt-Steurer '17], see also [Diakonikolas-Kane-Stewart '17]
- Follow-up paper achieving the optimal threshold in polynomial time: [Li-Liu '21]
Sep 27 Sum-of-squares III: high-entropy constraints, noisy orthogonal tensor decomposition instructor notes (board talk only), scribe notes
- This lecture (SoS and maximum entropy constraints for tensor decomposition): [Ma-Shi-Steurer '16], expository notes: [Hopkins '18]
- Earlier works on SoS for tensor decomposition: [Barak-Kelner-Steurer '15], [Ge-Ma '15]
Oct 2 Sum-of-squares IV: rounding SoS relaxations with Jennrich's algorithm, overcomplete tensor decomposition instructor notes, slides, scribe notes
- Fast spectral algorithms for tensor decomposition inspired by SoS: overcomplete tensor decomposition: [Hopkins-Schramm-Shi-Steurer '15], [Ding-D'Orsi-Liu-Steurer-Tiegel], noisy orthogonal tensor decomposition: [Schramm-Steurer '17]
- Faster algorithm for clustering Gaussian mixtures at optimal separation: [Li-Liu '21]
Oct 4 Robust statistics I: motivation, iterative filtering (a toy filter is sketched after the readings below) instructor notes, slides, scribe notes
- Original computationally efficient robust statistics papers: [Diakonikolas-Kamath-Kane-Li-Moitra-Stewart '16], [Lai-Rao-Vempala '16]
- Graduate textbook on robust statistics: [Diakonikolas-Kane '22] (see chapters 1-3)
- Graduate course on robust statistics: [Li '19] (see lectures 4, 5)
- Subtleties between contamination models: better error bound under additive contamination: [Diakonikolas-Kamath-Kane-Li-Moitra-Stewart '17], log factor is necessary for strong contamination: [Diakonikolas-Kane-Stewart '16], separations between different models for robust mean testing: [Canonne-Hopkins-Li-Liu-Narayanan '23]
- Very fast algorithm for robust mean estimation using online learning: [Dong-Hopkins-Li '19]
- Application to defenses against poisoning neural nets: [Tran-Li-Madry '18]
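The filtering idea: under eps-contamination with bounded clean covariance, a large top eigenvalue of the empirical covariance certifies that outliers align with the top eigenvector, so scores along that direction can be used to prune. A hedged toy sketch with a crude deterministic removal rule and an arbitrary threshold constant, not the exact algorithm from the papers above:

```python
import numpy as np

def filter_mean(X, eps, sigma2=1.0, max_rounds=100):
    """Toy iterative filtering for robust mean estimation under
    eps-contamination, assuming the clean covariance is <= sigma2 * I.
    While the empirical covariance has a large top eigenvalue, project onto
    the top eigenvector and discard the points with the largest scores.
    (The actual filter removes points randomly with probability proportional
    to score, which certifies more outliers than inliers are removed.)"""
    X = np.asarray(X, dtype=float)
    for _ in range(max_rounds):
        mu = X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
        if eigvals[-1] <= 10 * sigma2:      # covariance certifiably small: done
            return mu
        v = eigvecs[:, -1]                  # direction of excess variance
        scores = ((X - mu) @ v) ** 2
        X = X[scores <= np.quantile(scores, 1 - eps / 2)]  # drop worst tail
    return X.mean(axis=0)
```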
Oct 11 Robust statistics II: list-decodable learning, subspace isotropic filtering instructor notes (board talk only), supplemental instructor notes [pdf], scribe notes
- Graduate textbook on robust statistics: [Diakonikolas-Kane '22] (see chapter 5)
- Subspace isotropic filtering paper (this lecture): [Diakonikolas-Kane-Kongsgaard-Li-Tian '20]
- Previous results on list-decodable mean estimation: paper introducing list-decodable learning: [Balcan-Blum-Vempala '08], modern formulation of list-decodable learning: [Charikar-Steinhardt-Valiant '17], list-decodable mean estimation via multifilter: [Diakonikolas-Kane-Kongsgaard], first near-linear time algorithm: [Cherapanamjeri-Mohanty-Yau '20]
- List-decodable regression: [Karmalkar-Klivans-Kothari '19], [Raghavendra-Yau '19]
- List-decodable subspace recovery: [Raghavendra-Yau '20], [Bakshi-Kothari '16]
Oct 16 Robust statistics III: learning halfspaces under Massart noise instructor notes, slides, scribe notes
- Efficient distribution-free learning of halfspaces under Massart noise: [Diakonikolas-Gouleakis-Tzamos '19], [Chen-Koehler-Moitra-Yau '20] (this lecture), see also improved lower bounds: [Diakonikolas-Kane '20], [Nasser-Tiegel '22], [Diakonikolas-Kane-Manurangsi-Ren]
- Lecture notes on matching loss and generalized linear models: [Kanade '22] (see Section 8.5)
- Hardness of agnostic learning of halfspaces: under Feige's random 3SAT hypothesis: [Daniely '15], under hardness of lattice problems: [Tiegel '22], see also: [Kalai-Klivans-Mansour-Servedio '06], [Klivans-Kothari '14], [Daniely-Vardi '21], [Diakonikolas-Kane-Pittas-Zarifis '21]
- Algorithm for learning halfspaces under RCN without margin assumption: [Blum-Frieze-Kannan-Vempala '97], [Cohen '97], [Dunagan-Vempala '04a], [Dunagan-Vempala '04b]
Oct 18 Supervised learning I: PAC basics, low-degree algorithm, L1/L2 polynomial regression (sketched after the readings below), polynomial threshold functions slides, scribe notes
- Fourier analysis of Boolean functions ("the bible"): [O'Donnell '14] (see chapters 1 and 3)
- Foundational works on PAC learning: paper introducing the model: [Valiant '84], low-degree algorithm: [Linial-Mansour-Nisan '93], PAC learning and VC dimension: [Blumer et al. '89]
- L1 polynomial regression for agnostic learning: [Kalai-Klivans-Mansour-Servedio '06]
- PTFs for distribution-free learning of DNFs: [Klivans-Servedio '01]
- PAC learning beyond AC0: [Jackson-Klivans-Servedio '02]
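The low-degree algorithm and L2 polynomial regression amount to least squares over the monomial basis followed by sign thresholding; the L1 variant of [Kalai-Klivans-Mansour-Servedio '06] instead minimizes absolute loss (a linear program), which gives the agnostic guarantee. A self-contained toy version over +/-1-valued inputs (illustrative, not course code):

```python
import numpy as np
from itertools import combinations

def monomial_features(X, degree):
    """All multilinear monomials chi_S(x) = prod_{i in S} x_i with |S| <= degree,
    evaluated on the rows of X (entries in {-1, +1})."""
    n, m = X.shape
    cols = [np.ones(n)]
    for d in range(1, degree + 1):
        for S in combinations(range(m), d):
            cols.append(np.prod(X[:, list(S)], axis=1))
    return np.column_stack(cols)

def low_degree_learner(X, y, degree):
    """L2 polynomial regression: least-squares fit of the labels over
    low-degree monomials, then threshold the fitted polynomial."""
    w, *_ = np.linalg.lstsq(monomial_features(X, degree), y, rcond=None)
    return lambda Xnew: np.sign(monomial_features(Xnew, degree) @ w)
```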
Oct 23 Supervised learning II: noise sensitivity, one-hidden-layer MLPs, Hermite analysis, CSQ algorithms instructor notes, slides, scribe notes
- Noise sensitivity and Fourier concentration: [Klivans-O'Donnell-Servedio '02]
- Polynomial regression over continuous inputs: [Goel-Kanade-Klivans-Thaler '16], [Diakonikolas-Goel-Karmalkar-Klivans-Soltanolkotabi '20], [Diakonikolas-Kane-Pittas-Zarifis '21] (matching lower bounds)
- Tensor decomposition for learning one-hidden-layer MLPs: [Janzamin-Sedghi-Anandkumar '15], smoothed analysis setting: [Awasthi-Tang-Vijayaraghavan '21]
- Learning one-hidden-layer MLPs with poly(1/eps) dependence: [Chen-Dou-Goel-Klivans-Meka '23], [Chen-Narayanan '23], [Diakonikolas-Kane '23]
- CSQ lower bounds: [Goel-Gollakota-Jin-Karmalkar-Klivans '20], [Diakonikolas-Kane-Kontonis-Zarifis '20]
Oct 25 Supervised learning III: filtered PCA instructor notes, slides, scribe notes
- Filtered PCA: [Chen-Klivans-Meka '20], [Chen-Meka '20]
Oct 30 Supervised learning IV: linearized networks (an empirical-NTK sketch follows the readings below) instructor notes, slides, scribe notes
- Lecture notes on linearized networks: [Misiakiewicz-Montanari] (chapters 1-5, extensive literature review in Appendix B)
- Lazy training is a consequence of the choice of scaling: [Chizat-Oyallon-Bach '18], [Telgarsky '21] (blackboard part of this lecture)
- Convergence for training deep networks in the NTK regime: original NTK paper: [Jacot-Gabriel-Hongler '18], convergence for deep networks: [Du et al. '18] and [Allen-Zhu-Li-Song '18] (see p. 3 for a comparison), [Zou et al. '18] (classification instead of regression)
- Excellent slides on kernel methods: [Mairal-Vert '17]
- Generalization for random features and NTK: [Ghorbani-Mei-Misiakiewicz-Montanari '19], [Mei-Misiakiewicz-Montanari '21], [Montanari-Zhong '20]
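In the linearized/NTK regime, training the network reduces to kernel regression with the kernel of inner products of parameter gradients at initialization. A minimal sketch of the empirical NTK of a one-hidden-layer ReLU network, differentiating only through the first layer (names and scalings here are illustrative assumptions, not course code):

```python
import numpy as np

def empirical_ntk(X1, X2, W, a):
    """Empirical NTK of f(x) = (1/sqrt(m)) sum_j a_j relu(<w_j, x>), taking
    gradients w.r.t. the first-layer weights W only:
    K(x, x') = <x, x'> * (1/m) sum_j a_j^2 1{<w_j,x> > 0} 1{<w_j,x'> > 0}."""
    S1 = (X1 @ W.T > 0).astype(float)        # activation patterns at init
    S2 = (X2 @ W.T > 0).astype(float)
    return (X1 @ X2.T) * ((S1 * a) @ (S2 * a).T) / W.shape[0]

# kernel ridge regression with the empirical NTK at random initialization
d, m, n = 10, 1024, 200
W = np.random.randn(m, d) / np.sqrt(d)       # first layer (frozen at init)
a = np.random.choice([-1.0, 1.0], size=m)    # second-layer signs (fixed)
X, y = np.random.randn(n, d), np.random.randn(n)
K = empirical_ntk(X, X, W, a)
alpha = np.linalg.solve(K + 1e-6 * np.eye(n), y)  # fit in the lazy regime
```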
Nov 1 Supervised learning V: mean field limit instructor notes, slides, scribe notes
- Lecture notes on linearized networks: [Misiakiewicz-Montanari] (chapter 6, extensive literature review in Appendix B)
- Original mean-field neural net papers: [Mei-Montanari-Nguyen '18] (blackboard part of this lecture on propagation of chaos), [Chizat-Bach '18], [Rotskoff-Vanden-Eijnden '18], [Sirignano-Spiliopoulos '18]
- Improved non-asymptotic convergence: [Mei-Misiakiewicz-Montanari '19]
- Single index models and information exponent: [Ben Arous-Gheissari-Jagannath '20], leap complexity and mean-field for multi-index models: [Abbe-Boix-Adsera-Misiakiewicz '23], see also earlier works: [Abbe-Boix-Adsera-Misiakiewicz '22], [Abbe-Boix-Adsera-Brennan-Bresler-Nagaraj '21]
- Mean field for learning time-scales: [Berthier-Montanari-Zhou '23]
- Textbook on Wasserstein gradient flows: [Ambrosio-Gigli-Savare]
Nov 6 Computational complexity I: hardness basics, SQ lower bounds, noisy parity instructor notes, slides, scribe notes
- Original statistical query learning paper: [Kearns et al. '98], survey paper: [Reyzin '20], paper defining CSQ: [Bshouty-Feldman '02]
- Simple proof that SQ dimension suffices: [Szorenyi '09]
- Pairwise independence and statistical queries: [Bogdanov-Rosen '17] (see Section 7.7)
Nov 8 Computational complexity II: SQ dimension (application to MLPs), statistical dimension (applications to Gaussian mixtures, generative models) instructor notes, slides, scribe notes
- Moment matching recipe for SQ lower bounds for unsupervised problems: [Diakonikolas-Kane-Stewart '17], slides on this work: [Diakonikolas '17]
- Statistical dimension: [Feldman-Grigorescu-Reyzin-Vempala-Xiao '12], see also refinement by [Feldman '16]
- CSQ lower bounds for MLPs: [Diakonikolas et al. '20], [Goel et al. '20]
- "Cheating" SQ algorithm for learning real-valued functions: [Vempala-Wilmes '18] (see Section 4)
- SQ lower bound for learning simple generative models: [Chen-Li-Li '22]
Nov 13 Computational complexity III: cryptographic lower bounds, Daniely-Vardi lifting instructor notes, slides, scribe notes
- Reading list from a [reading group] on crypto and learning
- Cryptography resources: CS 127 [Barak], Goldreich's textbook [Goldreich], MIT 6.875 [Vaikuntanathan '23]
- Public key crypto and hardness of learning: original paper showing connection: [Kearns-Valiant '94], AC0 and TC0 (distribution specific): [Kharitonov '93], intersections of halfspaces: [Klivans-Sherstov '06], juntas: [Applebaum-Barak-Wigderson '09]
- Pseudorandom functions survey: [Bogdanov-Rosen '17]
- Hardness of agnostic learning: halfspaces and noisy parity: [Kalai-Klivans-Mansour-Servedio '06], connections to CSP refutation: [Kothari-Livni '17], [Vadhan '17], [Daniely '15], hardness from LWE: [Diakonikolas-Kane-Ren '23], [Tiegel '22]
- Local PRGs: slides from survey talk: [Ishai], cryptography in NC0: [Applebaum-Ishai-Kushilevitz '06], Goldreich's construction: [Goldreich '01]
- Daniely-Vardi lifting: original paper for worst-case depth-3 MLPs: [Daniely-Vardi '21], extension to SQ for two-hidden-layer MLPs: [Chen-Gollakota-Klivans-Meka '22], extension to hardness in the smoothed analysis setting: [Daniely-Srebro-Vardi '23]
- Learning with errors: survey on classical LWE: [Regev '10], continuous LWE: [Bruna-Regev-Song-Tang '20], CLWE as hard as LWE: [Gupte-Vafa-Vaikuntanathan '22], CLWE hardness for single index models: [Song-Zadik-Bruna '21]
Nov 15 Statistical physics I: intro to graphical models and variational inference instructor notes (board talk only), scribe notes
- Textbook on statistical physics and inference: [Mezard-Montanari '09]
- Lecture notes on statistical physics / optimization / learning: [Krzakala-Zdeborova '22] (see chapters 1 and 4.1)
- Lecture notes on graphical models: [Montanari '11] (see chapter 1), [Ermon '22] (see first two sections)
- Textbook on graphical models / variational inference: [Wainwright-Jordan '08]
- St. Flour lectures on statistical mechanics and sparse random graphs: [Montanari '14]
Nov 20 Statistical physics II: belief propagation, Bethe free energy (a BP sketch for the Ising model follows the readings below) instructor notes, slides, supplemental instructor notes [pdf], scribe notes
- Textbook on statistical physics and inference: [Mezard-Montanari '09] (see chapter 14)
- Lecture notes on statistical physics / optimization / learning: [Krzakala-Zdeborova '22] (see chapter 4.2)
- Lecture notes on graphical models: [Montanari '11] (see chapters 2 and 4.3), [Ermon '22] (see Inference section)
- Lecture notes on Bethe free energy / BP: [Macris '11], [Pfister '14] (see also the course page for an extensive reading list)
- Global convergence of BP for ferromagnetic Ising model: [Koehler '19]
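For the Ising model, the sum-product update has a closed form in the log-likelihood-ratio parameterization; on a tree BP computes exact marginals, and on loopy graphs its fixed points are stationary points of the Bethe free energy. A minimal synchronous-update sketch, assuming the graph is given as adjacency lists (illustrative, not course code):

```python
import numpy as np

def ising_bp(adj, beta, h, n_iters=50):
    """Sum-product BP for p(x) proportional to
    exp(beta * sum_{(i,j) in E} x_i x_j + sum_i h_i x_i), x in {-1,+1}^n.
    Messages are stored as log-likelihood ratios
    nu[(i, j)] = (1/2) * log(m_{i->j}(+1) / m_{i->j}(-1))."""
    nu = {(i, j): 0.0 for i in range(len(adj)) for j in adj[i]}
    for _ in range(n_iters):
        # synchronous update: each message aggregates all incoming messages
        # at its source node except the one coming back from its target
        nu = {(i, j): np.arctanh(np.tanh(beta) * np.tanh(
                  h[i] + sum(nu[(k, i)] for k in adj[i] if k != j)))
              for (i, j) in nu}
    # BP estimate of the magnetization E[x_i] at each node
    return [np.tanh(h[i] + sum(nu[(k, i)] for k in adj[i]))
            for i in range(len(adj))]

# toy run on a 4-node path graph with an external field at one end
adj = [[1], [0, 2], [1, 3], [2]]
print(ising_bp(adj, beta=0.5, h=[1.0, 0.0, 0.0, 0.0]))
```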
Nov 27 Statistical physics III: derivation of AMP, state evolution, Z2 synchronization (an AMP sketch follows the readings below) instructor notes, slides, scribe notes
- See the extensive survey on AMP by [Feng-Venkataramanan-Rush-Samworth '21] for pointers to the large literature on AMP
- Tutorials on AMP: survey on mean-field spin glass techniques: [Montanari-Sen '22] (see Section 2.4 for a derivation of AMP and a proof sketch for state evolution), lecture notes on high-dimensional inference: [El Alaoui '21] (see Lectures 8 to 12 for details on state evolution), lecture notes on statistical physics / optimization / learning: [Krzakala-Zdeborova '22] (see chapter 8 for the cavity method perspective on AMP), short lecture on AMP for Z2 synchronization: [Wein '17], video recording of tutorial on statistical physics methods: [El Alaoui '22]
- Survey on low-rank matrix estimation: [Miolane '19]
- First work rigorously proving state evolution for AMP (for solving the TAP equations): [Bolthausen '12], state evolution result from class: [Bayati-Montanari '10]
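A minimal sketch of the AMP iteration for Z2 synchronization in the model Y = (lam/n) x x^T + W/sqrt(n) from this lecture. The tanh denoiser and the Onsager memory term are the essential ingredients; the Bayes-optimal version scales the denoiser by a state-evolution parameter gamma_t, which this toy version drops:

```python
import numpy as np

def amp_z2(Y, n_iters=50):
    """AMP for Z2 synchronization (sketch). Model: Y = (lam/n) x x^T + W/sqrt(n)
    with x in {-1,+1}^n and W ~ GOE. Iteration:
    m_{t+1} = Y f(m_t) - b_t f(m_{t-1}), with denoiser f = tanh and Onsager
    correction b_t = average of f'(m_t)."""
    n = Y.shape[0]
    m_prev = np.zeros(n)
    m = 0.01 * np.random.randn(n)      # small random start (rigorous analyses
                                       # often use a spectral initialization)
    for _ in range(n_iters):
        f, f_prev = np.tanh(m), np.tanh(m_prev)
        b = np.mean(1.0 - f ** 2)      # Onsager / memory term
        m, m_prev = Y @ f - b * f_prev, m
    return np.sign(m)                  # estimate of x, up to a global sign flip
```

Without the Onsager correction this is just power iteration with a nonlinearity; the memory term is exactly what makes the iterates behave like the Gaussians predicted by state evolution.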
Nov 29 Statistical physics IV: optimality of AMP for Z2 synchronization instructor notes, slides, scribe notes
- The optimality proof presented in lecture comes from: [Deshpande-Abbe-Montanari '15]
- TAP free energy perspective: [Fan-Mei-Montanari '18], [Celentano-Fan-Mei '21], [Celentano '22], [Celentano-Fan-Lin-Mei '23], survey on the TAP equation: [Bolthausen]
- Spectral algorithms from BP: "spectral redemption" using the nonbacktracking operator: [Krzakala et al. '13], rigorous results on the spectrum of the nonbacktracking operator and related matrices: [Massoulie '13], [Mossel-Neeman-Sly '13], [Bordenave-Lelarge-Massoulie '15], [Abbe-Sandon '15], spectral algorithms for general "Bayesian constraint satisfaction problems" inspired by the cavity method: [Liu-Mohanty-Raghavendra '21]
- Surveys on community detection: [Moore '17], [Abbe '18], optimal recovery for two communities: [Mossel-Neeman-Sly '13], [Yu-Polyanskiy '22]
Nov 29 Diffusion models instructor notes, slides, scribe notes