S&DS 605: Sampling and Optimal Transport (Fall 2024)

Course Description: MCMC sampling and variational inference have long been utilized in Bayesian statistics and machine learning; what can we say about the convergence of these methods? Recently, a modern theory has emerged which blends principles from convex optimization with a geometric perspective on the space of probability distributions based on optimal transport. This course provides an introduction to this theory, as well as to related tools used for modern algorithmic analysis: Markov semigroup theory and stochastic calculus, coupling, and functional inequalities. Much of the course focuses on the complexity of log-concave sampling, but we also discuss applications to diffusion models and variational inference.

Instructor: Sinho Chewi (sinho.chewi@yale.edu)

References

  • [Che] Sinho Chewi, Log-Concave Sampling (draft textbook; exercises and section numbers below refer to this text).
  • [CNR] Sinho Chewi, Jonathan Niles-Weed, and Philippe Rigollet, Statistical Optimal Transport.

Schedule

The course meets on Tuesdays and Thursdays, 1–2:15 p.m., in Watson Center A48 (note the room change!). I will also be available for an hour after each class if you have any questions or would like to discuss the material further. If these times do not work for you, or if you have any other concerns, please contact me via email.

  • Aug. 29 (Thursday): Introduction and motivation. Primer on stochastic calculus. Readings: [Che] Preface, §1.1.
  • Sept. 3 (Tuesday): Markov semigroup theory. Kolmogorov's equations and reversibility. Readings: [Che] §1.2.1.
  • Sept. 5 (Thursday): Markov semigroup theory (cont.): Carré du champ, integration by parts, and the Poincaré and log-Sobolev inequalities. Readings: [Che] §1.2.2~1.2.3.
  • Sept. 10 (Tuesday): Optimal transport, duality, and convex analysis preliminaries. Readings: [Che] §1.3.1.
  • Sept. 12 (Thursday): Proof of the fundamental theorem of optimal transport, metric, and topology. Readings: [Che] §1.3.1. Optional: for a more in-depth discussion of duality, see [CNR] §1.
  • Sept. 17 (Tuesday): Geometry of the Wasserstein space. Continuity equation, and the Benamou–Brenier formula. Readings: [Che] §1.3.2 or [CNR] §5.1~5.3. See also the notes/video for Lecture 1 in my KAIST summer school lectures.
  • Sept. 19 (Thursday): Otto calculus. Readings: [Che] §1.4~1.5. See also [CNR] §5.4 and the notes/video for Lecture 2 in my KAIST summer school lectures.
  • Sept. 24 (Tuesday): Otto calculus (cont.). Readings: [Che] §1.4~1.5. See also [CNR] §5.4 and the notes/video for Lecture 2 in my KAIST summer school lectures.
  • Sept. 26 (Thursday): Poincaré and log-Sobolev inequalities via commutation and curvature. Readings: [Che] §2.1~2.2.4, 2.2.6.
  • Oct. 1 (Tuesday): Operations preserving functional inequalities. Readings: [Che] §2.3, 2.4.1.
  • Oct. 3 (Thursday): Analysis of Langevin Monte Carlo via coupling. Readings: [Che] §4.1.
  • Oct. 8 (Tuesday): Analysis of Langevin Monte Carlo via interpolation. Readings: [Che] §4.2.
  • Oct. 10 (Thursday): Analysis of Langevin Monte Carlo via convex optimization. Readings: [Che] §4.3.
  • Oct. 15 (Tuesday): Analysis of Langevin Monte Carlo via Girsanov's theorem. Randomized midpoint discretization. Readings: [Che] §4.4, 5.1.
  • Oct. 17 (Thursday): October recess; no class.
  • Oct. 22 (Tuesday): Hamiltonian Monte Carlo and the underdamped Langevin diffusion. Readings: [Che] §5.2~5.3.
  • Oct. 24 (Thursday): Underdamped Langevin Monte Carlo. Readings: [Che] §5.3.
  • Oct. 29 (Tuesday): High-accuracy samplers. Readings: [Che] §7.1~7.3.
  • Oct. 31 (Thursday): Isoperimetry and conductance. Readings: [Che] §2.4, 7.4.
  • Nov. 5 (Tuesday): Analysis of MALA. Readings: [Che] §7.5~7.6.
  • Nov. 7 (Thursday): Proximal sampler. Readings: [Che] §8.1~8.5.
  • Nov. 12 (Tuesday): Proximal sampler. Readings: [Che] §8.5~8.6.
  • Nov. 14 (Thursday): Lower bounds. Readings: [Che] §9.1~9.2.
  • Nov. 19 (Tuesday): Mirror Langevin. Readings: [Che] §10.2.
  • Nov. 21 (Thursday): Non-log-concave sampling. Readings: [Che] §11 and https://arxiv.org/abs/2210.02482.
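For readers new to the algorithms in the schedule above: the Langevin Monte Carlo (LMC) iteration analyzed in the Oct. 3–15 lectures is the Euler–Maruyama discretization of the Langevin diffusion dX_t = −∇V(X_t) dt + √2 dB_t, which targets π ∝ exp(−V). A minimal NumPy sketch (the function names and the Gaussian example are illustrative choices, not taken from the course materials):

```python
import numpy as np

def langevin_monte_carlo(grad_V, x0, step_size, n_iters, rng):
    """Run the LMC (unadjusted Langevin) iteration targeting pi ∝ exp(-V):
        x_{k+1} = x_k - h * grad_V(x_k) + sqrt(2h) * xi_k,   xi_k ~ N(0, I).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        noise = rng.standard_normal(x.shape)
        x = x - step_size * grad_V(x) + np.sqrt(2 * step_size) * noise
    return x

# Example: standard Gaussian target, V(x) = ||x||^2 / 2, so grad_V(x) = x.
rng = np.random.default_rng(0)
samples = np.array([
    langevin_monte_carlo(lambda x: x, np.zeros(2), 0.1, 500, rng)
    for _ in range(200)
])
print(samples.mean(axis=0))  # empirical mean, close to the target mean 0
```

Note that with a fixed step size h the iteration is biased: its stationary distribution is only O(h)-close to π (in the Gaussian example above, the stationary variance is 1/(1 − h/2) rather than 1), which is exactly the discretization error quantified in the analyses of §4 of [Che].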

Assignments

Grades are based solely on five problem sets, consisting of exercises from the textbook. The problem sets are released on Sept. 12, Sept. 26, Oct. 24, Nov. 7, and Nov. 21, and each one is due roughly one and a half weeks after it is assigned.

Please feel free to work on the problem sets with others, but list your collaborators at the beginning of your submission. Problem sets should be typed in LaTeX and submitted via Gradescope (the link can be found in Canvas).

  • Problem Set 1 (due Sept. 24). Complete the following exercises from [Che]: 1.3 (Ornstein–Uhlenbeck process), 1.6 (functional inequalities and exponential decay), 1.7 (log-Sobolev implies Poincaré), 1.8 (mixing of the Ornstein–Uhlenbeck process), 1.9 (optimal transport between Gaussians), 1.10 (optimal transport with other costs).
  • Problem Set 2 (due Oct. 8). Complete the following exercises from [Che]: 1.14 (Wasserstein calculus for f-divergences), 1.15 (Otto–Villani theorem), 2.2 (curvature-dimension condition), 2.4 (dimensional improvement of the Brascamp–Lieb inequality), 2.6 (local Poincaré inequality), 2.8 (hypercontractivity).
  • Problem Set 3 (due Nov. 5). Complete the following exercises from [Che]: 2.10 (variational principle for entropies), 2.11 (transport inequality in one dimension), 4.1 (explicit computations for a Gaussian target), 4.3 (LMC with decaying step size), 5.1 (LMC under higher-order smoothness), 5.5 (Nesterov's algorithm in continuous time).
  • Problem Set 4 (due Nov. 19). Complete the following exercises from [Che]: 5.6 (derivation of the ULMC updates), 5.7 (movement bound for the underdamped Langevin diffusion), 7.3 (reversible Markov chains as gradient descent on the Dirichlet energy), 7.7 (boosting from TV to χ^2), 8.4 (first coupling lemma).
  • Problem Set 5 (due Dec. 6). Complete the following exercises from [Che]: 8.7 (Gaussian case), 9.1 (basics of information theory), 10.2 (Markov semigroup theory for the mirror Langevin diffusion), 10.5 (properties of the Bregman divergence).