Statistical Reinforcement Learning and Decision Making
9.522 Fall 2023, MW 11:00-12:30
Location: 46-3002 Singleton Auditorium
Course notes
Lectures:
Introduction
- Lecture 01: Introduction
- Lecture 02: Statistical Learning. Online Supervised Learning
- Lecture 03: Online Supervised Learning
Multi-Armed Bandits
- Lecture 04: Multi-Armed Bandits
- Lecture 05: Optimism in the Face of Uncertainty: The Upper Confidence Bound (UCB) Algorithm
- Lecture 06: Posterior Sampling Methods
Contextual Bandits
- Lecture 07: Optimism with a Finite Class
- Lecture 08: Linear Models and LinUCB. Failure of Optimism
- Lecture 09: ε-Greedy. Inverse Gap Weighting
Structured Bandits
- Lecture 10: Optimism for Structured Bandits. Eluder Dimension
- Lecture 11: Decision-Estimation Coefficient. The E2D Meta-Algorithm
- Lecture 12: Examples. Inverse Gap Weighting. G-Optimal Design
- Lecture 13: Examples. Connections to Optimism and Posterior Sampling
Intro to RL
- Lecture 14: Finite-Horizon Episodic MDPs
- Lecture 15: Bellman Optimality. Performance-Difference Lemma. Optimism
- Lecture 16: Optimism and UCB-VI
General Decision Making
- Lecture 17: Decision-Estimation Coefficient
- Lecture 18: E2D Algorithm. Online Oracles for Hellinger and KL
- Lecture 19: Lower Bound and Examples
- Lecture 20: Proof of the Lower Bound
Reinforcement Learning
- Lecture 21: Tabular RL: DEC and the PC-IGW Algorithm
- Lecture 22: PC-IGW Analysis
- Lecture 23: Function Approximation. Realizability. Linear Q* and Linear MDPs
- Lecture 24: LSVI-UCB Algorithm and Analysis
- Lecture 25: Bellman Rank. Examples. BiLinUCB Algorithm
- Lecture 26: BiLinUCB Analysis
- Lecture 27: Conclusions / Open Directions