June 10, 2025
Summer25 Reading Group: Reinforcement Learning (RL) Theory
Instructor: Aadirupa Saha
Sessions
Date | Presenter | Topics | Resource | Notes |
---|---|---|---|---|
2025-06-13 | Zhengyao | MDP Basics, Values, Policies, Bellman Consistency Equation | RLM (Chap 1.1.1-1.1.3), AK (Lec 5) | - |
2025-06-17 | Zhengyao | Bellman Optimality Equations, Value Iteration, Policy Iteration, Convergence Results | RLM (Thm 1.7, 1.8; Chap 1.3.1-1.3.3), AK (Lec 5) | - |
2025-06-20 | Aniket | Policy Iteration, Convergence Guarantee, Episodic, Generative, and Offline RL Settings, Performance Difference Lemma | RLM (Thm 1.14, Lem 1.16; Chap 1.3.2, 1.4, 1.5), AK (Lec 6) | - |
2025-06-24 | Amir | Examples of Policy Classes, Policy Gradient methods, Non-convexity and Convergence of Value functions under Softmax Parameterizations | RLM (Lem 11.4, 11.5, 11.6; Chap 11.1, 11.2), AK (Lec 6) | - |
2025-06-27 | Amir | Natural Policy Gradient (NPG), NPG update with Softmax Parameterization and Fisher information | RLM (Lem 12.6; Chap 12.3), AK (Lem 4; Lec 6) | - |
2025-07-04 | Amir | Convergence of vanilla PG, Convergence of NPG | RLM (Thm 12.3, Cor 12.5, Thm 12.7; Chap 12.3, 12.4), AK (Lem 1, Thm 2, Thm 3, Lem 4; Lec 7) | - |
2025-07-08 | Amith | Exploration in Tabular MDPs, UCB-VI Algorithm, Regret Analysis | RLM (Alg 5, Thm 7.1; Chap 7.1, 7.2, 7.3), AK (Sec 2, Sec 3, Thm 5; Lec 8) | - |
2025-07-11 | Amith | UCB-VI Algorithm, Regret Analysis (contd), Improved bound for UCB-VI | RLM (Thm 7.1, 7.6; Chap 7.3, 7.4), AK (Thm 5, Sec 3.1; Lec 8) | - |
2025-07-15 | Amith | Improved bound for UCB-VI (contd), Linear Bandits (LinUCB) and Regret Analysis | RLM (Thm 7.6; Chap 7.4. Alg 4, Thm 6.3, Prop 6.6; Chap 6), AK (Sec 3.1; Lec 8. Sec 3, Thm 2, Lem 3; Lec 9) | - |
2025-07-18 | Ali | Linear MDP problem, Planning in Linear MDPs, LSVI Algorithm | RLM (Sec 8.1, 8.2; Chap 8. Alg 1, Chap 3.2), AK (Def 1, Prop 2, Sec 1; Lec 9) | - |
2025-07-22 | Ali | LSVI Algorithm (contd), Linear Bellman Completeness, Analysis of LSVI | RLM (Def 3.1, Alg 1, Thm 3.3; Chap 3.1-3.3), AK (Sec 2, Thm 3; Lec 9) | - |
2025-07-25 | --- | Off-Policy Evaluation in Offline RL, Weaker assumptions and linear Bellman completeness | RLM (Alg 2, Thm 3.7, 3.9; Chap 3.5), AK (Assump 4, Lec 5; Sec 3; Lec 9) | - |
2025-07-29 | --- | Learning Transition using Ridge Linear Regression, Uniform Convergence and learning transition dynamics in Linear-MDPs | RLM (Lem 8.3, Lem 8.4, Lem 8.6, 8.7; Chap 8.3, 8.4) | - |
2025-08-01 | --- | UCBVI for Linear MDPs, Analysis of UCBVI for Linear MDPs | RLM (Alg 6, Thm 8.8, Thm 8.9; Chap 8.5, 8.6) | - |
Course Description
This summer reading group explores foundational and advanced topics in Reinforcement Learning theory, closely following the RL Theory Monograph by Agarwal, Jiang, Kakade, and Sun. Participants will take turns presenting key concepts weekly, with occasional discussions drawing on the classic text Reinforcement Learning: An Introduction by Sutton and Barto. The group aims to build theoretical intuition while fostering informal collaboration around RL and broader ML theory.
Timing: Tuesdays and Fridays, 5:30-7:00 PM Central
Core References
- SB: Reinforcement Learning: An Introduction by Sutton & Barto
- RLM: RL: Theory & Algorithms by Agarwal, Jiang, Kakade, Sun
- FRL: Foundations of Reinforcement Learning and Interactive Decision Making by Foster & Rakhlin
- MFRL: Mathematical Foundations of Reinforcement Learning by Shiyu Zhao
- TFRL: Theoretical Foundations of Reinforcement Learning by Csaba Szepesvári
- AK: COMS6998-11: Bandits and Reinforcement Learning, by Akshay Krishnamurthy
- NJ: CS 542: Statistical Reinforcement Learning, by Nan Jiang