June 10, 2025
Reading Group: Reinforcement Learning Theory (Summer, 2025)
Instructor: Aadirupa Saha
Sessions
Date | Presenter | Topics | Resource | Notes |
---|---|---|---|---|
2025-06-13 | Zhengyao | MDP Basics, Values, Policies, Bellman Consistency Equation | RLM (Chap 1.1.1-1.1.3), AK (Lec 5) | - |
2025-06-17 | Zhengyao | Bellman Optimality Equations, Value Iteration, Policy Iteration, Convergence Results | RLM (Thm 1.7, 1.8; Chap 1.3.1-1.3.3), AK (Lec 5) | - |
2025-06-20 | Aniket | Policy Iteration, Convergence Guarantee, Episodic, Generative, and Offline RL Settings, Performance Difference Lemma | RLM (Thm 1.14, Lem 1.16; Chap 1.3.2, 1.4, 1.5), AK (Lec 6) | - |
2025-06-24 | Amir | Examples of Policy Classes, Policy Gradient Methods, Non-convexity and Convergence of Value Functions under Softmax Parameterizations | RLM (Lem 11.4, 11.5, 11.6; Chap 11.1, 11.2), AK (Lec 6) | - |
2025-06-27 | Amir | Natural Policy Gradient (NPG), NPG Update with Softmax Parameterization and Fisher Information | RLM (Lem 12.6; Chap 12.3), AK (Lem 4; Lec 6) | - |
2025-07-04 | Amir | Convergence of Vanilla PG, Convergence of NPG | RLM (Thm 12.3, Cor 12.5, Thm 12.7; Chap 12.3, 12.4), AK (Lem 1, Thm 2, Thm 3, Lem 4; Lec 7) | - |
2025-07-08 | Amith | Exploration of Tabular MDPs, UCB-VI Algorithm, Regret Analysis | RLM (Alg 5, Thm 7.1; Chap 7.1, 7.2, 7.3), AK (Sec 2, Sec 3, Thm 5; Lec 8) | - |
2025-07-11 | Amith | UCB-VI Algorithm, Regret Analysis (contd) | RLM (Thm 7.1, 7.6; Chap 7.3, 7.4), AK (Thm 5, Sec 3.1; Lec 8) | - |
2025-07-15 | Amith | Improved Bound for UCB-VI, Intro to Linear Bandits | RLM (Thm 7.6; Chap 7.4. Alg 4; Chap 6.2), AK (Sec 3.1; Lec 8. Sec 3; Lec 3) | - |
2025-07-18 | Amith | LinUCB Algorithm, Regret Analysis of LinUCB | RLM (Alg 4, Thm 6.3, Prop 6.6; Chap 6.2, 6.3), AK (Sec 3, Thm 2, Lem 3; Lec 3) | - |
2025-07-22 | Ali | Linear Bellman Completeness, D-Optimal Design, LSVI Algorithm (Value Iteration for Linear Bellman Complete MDP) | RLM (Defn 3.1, Alg 1, Thm 3.2; Chap 3.1-3.3), AK (Defn 1, Prop 2, Sec 1; Lec 9) | - |
2025-07-25 | Ali / AS | Convergence Analysis of LSVI Algorithm, Interpretation of G-Optimal and D-Optimal Design, Kiefer–Wolfowitz Theorem | RLM (Thm 3.3, Lem 3.4; Chap 3.3.2, 3.3.3), BALG (Thm 21.1; Chap 21.1) | - |
2025-07-29 | Ali | Convergence Analysis of LSVI Algorithm (contd), Least Squares Policy Evaluation (LSPE) and Analysis | RLM (Lem 3.5, Thm 3.3; Chap 3.3), RLM (Defn 3.8, Alg 2, Thm 3.9; Chap 3.5) | - |
2025-08-01 | AS | Low-Rank MDPs and Linear MDPs, Planning in Linear MDPs, Learning Transition using Ridge Linear Regression | RLM (Claim 8.2, Lem 8.3, 8.4; Chap 8.1, 8.2, 8.3), AK (Defn 1, Prop 2, Sec 1; Lec 9) | - |
2025-08-05 | Ali | Covering Number, Uniform Convergence via Covering, Uniform Convergence to Estimate Transition Dynamics | RLM (Lem 8.5, 8.6, 8.7; Chap 8.4) --- | - |
2025-08-08 | Aniket | LSVI-UCB: UCB-VI for Linear MDPs, Analysis of LSVI-UCB | RLM (Lem 8.8, Thm 8.9; Chap 8.5, 8.6), AK (Thm 3, Sec 2 2; Lec 9) | - |
Course Description
This summer reading group explores foundational and advanced topics in Reinforcement Learning theory, closely following the RL Theory Monograph by Agarwal, Jiang, Kakade, and Sun. Participants will take turns presenting key concepts weekly, with occasional discussions drawing from classic texts such as Reinforcement Learning: An Introduction by Sutton and Barto. The group aims to build theoretical intuition while fostering informal collaboration around RL and broader ML theory.
Timing: Tuesdays and Fridays, 5:30-7 PM Central
Core References
- SB: Reinforcement Learning: An Introduction by Sutton & Barto
- BALG: Bandit Algorithms by Lattimore & Szepesvári
- RLM: Reinforcement Learning: Theory and Algorithms by Agarwal, Jiang, Kakade, Sun
- FRL: Foundations of Reinforcement Learning and Interactive Decision Making by Foster & Rakhlin
- MFRL: Mathematical Foundations of Reinforcement Learning by Shiyu Zhao
- TFRL: Theoretical Foundations of Reinforcement Learning by Csaba Szepesvári
- AK: COMS6998-11: Bandits and Reinforcement Learning, by Akshay Krishnamurthy
- NJ: CS 542: Statistical Reinforcement Learning, by Nan Jiang