June 10, 2025

Reading Group: Reinforcement Learning Theory (Summer 2025)

Instructor: Aadirupa Saha

Sessions

Date | Presenter | Topics | Resources
2025-06-13 | Zhengyao | MDP Basics, Values, Policies, Bellman Consistency Equation | RLM (Chap 1.1.1-1.1.3); AK (Lec 5)
2025-06-17 | Zhengyao | Bellman Optimality Equations, Value Iteration, Policy Iteration, Convergence Results | RLM (Thm 1.7, 1.8; Chap 1.3.1-1.3.3); AK (Lec 5)
2025-06-20 | Aniket | Policy Iteration, Convergence Guarantee; Episodic, Generative, and Offline RL Settings; the Performance Difference Lemma | RLM (Thm 1.14, Lem 1.16; Chap 1.3.2, 1.4, 1.5); AK (Lec 6)
2025-06-24 | Amir | Examples of Policy Classes, Policy Gradient Methods; Non-convexity and Convergence of Value Functions under Softmax Parameterization | RLM (Lem 11.4, 11.5, 11.6; Chap 11.1, 11.2); AK (Lec 6)
2025-06-27 | Amir | Natural Policy Gradient (NPG); NPG Update with Softmax Parameterization and Fisher Information | RLM (Lem 12.6; Chap 12.3); AK (Lem 4; Lec 6)
2025-07-04 | Amir | Convergence of Vanilla PG; Convergence of NPG | RLM (Thm 12.3, Cor 12.5, Thm 12.7; Chap 12.3, 12.4); AK (Lem 1, Thm 2, Thm 3, Lem 4; Lec 7)
2025-07-08 | Amith | Exploration in Tabular MDPs; UCB-VI Algorithm, Regret Analysis | RLM (Alg 5, Thm 7.1; Chap 7.1, 7.2, 7.3); AK (Sec 2, Sec 3, Thm 5; Lec 8)
2025-07-11 | Amith | UCB-VI Algorithm, Regret Analysis (contd.) | RLM (Thm 7.1, 7.6; Chap 7.3, 7.4); AK (Thm 5, Sec 3.1; Lec 8)
2025-07-15 | Amith | Improved Bound for UCB-VI; Intro to Linear Bandits | RLM (Thm 7.6; Chap 7.4 / Alg 4; Chap 6.2); AK (Sec 3.1; Lec 8 / Sec 3; Lec 3)
2025-07-18 | Amith | LinUCB Algorithm; Regret Analysis of LinUCB | RLM (Alg 4, Thm 6.3, Prop 6.6; Chap 6.2, 6.3); AK (Sec 3, Thm 2, Lem 3; Lec 3)
2025-07-22 | Ali | Linear Bellman Completeness, D-Optimal Design; LSVI Algorithm (Value Iteration for Linear Bellman Complete MDPs) | RLM (Defn 3.1, Alg 1, Thm 3.2; Chap 3.1-3.3); AK (Defn 1, Prop 2, Sec 1; Lec 9)
2025-07-25 | Ali / AS | Convergence Analysis of the LSVI Algorithm; Interpretation of G-Optimal and D-Optimal Design, Kiefer–Wolfowitz Theorem | RLM (Thm 3.3, Lem 3.4; Chap 3.3.2, 3.3.3); BALG (Thm 21.1; Chap 21.1)
2025-07-29 | Ali | Convergence Analysis of the LSVI Algorithm (contd.); Least Squares Policy Evaluation (LSPE) and Analysis | RLM (Lem 3.5, Thm 3.3; Chap 3.3); RLM (Defn 3.8, Alg 2, Thm 3.9; Chap 3.5)
2025-08-01 | AS | Low-Rank MDPs and Linear MDPs, Planning in Linear MDPs; Learning Transitions via Ridge Linear Regression | RLM (Claim 8.2, Lem 8.3, 8.4; Chap 8.1, 8.2, 8.3); AK (Defn 1, Prop 2, Sec 1; Lec 9)
2025-08-05 | Ali | Covering Numbers, Uniform Convergence via Covering; Uniform Convergence for Estimating Transition Dynamics | RLM (Lem 8.5, 8.6, 8.7; Chap 8.4)
2025-08-08 | Aniket | LSVI-UCB: UCB-VI for Linear MDPs; Analysis of LSVI-UCB | RLM (Lem 8.8, Thm 8.9; Chap 8.5, 8.6); AK (Thm 3, Sec 2; Lec 9)
2025-08-12 | Aniket | LSVI-UCB Analysis (contd.); Hypothesis Classes of Bounded Q and V, Bellman Rank | RLM (Lem 8.10, 8.11, 8.12, Thm 8.9; Chap 8.6); RLM (Defn 9.1, Sec 9.2; Chap 9.1, 9.2)
2025-08-15 | Aniket | Examples of MDPs with 'small' Q, V Bellman Rank; Understanding OLIVE under Low Bellman Rank | RLM (Prop 9.3-9.8; Chap 9.3); OLIVE paper; Bellman Rank notes by Nan Jiang
2025-08-19 | Aniket | Bilinear Classes of MDPs and Examples; BLin-UCB: PAC RL with Bounded Bilinear Rank, Sample Complexity Analysis | RLM (Defn 9.10, Prop 9.11; Chap 9.4); RLM (Alg 7, Thm 9.16, Cor 9.17; Chap 9.5)
2025-08-22 | Amir | Compatible Function Approximation; NPG, Q-NPG, and Examples; NPG Regret Lemma | RLM (Lem 13.1, 13.2; Chap 13.1, 13.2); RLM (Lem 13.3; Chap 13.3)
2025-09-05 | Amir | Analysis of the NPG Regret Lemma (contd.); Relative Condition Number, Sample Complexity Analysis of Q-NPG for Log-Linear Policies | RLM (Lem 13.3; Chap 13.3); RLM (Assump 13.5, Thm 13.6; Chap 13.4)
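
For a concrete feel for the first sessions' material (Bellman optimality and value iteration, 2025-06-13 and 2025-06-17), here is a minimal value iteration sketch for a finite tabular MDP. It is an illustration only, not code from RLM or AK; the function name and the array conventions (P indexed as [s, a, s'], R as [s, a]) are our own assumptions.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite, discounted, tabular MDP.

    P: transition tensor of shape (S, A, S), P[s, a, s'] = Pr(s' | s, a)
    R: reward matrix of shape (S, A)
    Returns an approximation of V* and a greedy policy w.r.t. it.
    """
    S, A = R.shape
    V = np.zeros(S)
    while True:
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * E[V(s')]
        Q = R + gamma * (P @ V)      # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Example: a toy 2-state, 2-action MDP (made up for illustration)
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.3, 0.7]]])   # P[s, a, s']
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])                 # R[s, a]
V_star, pi_star = value_iteration(P, R)
```

The backup inside the loop is the Bellman optimality operator applied to V; since that operator is a gamma-contraction in the sup-norm, the iterates converge geometrically to V*, which is the content of the convergence results covered in the 2025-06-17 session.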

Course Description

This summer reading group explores foundational and advanced topics in reinforcement learning theory, closely following the RL theory monograph by Agarwal, Jiang, Kakade, and Sun (cited as RLM above). Participants take turns presenting key concepts each week, with occasional discussions drawing on the classic text Reinforcement Learning: An Introduction by Sutton and Barto. The group aims to build theoretical intuition while fostering informal collaboration around RL and broader ML theory.

Timing: Tuesdays and Fridays, 5:30-7:00 PM Central