June 10, 2025

Reading Group: Reinforcement Learning Theory (Summer 2025)

Instructor: Aadirupa Saha

Sessions

Date | Presenter | Topics | Resource | Notes
2025-06-13 | Zhengyao | MDP Basics, Values, Policies, Bellman Consistency Equation | RLM (Chap 1.1.1-1.1.3), AK (Lec 5) | -
2025-06-17 | Zhengyao | Bellman Optimality Equations, Value Iteration, Policy Iteration, Convergence Results | RLM (Thm 1.7, 1.8; Chap 1.3.1-1.3.3), AK (Lec 5) | -
2025-06-20 | Aniket | Policy Iteration and its Convergence Guarantee; Episodic, Generative, and Offline RL Settings; The Performance Difference Lemma | RLM (Thm 1.14, Lem 1.16; Chap 1.3.2, 1.4, 1.5), AK (Lec 6) | -
2025-06-24 | Amir | Examples of Policy Classes, Policy Gradient Methods, Non-convexity and Convergence of Value Functions under Softmax Parameterization | RLM (Lem 11.4, 11.5, 11.6; Chap 11.1, 11.2), AK (Lec 6) | -
2025-06-27 | Amir | Natural Policy Gradient (NPG), NPG Update with Softmax Parameterization and Fisher Information | RLM (Lem 12.6; Chap 12.3), AK (Lem 4; Lec 6) | -
2025-07-04 | Amir | Convergence of Vanilla PG, Convergence of NPG | RLM (Thm 12.3, Cor 12.5, Thm 12.7; Chap 12.3, 12.4), AK (Lem 1, Thm 2, Thm 3, Lem 4; Lec 7) | -
2025-07-08 | Amith | Exploration in Tabular MDPs, UCB-VI Algorithm, Regret Analysis | RLM (Alg 5, Thm 7.1; Chap 7.1, 7.2, 7.3), AK (Sec 2, Sec 3, Thm 5; Lec 8) | -
2025-07-11 | Amith | UCB-VI Algorithm, Regret Analysis (contd.) | RLM (Thm 7.1, 7.6; Chap 7.3, 7.4), AK (Thm 5, Sec 3.1; Lec 8) | -
2025-07-15 | Amith | Improved Bound for UCB-VI, Intro to Linear Bandits | RLM (Thm 7.6; Chap 7.4), RLM (Alg 4; Chap 6.2), AK (Sec 3.1; Lec 8), AK (Sec 3; Lec 3) | -
2025-07-18 | Amith | LinUCB Algorithm, Regret Analysis of LinUCB | RLM (Alg 4, Thm 6.3, Prop 6.6; Chap 6.2, 6.3), AK (Sec 3, Thm 2, Lem 3; Lec 3) | -
2025-07-22 | Ali | Linear Bellman Completeness, D-Optimal Design, LSVI Algorithm (Value Iteration for Linear Bellman Complete MDPs) | RLM (Defn 3.1, Alg 1, Thm 3.2; Chap 3.1-3.3), AK (Defn 1, Prop 2, Sec 1; Lec 9) | -
2025-07-25 | Ali / AS | Convergence Analysis of LSVI Algorithm, Interpretation of G-Optimal and D-Optimal Design, Kiefer–Wolfowitz Theorem | RLM (Thm 3.3, Lem 3.4; Chap 3.3.2, 3.3.3), BALG (Thm 21.1; Chap 21.1) | -
2025-07-29 | Ali | Convergence Analysis of LSVI Algorithm (contd.), Least Squares Policy Evaluation (LSPE) and its Analysis | RLM (Lem 3.5, Thm 3.3; Chap 3.3), RLM (Defn 3.8, Alg 2, Thm 3.9; Chap 3.5) | -
2025-08-01 | AS | Low-Rank MDPs and Linear MDPs, Planning in Linear MDPs, Learning Transitions via Ridge Linear Regression | RLM (Claim 8.2, Lem 8.3, 8.4; Chap 8.1, 8.2, 8.3), AK (Defn 1, Prop 2, Sec 1; Lec 9) | -
2025-08-05 | Ali | Covering Number, Uniform Convergence via Covering, Uniform Convergence to Estimate Transition Dynamics | RLM (Lem 8.5, 8.6, 8.7; Chap 8.4) | -
2025-08-08 | Aniket | LSVI-UCB: UCB-VI for Linear MDPs, Analysis of LSVI-UCB | RLM (Lem 8.8, Thm 8.9; Chap 8.5, 8.6), AK (Thm 3, Sec 2 2; Lec 9) | -

Course Description

This summer reading group explores foundational and advanced topics in reinforcement learning theory, closely following the RL Theory Monograph by Agarwal, Jiang, Kakade, and Sun. Participants take turns presenting key concepts at each session, with occasional discussions drawing on classic texts such as Reinforcement Learning: An Introduction by Sutton and Barto. The group aims to build theoretical intuition while fostering informal collaboration around RL and broader ML theory.

Timing: Tuesdays and Fridays, 5:30-7:00 PM Central