June 10, 2025

Reading Group: Reinforcement Learning Theory (Summer 2025)

Instructor: Aadirupa Saha

Sessions

Date | Presenter | Topics | Resources
2025-06-13 | Zhengyao | MDP Basics, Values, Policies, Bellman Consistency Equation | RLM (Chap 1.1.1-1.1.3); AK (Lec 5)
2025-06-17 | Zhengyao | Bellman Optimality Equations, Value Iteration, Policy Iteration, Convergence Results | RLM (Thm 1.7, 1.8; Chap 1.3.1-1.3.3); AK (Lec 5)
2025-06-20 | Aniket | Policy Iteration, Convergence Guarantee; Episodic, Generative, and Offline RL Settings; the Performance Difference Lemma | RLM (Thm 1.14, Lem 1.16; Chap 1.3.2, 1.4, 1.5); AK (Lec 6)
2025-06-24 | Amir | Examples of Policy Classes, Policy Gradient Methods; Non-convexity and Convergence of Value Functions under Softmax Parameterization | RLM (Lem 11.4, 11.5, 11.6; Chap 11.1, 11.2); AK (Lec 6)
2025-06-27 | Amir | Natural Policy Gradient (NPG); NPG Update with Softmax Parameterization and Fisher Information | RLM (Lem 12.6; Chap 12.3); AK (Lem 4; Lec 6)
2025-07-04 | Amir | Convergence of Vanilla PG; Convergence of NPG | RLM (Thm 12.3, Cor 12.5, Thm 12.7; Chap 12.3, 12.4); AK (Lem 1, Thm 2, Thm 3, Lem 4; Lec 7)
2025-07-08 | Amith | Exploration in Tabular MDPs; UCB-VI Algorithm, Regret Analysis | RLM (Alg 5, Thm 7.1; Chap 7.1, 7.2, 7.3); AK (Sec 2, Sec 3, Thm 5; Lec 8)
2025-07-11 | Amith | UCB-VI Algorithm, Regret Analysis (contd.) | RLM (Thm 7.1, 7.6; Chap 7.3, 7.4); AK (Thm 5, Sec 3.1; Lec 8)
2025-07-15 | Amith | Improved Bound for UCB-VI; Intro to Linear Bandits | RLM (Thm 7.6; Chap 7.4 / Alg 4; Chap 6.2); AK (Sec 3.1; Lec 8 / Sec 3; Lec 3)
2025-07-18 | Amith | LinUCB Algorithm; Regret Analysis of LinUCB | RLM (Alg 4, Thm 6.3, Prop 6.6; Chap 6.2, 6.3); AK (Sec 3, Thm 2, Lem 3; Lec 3)
2025-07-22 | Ali | Linear Bellman Completeness, D-Optimal Design; LSVI Algorithm (Value Iteration for Linear Bellman Complete MDPs) | RLM (Defn 3.1, Alg 1, Thm 3.2; Chap 3.1-3.3); AK (Defn 1, Prop 2, Sec 1; Lec 9)
2025-07-25 | Ali / AS | Convergence Analysis of the LSVI Algorithm; Interpretation of G-Optimal and D-Optimal Design, Kiefer–Wolfowitz Theorem | RLM (Thm 3.3, Lem 3.4; Chap 3.3.2, 3.3.3); BALG (Thm 21.1; Chap 21.1)
2025-07-29 | Ali | Convergence Analysis of the LSVI Algorithm (contd.); Least Squares Policy Evaluation (LSPE) and Analysis | RLM (Lem 3.5, Thm 3.3; Chap 3.3); RLM (Defn 3.8, Alg 2, Thm 3.9; Chap 3.5)
2025-08-01 | AS | Low-Rank MDPs and Linear MDPs, Planning in Linear MDPs; Learning Transitions via Ridge Linear Regression | RLM (Claim 8.2, Lem 8.3, 8.4; Chap 8.1, 8.2, 8.3); AK (Defn 1, Prop 2, Sec 1; Lec 9)
2025-08-05 | Ali | Covering Numbers, Uniform Convergence via Covering; Uniform Convergence for Estimating Transition Dynamics | RLM (Lem 8.5, 8.6, 8.7; Chap 8.4)
2025-08-08 | Aniket | LSVI-UCB: UCB-VI for Linear MDPs; Analysis of LSVI-UCB | RLM (Lem 8.8, Thm 8.9; Chap 8.5, 8.6); AK (Thm 3, Sec 2; Lec 9)
2025-08-12 | Aniket | LSVI-UCB Analysis (contd.); Hypothesis Classes of Bounded Q and V, Bellman Rank | RLM (Lem 8.10, 8.11, 8.12, Thm 8.9; Chap 8.6); RLM (Defn 9.1, Sec 9.2; Chap 9.1, 9.2)
2025-08-15 | Aniket | Examples of MDPs with 'small' Q, V Bellman Rank; Understanding OLIVE under Low Bellman Rank | RLM (Prop 9.3-9.8; Chap 9.3); OLIVE paper; Bellman Rank notes by Nan Jiang
2025-08-19 | Aniket | Bilinear Classes of MDPs and Examples; BLin-UCB: PAC RL with Bounded Bilinear Rank, Sample Complexity Analysis | RLM (Defn 9.10, Prop 9.11; Chap 9.4); RLM (Alg 7, Thm 9.16, Cor 9.17; Chap 9.5)
2025-08-22 | Amir | Compatible Function Approximation; NPG, Q-NPG, and Examples; NPG Regret Lemma | RLM (Lem 13.1, 13.2; Chap 13.1, 13.2); RLM (Lem 13.3; Chap 13.3)
2025-09-05 | Amir | Analysis of the NPG Regret Lemma (contd.); Relative Condition Number, Sample Complexity Analysis of Q-NPG for Log-Linear Policies | RLM (Lem 13.3; Chap 13.3); RLM (Assump 13.5, Thm 13.6; Chap 13.4)
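
For a concrete feel for the first sessions' material (Bellman optimality and value iteration, 2025-06-13 and 2025-06-17), here is a minimal value iteration sketch for a finite tabular MDP. It is an illustration only, not code from RLM or AK; the function name and the array conventions (P indexed as [s, a, s'], R as [s, a]) are our own assumptions.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite, discounted, tabular MDP.

    P: transition tensor of shape (S, A, S), P[s, a, s'] = Pr(s' | s, a)
    R: reward matrix of shape (S, A)
    Returns an approximation of V* and a greedy policy w.r.t. it.
    """
    S, A = R.shape
    V = np.zeros(S)
    while True:
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * E[V(s')]
        Q = R + gamma * (P @ V)      # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Example: a toy 2-state, 2-action MDP (made up for illustration)
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.3, 0.7]]])   # P[s, a, s']
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])                 # R[s, a]
V_star, pi_star = value_iteration(P, R)
```

The backup inside the loop is the Bellman optimality operator applied to V; since that operator is a gamma-contraction in the sup-norm, the iterates converge geometrically to V*, which is the content of the convergence results covered in the 2025-06-17 session.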

Course Description

This summer reading group explores foundational and advanced topics in reinforcement learning theory, closely following the RL theory monograph by Agarwal, Jiang, Kakade, and Sun (cited as RLM above). Participants take turns presenting key concepts each week, with occasional discussions drawing on the classic text Reinforcement Learning: An Introduction by Sutton and Barto. The group aims to build theoretical intuition while fostering informal collaboration around RL and broader ML theory.

Timing: Tuesdays and Fridays, 5:30-7:00 PM Central