June 10, 2025

Summer25 Reading Group: Reinforcement Learning (RL) Theory

Instructor: Aadirupa Saha

Sessions

Date | Presenter | Topics | Resource | Notes
2025-06-13 | Zhengyao | MDP Basics, Values, Policies, Bellman Consistency Equation | RLM (Chap 1.1.1-1.1.3), AK (Lec 5) | -
2025-06-17 | Zhengyao | Bellman Optimality Equations, Value Iteration, Policy Iteration, Convergence Results | RLM (Thm 1.7, 1.8; Chap 1.3.1-1.3.3), AK (Lec 5) | -
2025-06-20 | Aniket | Policy Iteration, Convergence Guarantee, Episodic, Generative and Offline RL setting, The performance difference lemma | RLM (Thm 1.14, Lem 1.16; Chap 1.3.2, 1.4, 1.5), AK (Lec 6) | -
2025-06-24 | Amir | Example of Policy Classes, Policy Gradient methods, Non-convexity and Convergence of Value functions under Softmax Parameterizations | RLM (Lem 11.4, 11.5, 11.6; Chap 11.1, 11.2), AK (Lec 6) | -
2025-06-27 | Amir | Natural Policy Gradient (NPG), NPG update with Softmax Parameterization and Fisher information | RLM (Lem 12.6; Chap 12.3), AK (Lem 4; Lec 6) | -
2025-07-04 | Amir | Convergence of vanilla PG, Convergence of NPG | RLM (Thm 12.3, Cor 12.5, Thm 12.7; Chap 12.3, 12.4), AK (Lem 1, Thm 2, Thm 3, Lem 4; Lec 7) | -
2025-07-08 | Amith | Exploration of Tabular MDPs, UCB-VI Algorithm, Regret Analysis | RLM (Alg 5, Thm 7.1; Chap 7.1, 7.2, 7.3), AK (Sec 2, Sec 3, Thm 5; Lec 8) | -
2025-07-11 | Amith | UCB-VI Algorithm, Regret Analysis (contd), Improved bound for UCB-VI | RLM (Thm 7.1, 7.6; Chap 7.3, 7.4), AK (Thm 5, Sec 3.1; Lec 8) | -
2025-07-15 | Amith | Improved bound for UCB-VI (contd), Linear Bandits (LinUCB) and Regret Analysis | RLM (Thm 7.6; Chap 7.4. Alg 4, Thm 6.3, Prop 6.6; Chap 6), AK (Sec 3.1; Lec 8. Sec 3, Thm 2, Lem 3; Lec 9) | -
2025-07-18 | Ali | Linear MDP problem, Planning in Linear MDPs, LSVI Algorithm | RLM (Sec 8.1, 8.2; Chap 8. Alg 1, Chap 3.2), AK (Def 1, Prop. 2, Sec 1; Lec 9) | -
2025-07-22 | Ali | LSVI Algorithm (contd), Linear Bellman Completeness, Analysis of LSVI | RLM (Def 3.1, Alg 1, Thm 3.3; Chap 3.1-3.3), AK (Sec 2, Thm 3; Lec 9) | -
2025-07-25 | --- | Off-Policy Evaluation in Offline RL, Weaker assumptions and linear Bellman completeness | RLM (Alg 2, Thm 3.7, 3.9; Chap 3.5), AK (Assump 4, Lec 5; Sec 3; Lec 9) | -
2025-07-29 | --- | Learning Transition using Ridge Linear Regression, Uniform Convergence and learning transition dynamics in Linear-MDPs | RLM (Lem 8.3, Lem 8.4, Lem 8.6, 8.7; Chap 8.3, 8.4) | -
2025-08-01 | --- | UCBVI for Linear MDPs, Analysis of UCBVI for Linear MDPs | RLM (Alg 6, Thm 8.8, Thm 8.9; Chap 8.5, 8.6) | -

Course Description

This summer reading group explores foundational and advanced topics in Reinforcement Learning theory, closely following the RL Theory Monograph by Agarwal, Jiang, Kakade, and Sun. Participants will take turns presenting key concepts weekly, with occasional discussions drawing on classic texts such as Reinforcement Learning: An Introduction by Sutton and Barto. The group aims to build theoretical intuition while fostering informal collaboration around RL and broader ML theory.

Timing: Tuesdays and Fridays, 5:30-7 PM Central