August 25, 2025
CS 594: RLHF Theory for AI-Alignment and Fine-Tuning LLMs (Fall, 2025)
---------- This website is under construction ----------
Course Schedule (NOT FINALIZED)
Date | Topic | Reading | Materials | Notes | Top-3 Paper Recommendation |
---|---|---|---|---|---|
Aug 26 | Introduction: Basics of AI-Alignment | | | | |
Aug 28 | Concentration Bounds + MAB | | | | |
Sep 2 | UCB Algorithm | | | | |
Sep 4 | Online Mirror Descent | | | | |
Sep 9 | Linear Bandits | | | | |
Sep 11 | Paper presentation | | | | |
Sep 16 | Linear Bandits (contd.) | | | | |
Sep 18 | Dueling Bandits: Learning from Preferences | | | | |
Sep 23 | Paper presentation | | | | |
Sep 25 | Paper presentation | | | | |
Sep 30 | Dueling Bandits (contd.) | | | | |
Oct 2 | Paper presentation | | | | |
Oct 7 | Contextual Bandits: EXP4 (1-step RL) | | | | |
Oct 9 | Paper presentation | | | | |
Oct 14 | Contextual MAB (contd.) | | | | |
Oct 16 | Contextual Dueling Bandits | | | | |
Oct 21 | Intro to RL + MDP Basics | | | | |
Oct 23 | Tabular MDP: UCB-VI | | | | |
Oct 28 | Paper presentation | | | | |
Oct 30 | Happy Halloween! | | | | |
Nov 4 | Linear Function Approximation | | | | |
Nov 6 | Policy Gradient (PG) Methods | | | | |
Nov 11 | PPO + TRPO | | | | |
Nov 13 | Paper presentation | | | | |
Nov 18 | Imitation Learning | | | | |
Nov 20 | Paper presentation | | | | |
Nov 25 | Project presentation (20 mins/team) | | | | |
Nov 27 | Thanksgiving Break! | | | | |
Dec 2 | Project presentation (20 mins/team) | | | | |
Dec 4 | Project presentation (20 mins/team) |
Course Description (Tentative)
Overview
This course aims to provide a rigorous mathematical foundation for understanding and implementing Reinforcement Learning from Human Feedback (RLHF). The curriculum is structured around interconnected modules that progress from theoretical foundations to practical implementation.
Modules
Topics Set 1: Formalizing the Alignment Problem
Establishes the conceptual framework for AI alignment, examining outer versus inner alignment, reward misspecification, and Goodhart's Law in human feedback systems. Students analyze real-world failure modes and develop intuition for how alignment can degrade in deployed systems.
Topics Set 2: Evaluation & Alignment Verification
Addresses measuring alignment beyond simple reward maximization, covering multi-dimensional evaluation frameworks (HHH: Helpfulness, Harmlessness, Honesty), human evaluation pipeline design, adversarial testing, and robustness verification techniques essential for production deployment.
Topics Set 3: Reinforcement Learning Theory
Provides the mathematical foundations underlying RLHF algorithms. Beginning with MDP fundamentals and Bellman equations, the module progresses through policy gradient methods, exploration strategies in tabular and linear settings, and advanced topics including low-rank MDPs and uniform convergence theory. Students master both classical results and recent theoretical developments.
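As a small taste of the MDP fundamentals this module begins with, the sketch below runs value iteration on a made-up 2-state, 2-action MDP until the Bellman optimality equation V*(s) = max_a [ R(s,a) + γ Σ_s' P(s'|s,a) V*(s') ] is (approximately) satisfied. The transition and reward numbers are purely illustrative, not from the course.

```python
gamma = 0.9  # discount factor

# Hypothetical tabular MDP: P[s][a] = list of (next_state, prob), R[s][a] = reward
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator until the value
    function changes by less than tol in sup-norm."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {
            s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                   for a in P[s])
            for s in P
        }
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new

V = value_iteration(P, R, gamma)
# Fixed point solves V(1) = 2 + 0.9*V(1) = 20 and V(0) = 1 + 0.9*V(1) = 19.
```

The contraction property of the Bellman operator (covered in the MDP lectures) is what guarantees this loop converges geometrically at rate γ.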
Topics Set 4: RLHF Theory and Practice
Synthesizes preference learning, contextual bandits, and human-in-the-loop optimization. Topics include active learning for efficient feedback collection, handling noisy and biased human inputs, integrating multiple feedback sources, and maintaining safety and robustness guarantees throughout the pipeline.
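The preference-learning piece of this module typically starts from the Bradley-Terry model, in which the probability that a rater prefers response A over response B is a logistic function of the reward gap. A minimal sketch (the reward values are illustrative placeholders):

```python
import math

def bt_prob(r_a, r_b):
    """Bradley-Terry model: probability that A is preferred over B,
    given scalar rewards r_a and r_b, i.e. sigmoid(r_a - r_b)."""
    return 1.0 / (1.0 + math.exp(r_b - r_a))

def bt_nll(comparisons, rewards):
    """Negative log-likelihood of observed preferences, where each
    pair (i, j) records that item i was preferred over item j."""
    return -sum(math.log(bt_prob(rewards[i], rewards[j]))
                for i, j in comparisons)

p = bt_prob(1.5, 0.5)  # reward gap of 1.0 -> sigmoid(1.0) ~ 0.731
```

Minimizing `bt_nll` over the reward parameters is exactly the reward-model fitting step of an RLHF pipeline; handling noisy or biased raters amounts to modifying this likelihood.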
Topics Set 5: Large Language Model Theory
Bridges abstract RL theory and practical LLM deployment. Covers transformer architectures, fine-tuning methodologies, parameter-efficient adaptation (LoRA, adapters), preference modeling for reward extraction, and specialized RL algorithms (e.g., PPO with KL regularization) tailored to language model optimization.
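To make the KL-regularization idea concrete: a common form of the RLHF objective shapes the learned reward with a penalty for drifting away from the reference (pre-fine-tuning) policy. The sketch below shows that per-sample shaped reward; `beta` and the log-probability inputs are illustrative placeholders, not values from the course.

```python
def kl_shaped_reward(reward, logp_policy, logp_ref, beta=0.1):
    """KL-regularized RLHF reward:
        r_shaped = r(x, y) - beta * [log pi(y|x) - log pi_ref(y|x)].
    The bracketed term is a single-sample estimate of KL(pi || pi_ref)."""
    return reward - beta * (logp_policy - logp_ref)

# If the policy assigns higher log-prob to y than the reference does,
# the penalty lowers the effective reward, discouraging drift from pi_ref.
r = kl_shaped_reward(reward=1.0, logp_policy=-2.0, logp_ref=-2.5, beta=0.1)
# 1.0 - 0.1 * 0.5 = 0.95
```

Plugging this shaped reward into PPO recovers the standard KL-regularized fine-tuning setup discussed in this module.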
Prerequisites
This course demands strong mathematical maturity and technical proficiency. Advanced linear algebra and probability theory are essential (matrix analysis, eigendecompositions, concentration inequalities, stochastic processes). Knowledge of machine learning theory (optimization, generalization bounds, statistical learning theory) is required.
Programming competency in Python is required (PyTorch or similar recommended), and LaTeX proficiency is mandatory for assignments and the final project. The theoretical content assumes familiarity with measure theory, basics of functional analysis, and advanced calculus. Students unsure about preparation should complete prerequisite coursework before enrolling.
Learning Outcomes
Graduates will understand the mathematical principles governing preference learning, design and evaluate alignment verification systems, and implement production-grade RLHF pipelines. The course prepares students for advanced research roles in AI labs, senior engineering roles deploying large language models, and independent research in AI alignment. Students will be equipped to drive innovation, lead technical teams, and contribute to the theoretical and practical advances shaping the future of safe AI deployment.
⚠️ Prerequisites
Expect this to be a fairly math intensive course. Please familiarize yourself with the basics of Probability-Statistics (PS) and Linear-Algebra (LA). Recommended introductory lectures to check if you are comfortable with the basics:
- PS review:
- LA review:
Familiarity with LaTeX for scientific writing is needed for scribing the lecture notes (MS students only). You can learn the basics from A Simple Quickstart Guide; many other online tutorials are available for beginners, so feel free to use whichever best suits your needs.
Programming experiments will be in Python. Be prepared to code: [Python ML Tutorials], [Google Colab] (many online tutorials are available for beginners).
A strong grasp of the foundational material outlined above is expected of all students taking the course for credit. Insufficient preparation may adversely affect your ability to engage with the course content and perform successfully in assessments, which may impact your final grades.
Grading Policy
- Project + Report: 25% (Problem selection 5% • Motivation 5% • Solution 10% • Experiments 5%)
- Paper Presentation + Coding: 20% (Topic choice + Motivation 5% • Theoretical explanation 10% • Experiments 5%)
- Scribe: 15% (approximately 2 lectures)
- Piazza Weekly Problem: 10% + 5% extra credit (post one new unresolved problem per week, with motivation, based on that week's lectures)
- Piazza Weekly Paper: 10% + 5% extra credit (find one related paper per week based on the topics covered that week)
- Class Participation: 10% + 5% extra credit (lecture questions/answers 5% • presentation questions 5%)
- Quiz: 10% (random iClicker quizzes)
Resources
Class lectures will be based on, but not limited to, the following books:
- [SB] Reinforcement Learning: An Introduction by Sutton & Barto
- [BA] Bandit Algorithms by Lattimore & Szepesvári
- [RLM] Reinforcement Learning: Theory and Algorithms by Agarwal, Jiang, Kakade & Sun
- [FRL] Foundations of Reinforcement Learning and Interactive Decision Making by Foster & Rakhlin
Course Logistics
- Location: CDRLC
- Schedule: Tuesday & Thursday, 2:00–3:15 PM
- Office Hours: Thursdays 5:00–6:00 PM or by appointment
- Piazza: TBA
Important Dates
- Select Project Topic: Oct 3rd, 2025
- Project Presentation: Nov 25th, Dec 2nd, Dec 4th, 2025
- Project Report: Dec 6th, 2025