
IEOR 8100

Reinforcement Learning

Course Info

Lecture: Mondays from 1:10pm-3:40pm, 644 Seeley W. Mudd Building, Columbia University
Instructor: Shipra Agrawal
Instructor Office Hours: Wednesdays from 2:00pm-3:00pm, Mudd 423
TA: Robin Tang
TA Office Hours: Fridays from 12:30pm-1:30pm

Course requirements

There will be roughly four programming assignments, based on Python + TensorFlow + OpenAI Gym. Every student is also required to read and present one recent research paper, chosen from a list that will be provided. In addition, students are required to do a research project.
More information on the schedule and duration of the paper presentations, and on the nature of the research projects, will be provided later in the course.



The course will cover both the theory of MDPs (overview) and the practice of reinforcement learning, with programming assignments in Python. We will try to help with skeleton code in the beginning, but the course may be too difficult if you have no programming experience in any language. A basic background in linear algebra, optimization algorithms (e.g., gradient descent), probability, and statistics is required. Knowledge of machine learning and advanced optimization methods will be useful, but is not required.

Software Platform for Programming Assignments


We’ll be conducting all class-related discussion on Piazza this term. The sooner you begin asking questions on Piazza (rather than by email), the sooner you’ll benefit from the collective knowledge of your classmates and instructors. We encourage you to ask questions when you’re struggling to understand a concept. You can even do so anonymously and/or privately.
Sign up for Piazza here
View your class discussion here


Class | Topics | Lecture notes
Jan 22 | Course Introduction; Introduction to MDP | Intro slides; Sections 1-3 of Lecture 1: MDP
Jan 29 | Bellman equations, Iterative algorithms for MDP | Sections 4-5 of Lecture 1: MDP
Feb 5 | TD-learning, Q-learning (tabular) | Lecture 2: tabular RL
Feb 12 | Scalable Q-learning, DQN; Intro to deep learning through TensorFlow | Lecture 3: Q-learning function approximation; TensorFlow and deep learning tutorial
Feb 19 | Approximate DP theory, Fitted value iteration | (the lecture notes are under construction and will be updated soon) Lecture 4: Approximate dp
Feb 26 | Policy gradient methods | Lecture 5: policy gradient
Mar 5 | Actor-critic methods | Lecture 6: Actor-critic
Mar 5 | Approximate RL, Intro to TRPO | Lecture 7: Approximate RL
Mar 19 | Guest lecture by Krzysztof Choromanski | Slides
Mar 26 | Guest lecture by Boyuan Chen on RL in robotics | Slides
Apr 4-30 | Paper presentations | List of papers