IEOR 8100

Reinforcement Learning

Course Info

Lecture schedule: Mudd 303 Monday 11:40-12:55pm
Instructor: Shipra Agrawal
Instructor Office Hours: Wednesdays from 3:00pm-4:00pm, Mudd 423
TA: Robin (Yunhao) Tang
TA Office Hours: Tuesdays from 3:30pm-4:30pm, Mudd 301

Upcoming deadlines (New)

Poster session on Monday, May 6, from 10am to 1pm in the DSI space on the 4th floor.
  • You do not need to print actual "posters"; you can print slides (9-12) and put them on the easel we will provide.
  • Participation in the poster session is mandatory: at least one person from every team should be present. We will be evaluating your projects based on the poster (and your description), and it is also a fun way to share your findings with your classmates, other students and faculty, and possibly find future collaborators.
  • Presenting a poster is not required for survey projects. (A survey project is one where the main goal is to do a thorough study of existing literature in some subtopic or application of reinforcement learning.) Survey projects need to be presented in class. If you indicated in your proposal that you are doing a survey, you should have already been contacted about scheduling a class presentation. Contact the instructor as soon as possible if you haven't been contacted.
Final project report due on Friday, May 10. Submit using this link
  • Instructions for preparing the report: The end result of your project should be a written report that clearly and concisely describes what you did, how it compares to relevant related work, what results you got, and what the results mean. The main body of your report should be 5-6 pages long. You can include further details or plots/figures in an appendix of at most 5 pages. The report should use 11pt font, 1-inch margins, and single spacing. For further guidance, look here
  • For survey projects, reports are of utmost importance. They should thoroughly describe the relevant literature, along with your own thoughts on its contributions and open challenges. If you have your own derivations or simplifications of some proofs, please include them too. For survey projects, you may choose to submit a report of up to 7 pages with no appendix (or a report of up to 6 pages with an appendix of at most 5 pages).
  • Reports that vary from these guidelines risk receiving a grade deduction and/or some sections not being read.

Course requirements

(Course requirements are subject to change based on class size)
There will be roughly four programming assignments, based on Python + TensorFlow + OpenAI Gym. Additionally, students are required to do a research project.


The course will cover both the theory of MDPs (overview) and the practice of reinforcement learning, with programming assignments in Python. While we will try to help with skeleton code in the beginning, the course might be too difficult for you if you have no programming experience in any language. Basic background in linear algebra, optimization algorithms (e.g., gradient descent), probability, and statistics is required. Knowledge of machine learning and advanced optimization methods will be useful, but is not required.

Software Platform for Programming Assignments


We’ll be conducting all class-related discussion on Piazza this term. The sooner you begin asking questions on Piazza (rather than via email), the sooner you’ll benefit from the collective knowledge of your classmates and instructors. We encourage you to ask questions when you’re struggling to understand a concept. You can even do so anonymously and/or privately.
Sign up for Piazza here
This is the link to your course page on Piazza. View your class discussion here

Lecture notes Spring 2019

Class Topics Lecture notes
Jan 23 Course Introduction Intro slides
Jan 28-Feb 11 Introduction to MDP; Bellman equations, Value iteration, Policy iteration Lecture 1: MDP
Feb 13, Feb 18 TD-learning, Q-learning (tabular) Lecture 2: tabular RL
Feb 18 Scalable Q-learning, DQN Lecture 3: Q-learning function approximation
Feb 25 Intro to deep learning using Tensorflow Tensorflow and deep learning tutorial
Feb 27-Mar 11 Approximate DP theory, Fitted value iteration Lecture 4: Approximate DP
Mar 13 Policy gradient methods Lecture 5: policy gradient
- Spring break -
Mar 25 Actor-critic methods Lecture 6: Actor-critic
April 1, 3, 8, 11 Approximate RL, Intro to TRPO Lecture 7: Approximate RL
April 15, April 17 Regret analysis (MAB and RL) Slides
April 22 Multi-agent RL: presentation by Mitchell Perry Slides TBA
April 24 Robotics and RL: presentation by Boyuan Chen Slides TBA
April 29 Distributional RL: presentation by Yadin Rozov Slides TBA
May 1 Safe RL by Gejia Zhang, RL in financial portfolio management by Gary Buranasampatanon Slides TBA

Click here to see Lecture notes from Spring 2018

Reference material