Spring 2019
Course Info
Syllabus
Lecture schedule: Mudd 303 Monday 11:40-12:55pm
Instructor: Shipra Agrawal
Instructor Office Hours: Wednesdays from 3:00pm-4:00pm, Mudd 423
TA: Robin (Yunhao) Tang
TA Office Hours: 3:30-4:30pm Tuesday at MUDD 301
Upcoming deadlines (New)
Poster session on Monday, May 6, from 10am to 1pm in the DSI space on the 4th floor.
- You do not need to print actual "posters"; you can print slides (9-12) and put them on the easel we will provide.
- Participating in the poster session is mandatory: at least one person from every team should be present. We will be evaluating your projects based on the poster (and your description), and it is also a fun way to share your findings with your classmates, other fellow students and faculty, and possibly find future collaborators.
- Presenting a poster is not required for survey projects. (A survey project is one where the main goal is to do a thorough study of existing literature in some subtopic or application of reinforcement learning.) Survey projects need to be presented in class. If you indicated in your proposal that you are doing a survey, you should have already been contacted to schedule a class presentation. Contact the instructor as soon as possible if you haven't been contacted.
Final project report due on Friday, May 10. Submit using this link
- Instructions for preparing the report: The end result of your project should be a written report that clearly and concisely describes what you did, how it compares to relevant related work, what results you got, and what the results mean. The main body of your report should be 5-6 pages long. You can include further details or plots/figures in an appendix of at most 5 pages. The report should use 11pt font, 1-inch margins, and single spacing. For further guidance, look here
- For survey projects, the report is of utmost importance. It should thoroughly describe the relevant literature, along with your own thoughts on its contributions and open challenges. If you have your own derivations or simplifications of some proofs, please include them too. For survey projects, you may choose to submit either a report of up to 7 pages with no appendix, or a report of up to 6 pages with an appendix of at most 5 pages.
- Reports that vary from these guidelines risk receiving a grade deduction and/or some sections not being read.
Course requirements
(Course requirements are subject to change based on class size)
There will be roughly four programming assignments, based on Python + TensorFlow + OpenAI Gym. Additionally, students are required to do a research project.
Pre-requisites
The course will cover both the theory of MDPs (overview) and the practice of reinforcement learning, with programming assignments in Python. While we will try to help with skeleton code in the beginning, the course might be too difficult for you if you have no programming experience in any language. Basic background in linear algebra, optimization algorithms (e.g., gradient descent), probability, and statistics is required. Knowledge of machine learning and advanced optimization methods will be useful, but not required.
Software Platform for Programming Assignments
- Instabase Cloud Platform for assignment implementation and submission
- Software Installation Instructions for Windows/Mac
Piazza
We’ll be conducting all class-related discussion on Piazza this term. The quicker you begin asking questions on Piazza (rather than via emails), the quicker you’ll benefit from the collective knowledge of your classmates and instructors. We encourage you to ask questions when you’re struggling to understand a concept. You can even do so anonymously and/or privately.
Sign up for Piazza here
This is the link to your course page on Piazza.
View your class discussion here
Lecture notes Spring 2019
| Class | Topics | Lecture notes |
|---|---|---|
| Jan 23 | Course Introduction | Intro slides |
| Jan 28-Feb 11 | Introduction to MDP, Bellman equations, Value iteration, Policy iteration | Lecture 1: MDP |
| Feb 13, Feb 18 | TD-learning, Q-learning (tabular) | Lecture 2: tabular RL |
| Feb 18 | Scalable Q-learning, DQN | Lecture 3: Q-learning function approximation |
| Feb 25 | Intro to deep learning using Tensorflow | Tensorflow and deep learning tutorial |
| Feb 27-Mar 11 | Approximate DP theory, Fitted value iteration | Lecture 4: Approximate dp |
| Mar 13 | Policy gradient methods | Lecture 5: policy gradient |
| - | - Spring break - | -Spring break - |
| Mar 25 | Actor-critic methods | Lecture 6: Actor-critic |
| April 1, 3, 8, 11 | Approximate RL, Intro to TRPO | Lecture 7: Approximate RL |
| April 15, April 17 | Regret analysis (MAB and RL) | Slides |
| April 22 | Multi-agent RL: presentation by Mitchell Perry | Slides TBA |
| April 24 | Robotics and RL: presentation by Boyuan Chen | Slides TBA |
| April 29 | Distributional RL: presentation by Yadin Rozov | Slides TBA |
| May 1 | Safe RL by Gejia Zhang, RL in financial portfolio management by Gary Buranasampatanon | Slides TBA |
Click here to see Lecture notes from Spring 2018
Reference material
- Guidance on project
- Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin L. Puterman
- Neuro-dynamic Programming, by Dimitri P. Bertsekas and John Tsitsiklis
- Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto
- Algorithms for Reinforcement Learning, by Csaba Szepesvári