## Course Info

**Syllabus**

**Lecture:** Mondays from 1:10pm-3:40pm, 644 Seeley W. Mudd Building, Columbia University

**Instructor:** Shipra Agrawal

**Instructor Office Hours:** Wednesdays from 2:00pm-3:00pm, Mudd 423

**TA:** Robin Tang

**TA Office Hours:** Fridays from 12:30pm-1:30pm

## Course requirements

There will be roughly four programming assignments, based on Python + TensorFlow + OpenAI Gym. Every student is also required to read and present one recent research paper, chosen from a list of papers that will be provided. Additionally, students are required to do a research project.

More information on the schedule and duration of the paper presentations, and on the nature of the research projects, will be provided later in the course.

### New!

- Reference list for paper selection
- If you are enrolled, you should also have received an invitation to edit another document, where you can enter your paper selection.
- Deadline for paper selection: *March 9*

- Guidance on project
- Deadline for 1-page project proposal: *March 26*
- Project reports due (1st draft): *April 30*; final draft due on *May 6*.


## Pre-requisites

The course will cover both the theory of MDPs (overview) and the practice of reinforcement learning, with programming assignments in Python. While we will try to help with skeleton code in the beginning, the course may be too difficult if you have no programming experience in any language. Basic background in linear algebra, optimization algorithms (e.g., gradient descent), probability, and statistics is required. Knowledge of machine learning and advanced optimization methods will be useful but is not required.

## Software Platform for Programming Assignments

- **Instabase Cloud Platform** for assignment implementation and submission
- **Software Installation Instructions** for Windows/Mac

## Piazza

We’ll be conducting all class-related discussion on Piazza this term. The quicker you begin asking questions on Piazza (rather than via emails), the quicker you’ll benefit from the collective knowledge of your classmates and instructors. We encourage you to ask questions when you’re struggling to understand a concept. You can even do so anonymously and/or privately.

Sign up for Piazza here

View your class discussion here

## Schedule

Class | Topics | Lecture notes |
---|---|---|
Jan 22 | Course introduction; introduction to MDP | Intro slides; Sections 1-3 of Lecture 1: MDP |
Jan 29 | Bellman equations, iterative algorithms for MDP | Sections 4-5 of Lecture 1: MDP |
Feb 5 | TD-learning, Q-learning (tabular) | Lecture 2: tabular RL |
Feb 12 | Scalable Q-learning, DQN; intro to deep learning through TensorFlow | Lecture 3: Q-learning function approximation; TensorFlow and deep learning tutorial |
Feb 19 | Approximate DP theory, fitted value iteration (the lecture notes are under construction and will be updated soon) | Lecture 4: Approximate DP |
Feb 26 | Policy gradient methods | Lecture 5: policy gradient |
Mar 5 | Actor-critic methods | Lecture 6: Actor-critic |
Mar 5 | Approximate RL, intro to TRPO | Lecture 7: Approximate RL |
Mar 19 | Guest lecture by Krzysztof Choromanski | Slides |
Mar 26 | Guest lecture by Boyuan Chen on RL in robotics | Slides |
Apr 4-30 | Paper presentations | List of papers |

## References

- Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin L. Puterman
- Neuro-Dynamic Programming, by Dimitri P. Bertsekas and John N. Tsitsiklis
- Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto
- Algorithms for Reinforcement Learning, by Csaba Szepesvári
- Recent research papers on deep reinforcement learning