Course outline

Week -1: Course outline

Slides NL Getting Python / conda virtual environment up and running

Week 0: Programming with Python

DataCamp course: Introduction to Python (including NumPy)

  • Object-oriented programming, a.k.a. classes

Follow this tutorial:

Extra info on Inheritance:

Week 1: Introduction to RL

Week 2: Multi-armed bandits

Bandits are MDPs with just one state. Example: pick an advertisement to show, with a reward when it is clicked. Example: pick a market, where the reward is the number of units sold in that market.

  • Read the second chapter, "Multi-armed Bandits", of Sutton & Barto

  • Exercise: work through the OpenAI Gym tutorial

  • Exercise: Bandits_in_gym. Here we code up the simple bandit algorithm from p. 32 of Sutton & Barto, as well as the UCB variant.
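As a rough preview of what the bandit exercise involves, the sketch below implements an epsilon-greedy agent with incremental sample-average updates and a UCB agent. It is not the Bandits_in_gym notebook itself: the `BernoulliBandit` class, the arm probabilities, and all parameter values are hypothetical and chosen only for illustration, and the code uses the standard library instead of Gym.

```python
import math
import random


class BernoulliBandit:
    """Hypothetical k-armed bandit: arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


def epsilon_greedy(bandit, steps=5000, epsilon=0.1, seed=1):
    """Epsilon-greedy action selection with incremental sample averages."""
    rng = random.Random(seed)
    k = len(bandit.probs)
    q = [0.0] * k  # estimated value of each arm
    n = [0] * k    # number of pulls of each arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                   # explore
        else:
            arm = max(range(k), key=lambda a: q[a])  # exploit (ties go to the lowest index)
        reward = bandit.pull(arm)
        n[arm] += 1
        q[arm] += (reward - q[arm]) / n[arm]         # incremental mean update
    return q, n


def ucb(bandit, steps=5000, c=2.0):
    """Upper-confidence-bound selection: argmax of q[a] + c * sqrt(ln t / n[a])."""
    k = len(bandit.probs)
    q = [0.0] * k
    n = [0] * k
    for t in range(1, steps + 1):
        untried = [a for a in range(k) if n[a] == 0]
        if untried:
            arm = untried[0]                         # pull every arm once first
        else:
            arm = max(
                range(k),
                key=lambda a: q[a] + c * math.sqrt(math.log(t) / n[a]),
            )
        reward = bandit.pull(arm)
        n[arm] += 1
        q[arm] += (reward - q[arm]) / n[arm]
    return q, n


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # hypothetical click-through rates for three ads
    for name, agent in [("epsilon-greedy", epsilon_greedy), ("UCB", ucb)]:
        q, n = agent(BernoulliBandit(probs))
        print(name, "estimates:", [round(v, 2) for v in q], "pulls:", n)
```

Both agents share the same incremental update; they differ only in how the next arm is chosen, which makes it easy to compare how quickly each one concentrates its pulls on the best arm.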

Week 3: Theory: Markov Decision Processes (MDPs)

Week 4: Dynamic Programming (DP)

Week 5: Monte Carlo (MC) control

  • Read selected paragraphs from Chapter 5

  • Exercise: Udacity notebook for solving the Blackjack environment using MC control.

Week 6: Q-learning

  • Read selected paragraphs from Chapter 6

  • Exercise: Udacity notebook on temporal difference (TD) methods (CliffWalking environment).
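To give a feel for the Q-learning update before opening the notebook, here is a minimal tabular sketch on a hand-written grid that mimics the cliff-walking task from Sutton & Barto (4 x 12 grid, reward -1 per step, -100 and a reset to the start for stepping into the cliff). The `step` function is a stand-in written for this example, not the Gym environment or the Udacity code, and the hyperparameters are assumptions.

```python
import random

ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def step(state, action):
    """Return (next_state, reward, done) for the cliff-walking-style grid."""
    row = min(max(state[0] + ACTIONS[action][0], 0), ROWS - 1)
    col = min(max(state[1] + ACTIONS[action][1], 0), COLS - 1)
    if row == 3 and 1 <= col <= 10:      # stepped into the cliff
        return START, -100.0, False      # big penalty, back to the start
    return (row, col), -1.0, (row, col) == GOAL


def q_learning(episodes=3000, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(4)                       # explore
            else:
                action = max(range(4), key=lambda a: q[state][a])  # exploit
            nxt, reward, done = step(state, action)
            # Off-policy TD target: bootstrap from the best next action.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_path(q, max_steps=100):
    """Follow the greedy policy from START; stop at the goal or after max_steps."""
    state, path = START, [START]
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    path = greedy_path(q_learning())
    print("greedy path length:", len(path) - 1)
    print(path)
```

The target uses the maximum over next-state action values regardless of which action the agent actually takes next; that choice is what separates Q-learning from on-policy TD methods such as Sarsa.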

Week 7: Economic application of Q-learning: algorithmic pricing

Week 8: Programming multi-agent RL using PettingZoo