Course outline

Week -1: Course outline

Slides (NL): Getting Python and a conda virtual environment up and running

Week 0: Programming with Python

Datacamp course: Introduction to Python (including numpy)

Object oriented programming a.k.a. Classes

Follow this tutorial:

Python Classes

Extra info on Inheritance:

Python Inheritance
Exercise: Classes & Inheritance

Week 1: Introduction to RL

Read first chapter "Introduction" of Sutton & Barto
Datacamp tutorial Python Modules
Exercise: Tic-tac-toe
Optional: Watch Lecture 1 of David Silver (1.5 hours)

Week 2: Multi-armed bandits

Bandits are MDPs with just one state. Example: pick an advertisement to show; the reward is received when it is clicked. Example: pick a market; the reward is the number of units sold in that market.

Read second chapter "Multi-armed Bandits" of Sutton & Barto
Exercise: work through the OpenAI Gym tutorial
Exercise: Bandits_in_gym. Here we code up the simple bandit algorithm (p. 32 of Sutton & Barto), as well as the UCB variant.
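
To give a feel for what the bandit exercise involves, here is a minimal sketch of the simple bandit algorithm (epsilon-greedy action selection with incremental sample-average updates) and of UCB action selection. It is not the Bandits_in_gym notebook and does not use Gym: the function name `run_bandit`, the Gaussian arms with unit variance, and all default parameters are assumptions made for illustration.

```python
import math
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, ucb_c=None, seed=0):
    """Run one bandit agent on Gaussian arms (assumed unit variance).

    If ucb_c is None, actions are chosen epsilon-greedily; otherwise the
    UCB rule with exploration constant ucb_c is used.
    Returns (Q, N): value estimates and pull counts per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k  # estimated value of each arm
    N = [0] * k    # number of times each arm has been pulled

    for t in range(1, steps + 1):
        if ucb_c is None:
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                a = rng.randrange(k)
            else:
                a = max(range(k), key=lambda i: Q[i])
        else:
            # UCB: try every arm once, then maximise estimate plus bonus.
            untried = [i for i in range(k) if N[i] == 0]
            if untried:
                a = untried[0]
            else:
                a = max(
                    range(k),
                    key=lambda i: Q[i] + ucb_c * math.sqrt(math.log(t) / N[i]),
                )

        reward = rng.gauss(true_means[a], 1.0)
        N[a] += 1
        Q[a] += (reward - Q[a]) / N[a]  # incremental sample average

    return Q, N


# Hypothetical three-armed problem; the third arm has the highest mean.
Q_eps, N_eps = run_bandit([0.1, 0.5, 1.5], epsilon=0.1)
Q_ucb, N_ucb = run_bandit([0.1, 0.5, 1.5], ucb_c=2.0)
```

Comparing `N_eps` and `N_ucb` shows how often each strategy pulled each arm, which is a simple way to see the difference between undirected (epsilon-greedy) and directed (UCB) exploration.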

Week 3: Theory: Markov Decision Processes (MDPs)

Read third chapter of Sutton & Barto
Optional: Watch Lecture 2 of David Silver
Selected Book Exercises Ch 3

Week 4: Dynamic Programming (DP)

Read fourth chapter of Sutton & Barto
Watch Lecture 3 of David Silver
Exercise: Udacity Notebook for solving FrozenLake using Dynamic Programming.
Optional: Apply DP functions to JacksCarRental Gym environment
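
As a warm-up for the DP exercises, the following is a minimal value iteration sketch on a made-up four-state corridor. It is not taken from the Udacity notebook. The transition table `P[s][a]`, a list of `(probability, next_state, reward, done)` tuples, is assumed here because it resembles the shape of the transition model exposed by Gym's toy-text environments; the corridor itself, the function names, and the parameter values are hypothetical.

```python
def value_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-8):
    """Return (V, policy) for a tabular MDP given as P[s][a] transitions."""

    def action_values(s, V):
        # Expected one-step return of each action in state s under V.
        return [
            sum(
                p * (r + gamma * (0.0 if done else V[s2]))
                for p, s2, r, done in P[s][a]
            )
            for a in range(n_actions)
        ]

    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(action_values(s, V))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place Bellman optimality backup
        if delta < theta:
            break

    # Greedy policy with respect to the converged values.
    policy = []
    for s in range(n_states):
        q = action_values(s, V)
        policy.append(q.index(max(q)))
    return V, policy


def make_corridor():
    """Hypothetical corridor: states 0..3, state 3 is the terminal goal.

    Action 0 moves left, action 1 moves right; entering the goal pays 1.
    """
    P = {}
    for s in range(3):
        left = max(0, s - 1)
        right = s + 1
        P[s] = {
            0: [(1.0, left, 0.0, False)],
            1: [(1.0, right, 1.0 if right == 3 else 0.0, right == 3)],
        }
    # The terminal state loops onto itself with zero reward.
    P[3] = {a: [(1.0, 3, 0.0, True)] for a in (0, 1)}
    return P


V, policy = value_iteration(make_corridor(), n_states=4, n_actions=2)
```

With a discount factor of 0.9, the values decay geometrically with distance from the goal, and the greedy policy moves right in every non-terminal state.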

Week 5: Monte Carlo (MC) control

Read selected paragraphs from Chapter 5
Exercise: Udacity Notebook for solving the Blackjack environment using MC control.

Week 6: Q-learning

Read selected paragraphs from Chapter 6
Exercise: Udacity Notebook on temporal difference (TD) methods (CliffWalking environment).
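
The core of Q-learning is the update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. The sketch below applies it to a made-up five-state corridor rather than to CliffWalking, so that it runs without Gym. It is not the Udacity notebook; the function name, the environment, and the hyperparameters are assumptions chosen for illustration.

```python
import random


def q_learning_corridor(n_states=5, episodes=500, alpha=0.5, gamma=1.0,
                        epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor.

    The agent starts in state 0; the last state is the terminal goal.
    Action 0 moves left (bumping into the wall at state 0), action 1
    moves right. Every step costs a reward of -1.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy behaviour policy with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(Q[s])
                a = rng.choice([i for i in (0, 1) if Q[s][i] == best])

            s_next = max(0, s - 1) if a == 0 else s + 1
            reward = -1.0

            # Off-policy TD target: bootstrap from the best next action.
            # Q[goal] is never updated, so it stays at zero.
            target = reward + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next

    return Q


Q = q_learning_corridor()
greedy_policy = [q.index(max(q)) for q in Q[:-1]]
```

After training, `greedy_policy` should pick "right" in every non-terminal state, and the learned value of moving right from the start approaches minus the number of steps to the goal.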

Week 7: Economic application of Q-learning: algorithmic pricing

Selected papers (ACM, Calvano et al.).
Presentation by Jan Svitak (ACM)

Week 8: Programming multi-agent RL using PettingZoo

Exercise 1: PettingZoo introductory tutorial using the two-player Tic-Tac-Toe environment.
Exercise 2: PettingZoo Q-learning tutorial using the two-player Tic-Tac-Toe environment.
