Slides NL Getting Python / conda virtual environment up and running
DataCamp course: Introduction to Python (including NumPy)
- Object-oriented programming, a.k.a. classes
Follow this tutorial:
Extra info on Inheritance:
-
Exercise: Classes & Inheritance
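As a warm-up for this exercise, here is a minimal sketch of a base class and a subclass that overrides one method. The class names `Agent` and `LastActionAgent` are hypothetical and are not taken from the tutorial or the exercise itself.

```python
class Agent:
    """Base class: an agent that picks one action from a list of options."""

    def __init__(self, name):
        self.name = name

    def act(self, actions):
        # Default behaviour: always pick the first available action.
        return actions[0]

    def describe(self):
        # type(self).__name__ resolves to the subclass name when inherited.
        return f"{self.name} ({type(self).__name__})"


class LastActionAgent(Agent):
    """Subclass: inherits __init__ and describe, overrides act."""

    def act(self, actions):
        # Overridden behaviour: pick the last available action instead.
        return actions[-1]


if __name__ == "__main__":
    for agent in (Agent("a"), LastActionAgent("b")):
        print(agent.describe(), "picks", agent.act(["left", "stay", "right"]))
```

The subclass only redefines what differs; everything else comes from the parent class.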
-
Read first chapter "Introduction" of Sutton & Barto
-
Exercise: Tic-tac-toe
-
Optional: Watch Lecture 1 of David Silver (1.5 hours)
Bandits are MDPs with just one state. Example: pick an advertisement to show; the reward is a click. Example: pick a market; the reward is the number of units sold in that market.
-
Read second chapter "Multi-armed Bandits" of Sutton & Barto
-
Exercise: work through the OpenAI Gym tutorial
-
Exercise: Bandits_in_gym. Here we code up the simple bandit algorithm from p. 32 of Sutton & Barto, as well as the UCB variant.
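The two action-selection rules in this exercise can be sketched in plain Python without Gym. This is a rough illustration, not the exercise solution: the function name `run_bandit`, the Gaussian reward model with unit variance, and the default parameter values are all assumptions made here. Estimates are updated incrementally as Q(A) ← Q(A) + (R − Q(A)) / N(A); the UCB rule picks the arm maximising Q(a) + c·sqrt(ln t / N(a)).

```python
import math
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, ucb_c=None, seed=0):
    """Run epsilon-greedy (default) or UCB (if ucb_c is set) on a k-armed bandit.

    Rewards are drawn from a unit-variance Gaussian around each arm's mean
    (an assumption of this sketch). Returns the estimates Q and pull counts N.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # action-value estimates Q(a)
    n = [0] * k    # number of times each arm was pulled, N(a)

    for t in range(1, steps + 1):
        if ucb_c is not None:
            # UCB: try every arm once, then maximise estimate plus bonus.
            untried = [a for a in range(k) if n[a] == 0]
            if untried:
                action = untried[0]
            else:
                action = max(
                    range(k),
                    key=lambda a: q[a] + ucb_c * math.sqrt(math.log(t) / n[a]),
                )
        elif rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            action = rng.randrange(k)
        else:
            # Exploit: pick a greedy arm, breaking ties at random.
            best = max(q)
            action = rng.choice([a for a in range(k) if q[a] == best])

        reward = rng.gauss(true_means[action], 1.0)
        n[action] += 1
        # Incremental sample-average update.
        q[action] += (reward - q[action]) / n[action]

    return q, n


if __name__ == "__main__":
    means = [0.0, 0.5, 2.0]
    print("epsilon-greedy pulls:", run_bandit(means)[1])
    print("UCB pulls:           ", run_bandit(means, ucb_c=2.0)[1])
```

In both variants most pulls should end up on the arm with the highest true mean, with UCB typically spending fewer pulls on clearly inferior arms.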
-
Read third chapter of Sutton & Barto
-
Optional: Watch Lecture 2 of David Silver
-
Selected Book Exercises Ch 3
-
Read fourth chapter of Sutton & Barto
-
Exercise: Udacity Notebook for solving FrozenLake using Dynamic Programming.
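To give an idea of what the dynamic programming functions look like, below is a minimal value iteration sketch. It does not use FrozenLake; instead it runs on a hypothetical three-state chain whose transition table `P` mimics the `P[state][action] = [(prob, next_state, reward, done), ...]` layout of Gym's toy-text environments. The table, the function names and the discount factor are assumptions made for this illustration.

```python
# Hypothetical 3-state chain: actions 0 = left, 1 = right; state 2 is terminal.
# Entering state 2 yields reward 1; the terminal state loops on itself with reward 0.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}


def q_value(P, V, state, action, gamma):
    """Expected return of taking `action` in `state`, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r, _ in P[state][action])


def value_iteration(P, gamma=0.9, theta=1e-8):
    """Sweep the Bellman optimality update until values change by less than theta."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration(P)
    print(V, policy)
```

Because the function only reads the transition table, the same code should in principle work with any environment that exposes a table in this layout.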
-
Optional: Apply DP functions to JacksCarRental Gym environment
-
Read selected paragraphs from Chapter 5
-
Exercise: Udacity Notebook for solving the Blackjack environment using MC control.
-
Read selected paragraphs from Chapter 6
-
Exercise: Udacity Notebook on temporal difference (TD) methods (CliffWalking environment).
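The core of a TD control method is a one-step update after every transition. The sketch below shows the Q-learning update, Q(S, A) ← Q(S, A) + α [R + γ max_a Q(S', a) − Q(S, A)], on a hypothetical five-state corridor rather than on CliffWalking. The `step` function, the corridor itself and all hyperparameter values are assumptions made for this illustration.

```python
import random
from collections import defaultdict


def step(state, action, n_states=5):
    """Hypothetical corridor: action 0 moves left, 1 moves right.

    Every step costs -1; the episode ends on reaching the rightmost state.
    """
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == n_states - 1
    return next_state, -1.0, done


def q_learning(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    Q = defaultdict(lambda: [0.0, 0.0])  # Q[state] = [value of left, value of right]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                # Greedy action; ties go to "right".
                action = 0 if Q[state][0] > Q[state][1] else 1
            next_state, reward, done = step(state, action)
            # TD target bootstraps from the best next action (0 at terminal states).
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q


if __name__ == "__main__":
    Q = q_learning()
    for s in range(4):
        print(s, [round(v, 2) for v in Q[s]])
```

Replacing `max(Q[next_state])` in the target with the value of the action actually taken next would turn this into Sarsa.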
- Selected papers (ACM, Calvano et al.).
- Presentation by Jan Svitak (ACM)
- Exercise 1: PettingZoo introductory tutorial using the Tic-Tac-Toe two-player environment.
- Exercise 2: PettingZoo Q-learning tutorial using the Tic-Tac-Toe two-player environment.