Weekend Deep Reinforcement Learning (DRL) is a self-study of DRL in my free time. DRL is very approachable, especially when you already have a bit of background in Control and Deep Learning. Even without that background, the concepts are still simple, so why not study them and have fun.
My implementation aims to provide minimal code and short notes that summarize the theory.
- Code: the modules and config system are built on the mmcv config and registry system, so it is easy to adopt the code and adjust components just by changing the config files.
- Lecture notes: no lengthy math, just the motivating concepts, the key equations needed for implementation, and a summary of the tricks that make each method work. More importantly, I try to connect each method to the previous ones where possible.
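To see why config-driven components are convenient, here is a toy stand-in for the registry pattern that mmcv uses. This is a simplified illustration of the idea, not mmcv's actual API:

```python
# Toy illustration of the registry + config pattern (a simplified stand-in
# for mmcv's Registry, just to show why swapping components via config works).
class Registry:
    def __init__(self):
        self._modules = {}

    def register(self, cls):
        # store the class under its own name so configs can refer to it by string
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg):
        cfg = dict(cfg)  # copy so we don't mutate the caller's config
        cls = self._modules[cfg.pop('type')]
        return cls(**cfg)  # remaining keys become constructor arguments


AGENTS = Registry()


@AGENTS.register
class DQN:
    def __init__(self, gamma=0.99):
        self.gamma = gamma


# Swapping or re-tuning the agent is now just a config edit, no code changes:
agent = AGENTS.build(dict(type='DQN', gamma=0.95))
print(type(agent).__name__, agent.gamma)
```

The same dict could come from a config file, which is exactly what makes it easy to change components without touching the training code.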
My learning strategy is to go directly to summarizing and implementing the papers, starting from the basic ones. I dislike that most RL books start with a very heavy theory background, asking us to memorize many vague definitions, such as what On-Line, Off-Line, or Policy Gradient means. NO, NO, NO!!! Let's play with the basic building blocks first. When we feel comfortable, we can recap and introduce these concepts. It is absolutely fine if you don't remember these definitions at all.
The following are the great resources that I learned from:
- https://spinningup.openai.com/en/latest/
- https://simoninithomas.github.io/deep-rl-course/#syllabus
- https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch
- https://github.com/DLR-RM/stable-baselines3
- https://github.com/thu-ml/tianshou
- https://github.com/araffin/rl-baselines-zoo
- https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html
- https://intellabs.github.io/coach/usage.html#
conda create -n RL python=3.8 -y
conda install tqdm matplotlib scipy
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
pip install gym
pip install gym[all] # install the environment dependencies
# or pip install cmake 'gym[atari]'
pip install pybullet
import gym

env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()  # reset the environment before each episode
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()  # this is where your code should return an action
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
env.close()
- Every environment comes with an env.action_space and an env.observation_space.
- List all available environments: gym.envs.registry.all().
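To make the two bullets above concrete, here is a short snippet (using the classic gym API, matching the example above) that inspects both spaces for CartPole-v0:

```python
import gym

# Inspect the spaces of CartPole-v0 (classic gym API, as in the example above)
env = gym.make('CartPole-v0')
print(env.action_space)           # Discrete(2): push the cart left or right
print(env.observation_space)      # Box(4,): cart position/velocity, pole angle/velocity
print(env.action_space.sample())  # draw one random valid action
env.close()
```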
Paper ranking:
- 🏆 Must-know benchmark papers.
- 🚀 Improved versions of the benchmark papers. Come back to these after finishing the benchmark papers.
- Q-Learning: Introduction to RL with Q-Learning
- Deep Q-Learning:
- Actor-Critic methods:
- 🏆 Deep Deterministic Policy Gradient (DDPG - ICLR 2016): Note | code | config
- 🏆 Twin Delayed DDPG (TD3 - ICML 2018): Note | code | config
- 🏆 Soft Actor-Critic (SAC - ICML 2018): Note | code | config
- 🚀 Meta-SAC (ICML 7th Workshop -2020)
- 🚀 Smooth Exploration for Robotic Reinforcement Learning (arXiv 2021)
- Recap and overview of RL methods:
- Policy Gradient:
- How to deal with Sparse Reward for Off-Line learning:
- On-Line Policy (TBD)
- Model-Based Learning (TBD)
- Multi-Agent Learning (TBD)
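For the Q-Learning entry above, the whole method boils down to one tabular update rule, Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). Here is a minimal sketch of my own on a toy chain MDP (not the repo's code), just to show that rule in action:

```python
import numpy as np

# Tabular Q-Learning on a tiny deterministic chain MDP (toy example):
# states 0..4 in a line; action 0 moves left, action 1 moves right;
# reaching state 4 gives reward 1 and ends the episode.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Move left/right on the chain; reaching the last state is terminal."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next, r, done = step(s, a)
        # Q-Learning update: bootstrap from the best next action (off-policy)
        Q[s, a] += alpha * (r + gamma * (0.0 if done else Q[s_next].max()) - Q[s, a])
        s = s_next

print(int(Q[0].argmax()))  # greedy action at the start state (should be 1, i.e. right)
```

After training, the greedy policy moves right in every state, which is exactly the optimal behavior on this chain.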
Except for the first Q-Learning tutorial, which serves as an RL introduction, all the other methods can be trained with:
python tools/train.py [path/to/config.py] [--extra_args]
For example, to train a Deep Q-Learning (DQN) agent on the MountainCar environment, use:
python tools/train.py configs/DQN/dqn_mountain_car.py
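For reference, mmcv-style configs are plain Python files of nested dicts. The sketch below is hypothetical: the field names are my assumptions for illustration, not the repo's actual schema (check the files under configs/ for the real one).

```python
# Hypothetical sketch of an mmcv-style config file (field names are
# assumptions for illustration -- see configs/ in the repo for the real schema).
env = dict(type='MountainCar-v0')
agent = dict(
    type='DQN',
    gamma=0.99,                   # discount factor
    lr=1e-3,                      # learning rate
    buffer_size=100_000,          # replay buffer capacity
    batch_size=64,
    target_update_interval=500,   # steps between target-network syncs
)
train = dict(max_steps=200_000, eval_interval=5_000)
```

Because each component is a dict with a `type` key, swapping DQN for another agent is a one-line config change.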