This is a PyTorch implementation of Deep Q-Learning.
Experience memory capacity: 80000 set
Random action ratio: from 1.0 to 0.1 through 1000000 frames
54_8520.mp4
Blue line: Score of training episode.
Magenta line: Mean score of last 100 training episodes.
Red star: Score of target agent.
Experience memory capacity: 10000 set
Random action ratio: from 0.9 to 0.05 through 200 episodes
221_172.mp4
Blue line: Score of training episode.
Magenta line: Mean score of last 100 training episodes.