- Understand the trade-off between exploration and exploitation under an ε-greedy policy
- ε-greedy policy
- Bellman equation
- Deep Q networks (2 hidden layers)
- Replay buffer
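The ideas in this list can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in main.py: the names `select_action`, `ReplayBuffer`, and `td_target` are hypothetical, the default `gamma` is an illustrative value, and `q_values` stands in for the output of the Q network (the network itself is omitted so the sketch runs without PyTorch).

```python
import random
from collections import deque


def select_action(q_values, epsilon, rng=random):
    """ε-policy (ε-greedy): explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A bounded deque evicts the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks up temporal correlation.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.98):
    """Bellman target r + gamma * max_a' Q(s', a'); no bootstrapping at terminal states.

    gamma=0.98 is an assumed, illustrative discount factor.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

A larger ε means more random actions (exploration) and a smaller ε means more reliance on the current Q estimates (exploitation), which is the trade-off the different checkpoints are meant to compare.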
Train the neural networks in Google Colab (with GPU enabled) using CartPole.ipynb.
Four trained models (each trained for over 10000 iterations) are located in the pretrained_models directory:
- 0.1-ckpt.pth: ε = 0.1
- 0.01-ckpt.pth: ε = 0.01
- 0.5-ckpt.pth: ε = 0.5
- 0.05-ckpt.pth: ε = 0.05
- Ubuntu 22.04 LTS
- Python 3.8.0
- PyTorch 1.11.0
Command:
python main.py

Average reward (moving average with window=500) for different epsilons after 10000 iterations
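For reference, a moving average with window=500 could be computed as in the sketch below. It assumes a trailing window over a list of per-episode rewards, with early entries averaged over however many episodes are available so far; the function name `moving_average` is hypothetical and the actual plotting code may smooth differently.

```python
from collections import deque


def moving_average(rewards, window=500):
    """Trailing moving average over the last `window` rewards."""
    recent = deque(maxlen=window)  # holds at most `window` most recent rewards
    running_sum = 0.0
    averages = []
    for r in rewards:
        if len(recent) == window:
            # The oldest reward is about to be evicted, so remove it from the sum.
            running_sum -= recent[0]
        recent.append(r)
        running_sum += r
        averages.append(running_sum / len(recent))
    return averages
```

Keeping a running sum makes each step constant-time, which matters little for 10000 points but avoids re-summing the window on every iteration.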
The code is based on https://github.com/seungeunrho/minimalRL/blob/master/dqn.py



