Deep Q Network on CartPole problem (OpenAI Gym)

Objectives

Understand the trade-off between exploitation and exploration under an ε-greedy policy

Reinforcement learning

ε-greedy policy
Bellman equation
Deep Q networks (2 hidden layers)
Replay buffer
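
The components listed above can be sketched in plain Python. This is a minimal illustration, not the code in main.py: the names epsilon_greedy, ReplayBuffer and td_target, and the default gamma=0.98, are assumptions made for the example, and Q-values are plain lists rather than network outputs.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # exploit: index of the largest Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # a bounded deque drops the oldest transition once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def put(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng=random):
        # uniform sampling without replacement breaks temporal correlation
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.98):
    """Bellman target r + gamma * max_a' Q(s', a'); no bootstrap at terminal states."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

With epsilon = 0 the selection is purely greedy, which corresponds to the greedy test setting reported under Results; larger values of epsilon trade reward during training for more exploration.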

Environments

Train

Use Google Colab (with GPU enabled) to train neural networks (CartPole.ipynb)

Four trained models (each trained for over 10,000 iterations) are in the pretrained_models directory:

0.1-ckpt.pth: ε = 0.1
0.01-ckpt.pth: ε = 0.01
0.5-ckpt.pth: ε = 0.5
0.05-ckpt.pth: ε = 0.05

Test

Ubuntu 22.04 LTS
Python 3.8.0
PyTorch 1.11.0

Command:

python main.py

Results

Average reward (moving average with window=500) for different epsilons after 10000 iterations
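
For reference, a moving average of this kind could be computed as in the sketch below. The function name moving_average is hypothetical, and how the original plot handles the first 499 points is not stated; this version averages over however many values are available so far.

```python
def moving_average(values, window=500):
    """Mean of the last `window` values at each step (fewer at the start)."""
    averages = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            # drop the value that just left the window
            running_sum -= values[i - window]
        averages.append(running_sum / min(i + 1, window))
    return averages
```

A wide window such as 500 smooths out the episode-to-episode noise in the reward, which makes the curves for different ε values easier to compare.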

Greedy test: Model trained with ε=0.01
Greedy test: Model trained with ε=0.05
Greedy test: Model trained with ε=0.1
Greedy test: Model trained with ε=0.5

References

The code is based on https://github.com/seungeunrho/minimalRL/blob/master/dqn.py
