The official implementation of "On Offline Reinforcement Learning for Sparse Reward Tasks".
To reproduce reported results, please follow the steps below inside the project folder:
./install.sh

The following benchmarks are supported:
- D4RL with artificially delayed-reward tasks and sparse-reward tasks.
- NeoRL with artificially delayed-reward tasks.
- RecS with real-world simulated sparse-reward tasks.
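To make "artificially delayed-reward" concrete, here is a minimal sketch (not the repository's code) of one common construction: per-step rewards inside each fixed-length interval are summed and delivered only at the interval's last step, with zeros elsewhere. The function name `delay_rewards` and the interval length `delay` are illustrative assumptions.

```python
def delay_rewards(rewards, delay=20):
    """Turn a dense per-step reward sequence into a constant-interval
    delayed-reward sequence of the same length (illustrative sketch)."""
    delayed = [0.0] * len(rewards)
    acc = 0.0
    for i, r in enumerate(rewards):
        acc += r
        # Emit the accumulated reward at the end of each interval,
        # or at the final step of the trajectory.
        if (i + 1) % delay == 0 or i == len(rewards) - 1:
            delayed[i] = acc
            acc = 0.0
    return delayed
```

The return (undiscounted reward sum) of the trajectory is preserved; only the timing of the reward signal changes.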
All run scripts are placed under the scripts folder; some examples are provided below.

To run a d4rl delayed-reward task:

python train_d4rl.py --algo_name=mopo --strategy=average \
    --task=halfcheetah-medium-expert-v0 --delay_mode=constant --seed=10

To run a d4rl sparse-reward task:

python train_d4rl.py --algo_name=mopo --strategy=average \
    --task=antmaze-medium-play-v2 --delay_mode=none --seed=10

To run a neorl delayed-reward task:

python train_neorl.py --algo_name=mopo --strategy=average \
    --task=Halfcheetah-v3-low-100 --delay_mode=constant --seed=10

To run a recs sparse-reward task:

python train_recs.py --algo_name=mopo --strategy=average \
    --task=recs-random-v0 --seed=10

This project records training logs with Tensorboard in the local logs/ directory and with Wandb online.
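Since each script takes the seed on the command line, a multi-seed sweep can be expressed as a plain shell loop (a sketch; the seed values and task shown are illustrative, not the paper's exact configuration):

```shell
# Sweep several seeds for one task; seeds and task are illustrative.
for seed in 10 20 30; do
    python train_d4rl.py --algo_name=mopo --strategy=average \
        --task=halfcheetah-medium-expert-v0 --delay_mode=constant --seed=$seed
done
```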
This project includes experiments on the d4rl and neorl benchmarks; our implementation is based on the OfflineRL codebase for efficiency.
To cite this repository:
@misc{offlinerlsparse,
  author = {Ritchie Huang and Kuo Li},
  title = {OfflineRLSparseReward},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/RITCHIEHuang/OfflineRLSparseReward}}
}