vime-pytorch

PyTorch implementation of the VIME paper (Variational Information Maximizing Exploration)

https://github.com/mazpie/vime-pytorch

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.1%) to scientific vocabulary

Keywords

ppo pytorch reinforcement-learning vime
Last synced: 6 months ago

Repository

PyTorch implementation of the VIME paper (Variational Information Maximizing Exploration)

Basic Info
  • Host: GitHub
  • Owner: mazpie
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 2.07 MB
Statistics
  • Stars: 7
  • Watchers: 2
  • Forks: 2
  • Open Issues: 18
  • Releases: 0
Topics
ppo pytorch reinforcement-learning vime
Created over 5 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation

README.md

vime-pytorch

This repo contains the PyTorch implementation of two Reinforcement Learning algorithms:

  • PPO (Proximal Policy Optimization) (paper)
  • VIME-PPO (Variational Information Maximizing Exploration) (paper)
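At the core of PPO is the clipped surrogate objective, which limits how far each update can move the policy. As a minimal sketch (not the repo's actual code, which follows ikostrikov's implementation):

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017)."""
    # probability ratio between the current and the old policy
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    # clip the ratio so updates that move the policy too far gain nothing
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # pessimistic bound, negated because optimizers minimize
    return -torch.min(unclipped, clipped).mean()

# With identical old/new log-probs the ratio is 1, so the loss reduces to
# the negated mean advantage.
lp = torch.log(torch.tensor([0.5, 0.5]))
loss = ppo_clip_loss(lp, lp, torch.tensor([1.0, -1.0]))
```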

The code makes use of openai/baselines.

The PPO implementation is mainly taken from ikostrikov/pytorch-a2c-ppo-acktr-gail.

The main novelty in this repository is the implementation of VIME's exploration strategy on top of the PPO algorithm.
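VIME rewards the agent for transitions that are informative about the environment's dynamics: each extrinsic reward is augmented with a bonus proportional to the KL divergence between the dynamics model's parameter posterior after and before observing the transition, scaled by the `--eta` coefficient. A minimal sketch of the reward augmentation, with hypothetical function and argument names (the repo's actual module layout may differ):

```python
import torch

def vime_augmented_rewards(extrinsic, info_gains, eta=0.01, kl_medians=None):
    """Add the VIME exploration bonus to extrinsic rewards.

    extrinsic:  environment rewards, shape (T,)
    info_gains: per-step KL divergence between the Bayesian dynamics
                model's posterior after vs. before each transition, shape (T,)
    eta:        exploration trade-off coefficient (the --eta flag)
    kl_medians: optional list of previous batches' median KL values, used in
                the paper to normalize the bonus so its scale stays stable
    """
    bonus = info_gains
    if kl_medians:
        # normalize by the average of previous medians, as the paper suggests
        denom = sum(kl_medians) / len(kl_medians)
        bonus = bonus / max(denom, 1e-8)
    return extrinsic + eta * bonus

# Sparse task: all learning signal comes from the information-gain bonus.
rewards = torch.zeros(4)
gains = torch.tensor([0.2, 0.1, 0.4, 0.1])
aug = vime_augmented_rewards(rewards, gains, eta=0.01)
```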

Requirements

To install the requirements, run:

```bash
pip install -r requirements.txt
```

If you don't have MuJoCo installed, follow the instructions here.

If you have issues with OpenAI Baselines, try:

```bash
# Baselines for Atari preprocessing
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
```

Instructions

To run InvertedDoublePendulum-v2 with VIME, use the following command:

```bash
python main.py --env-name InvertedDoublePendulum-v2 --algo vime-ppo --use-gae \
    --log-interval 1 --num-steps 2048 --num-processes 1 --lr 3e-4 \
    --entropy-coef 0 --value-loss-coef 0.5 --ppo-epoch 10 --num-mini-batch 32 \
    --gamma 0.99 --num-env-steps 1000000 --use-linear-lr-decay --no-cuda \
    --log-dir /tmp/doublependulum/vimeppo/vimeppo-0 --seed 0 \
    --use-proper-time-limits --eta 0.01
```

To run experiments with plain PPO instead, replace `vime-ppo` with `ppo`.

Results

For standard gym environments, I used `--eta 0.01`:

  • MountainCar-v0 [result plot]
  • InvertedDoublePendulum-v2 [result plot]

For sparse gym environments, I used `--eta 0.0001`:

  • MountainCar-v0-Sparse [result plot]
  • HalfCheetah-v3-Sparse [result plot]

[the number in parentheses represents how many experiments have been run]

Note:

Any gym-compatible environment can be run, but the hyperparameters have not been tuned for all of them.

However, the parameters used in the InvertedDoublePendulum-v2 example in the Instructions are generally good enough for other MuJoCo environments.

TODO:

  • Integrate more args into the command line

Owner

  • Name: Pietro Mazzaglia
  • Login: mazpie
  • Kind: user
  • Location: Ghent, Belgium
  • Company: UGent

Artificial Intelligence student

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Mazzaglia"
  given-names: "Pietro"
  orcid: "https://orcid.org/0000-0003-3319-5986"
title: "VIME: Variational Information Maximizing Exploration, implementation in PyTorch"
date-released: 2020-09-03
url: "https://github.com/mazpie/vime-pytorch"

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Dependencies

requirements.txt pypi
  • Cython ==0.29.20
  • Jinja2 ==2.11.2
  • Keras-Applications ==1.0.8
  • Keras-Preprocessing ==1.1.2
  • Markdown ==3.2.2
  • MarkupSafe ==1.1.1
  • Pillow ==7.1.2
  • Pygments ==2.6.1
  • QtPy ==1.9.0
  • Send2Trash ==1.5.0
  • Werkzeug ==1.0.1
  • absl-py ==0.9.0
  • astor ==0.8.1
  • atari-py ==0.2.6
  • attrs ==19.3.0
  • backcall ==0.2.0
  • bleach ==3.1.4
  • cffi ==1.14.0
  • click ==7.1.2
  • cloudpickle ==1.2.0
  • cycler ==0.10.0
  • decorator ==4.4.2
  • defusedxml ==0.6.0
  • entrypoints ==0.3
  • enum34 ==1.1.10
  • fasteners ==0.15
  • future ==0.18.2
  • gast ==0.2.2
  • glfw ==1.11.2
  • google-pasta ==0.2.0
  • grpcio ==1.30.0
  • gym ==0.15.7
  • h5py ==2.10.0
  • html5lib ==0.9999999
  • imageio ==2.8.0
  • importlib-metadata ==1.6.1
  • ipykernel ==5.3.0
  • ipython ==7.16.0
  • ipython-genutils ==0.2.0
  • ipywidgets ==7.5.1
  • jedi ==0.17.1
  • joblib ==0.15.1
  • jsonschema ==3.2.0
  • jupyter ==1.0.0
  • jupyter-client ==6.1.3
  • jupyter-console ==6.1.0
  • jupyter-core ==4.6.3
  • kiwisolver ==1.2.0
  • matplotlib ==3.2.2
  • mistune ==0.8.4
  • monotonic ==1.5
  • mujoco-py ==2.0.2.10
  • nbconvert ==5.6.1
  • nbformat ==5.0.7
  • notebook ==6.0.3
  • numpy ==1.19.0
  • opencv-python ==4.2.0.34
  • opt-einsum ==3.2.1
  • pandas ==1.0.5
  • pandocfilters ==1.4.2
  • parso ==0.7.0
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • prometheus-client ==0.8.0
  • prompt-toolkit ==3.0.5
  • protobuf ==3.12.2
  • ptyprocess ==0.6.0
  • pybullet ==2.8.2
  • pycparser ==2.20
  • pyglet ==1.5.0
  • pyparsing ==2.4.7
  • pyrsistent ==0.16.0
  • python-dateutil ==2.8.1
  • pytz ==2020.1
  • pyzmq ==19.0.1
  • qtconsole ==4.7.5
  • scipy ==1.5.0
  • setuptools ==41.0.0
  • six ==1.15.0
  • tensorboard ==1.14.0
  • tensorflow ==1.14.0
  • tensorflow-estimator ==1.14.0
  • tensorflow-gpu ==1.14.0
  • tensorflow-tensorboard ==0.4.0
  • termcolor ==1.1.0
  • terminado ==0.8.3
  • testpath ==0.4.4
  • torch ==1.5.1
  • torchvision ==0.6.1
  • tornado ==6.0.4
  • tqdm ==4.46.1
  • traitlets ==4.3.3
  • wcwidth ==0.2.5
  • webencodings ==0.5.1
  • widgetsnbextension ==3.5.1
  • wrapt ==1.12.1
  • zipp ==3.1.0