vime-pytorch
PyTorch implementation of the VIME paper (Variational Information Maximizing Exploration)
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.1%) to scientific vocabulary
Keywords
Repository
Basic Info
Statistics
- Stars: 7
- Watchers: 2
- Forks: 2
- Open Issues: 18
- Releases: 0
Topics
Metadata Files
README.md
vime-pytorch
This repo contains the PyTorch implementation of two Reinforcement Learning algorithms:
- PPO (Proximal Policy Optimization) (paper)
- VIME-PPO (Variational Information Maximizing Exploration) (paper)
The code makes use of openai/baselines.
The PPO implementation is mainly taken from ikostrikov/pytorch-a2c-ppo-acktr-gail.
The main novelty in this repository is the implementation of VIME's exploration strategy on top of the PPO algorithm.
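As background, the VIME paper adds to the environment reward a bonus proportional to the information gain about the agent's dynamics model, measured as the KL divergence between the model's parameter posterior after and before it observes a transition. The sketch below illustrates that idea for a diagonal Gaussian posterior in plain Python. The function names are hypothetical and are not taken from this repository, and `eta` is assumed to play the role of the `--eta` command-line flag.

```python
import math
from typing import Sequence


def diag_gaussian_kl(
    mu_new: Sequence[float],
    sigma_new: Sequence[float],
    mu_old: Sequence[float],
    sigma_old: Sequence[float],
) -> float:
    """KL[N(mu_new, sigma_new^2) || N(mu_old, sigma_old^2)] for diagonal Gaussians."""
    kl = 0.0
    for m1, s1, m0, s0 in zip(mu_new, sigma_new, mu_old, sigma_old):
        # Closed-form KL between two univariate Gaussians, summed over dimensions.
        kl += math.log(s0 / s1) + (s1 ** 2 + (m1 - m0) ** 2) / (2.0 * s0 ** 2) - 0.5
    return kl


def augmented_reward(extrinsic_reward: float, info_gain: float, eta: float) -> float:
    """Extrinsic reward plus an eta-weighted information-gain bonus (assumed form)."""
    return extrinsic_reward + eta * info_gain


# Illustrative numbers only: the posterior mean shifts after seeing one transition.
info_gain = diag_gaussian_kl([1.0], [1.0], [0.0], [1.0])
print(augmented_reward(1.0, info_gain, eta=0.01))
```

A larger `eta` weights exploration more heavily relative to the task reward, which is consistent with a smaller value being used for the sparse-reward runs reported in the Results section.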
Requirements
- Python 3
- PyTorch
- OpenAI baselines
To install the requirements, run:
```bash
pip install -r requirements.txt
```
If you don't have MuJoCo installed, follow the instructions here.
If you have issues with OpenAI baselines, try:
```
# Baselines for Atari preprocessing
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
```
Instructions
To run InvertedDoublePendulum-v2 with VIME, use the following command:
```bash
python main.py --env-name InvertedDoublePendulum-v2 --algo vime-ppo \
    --use-gae --log-interval 1 --num-steps 2048 --num-processes 1 \
    --lr 3e-4 --entropy-coef 0 --value-loss-coef 0.5 --ppo-epoch 10 \
    --num-mini-batch 32 --gamma 0.99 --num-env-steps 1000000 \
    --use-linear-lr-decay --no-cuda \
    --log-dir /tmp/doublependulum/vimeppo/vimeppo-0 --seed 0 \
    --use-proper-time-limits --eta 0.01
```
To run experiments with plain PPO instead, replace vime-ppo with ppo in the --algo argument.
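As a rough guide to what the flags in the command above imply, the sketch below works out the number of policy updates, the rollout batch size, and the mini-batch size, and shows a linear learning-rate decay. It assumes the training loop follows the conventions of ikostrikov/pytorch-a2c-ppo-acktr-gail, from which the PPO code is mainly taken; the function names are hypothetical and the actual code in this repository may differ.

```python
from typing import Tuple


def training_schedule(
    num_env_steps: int, num_steps: int, num_processes: int, num_mini_batch: int
) -> Tuple[int, int, int]:
    """Derive update count and batch sizes from the command-line values (assumed convention)."""
    # Each update consumes num_steps transitions from each parallel environment.
    num_updates = num_env_steps // num_steps // num_processes
    batch_size = num_steps * num_processes
    mini_batch_size = batch_size // num_mini_batch
    return num_updates, batch_size, mini_batch_size


def linear_lr(initial_lr: float, update_idx: int, num_updates: int) -> float:
    """Learning rate decayed linearly from initial_lr towards zero over training."""
    return initial_lr * (1.0 - update_idx / num_updates)


# Values from the example command: --num-env-steps 1000000 --num-steps 2048
# --num-processes 1 --num-mini-batch 32 --lr 3e-4
num_updates, batch_size, mini_batch_size = training_schedule(1_000_000, 2048, 1, 32)
print(num_updates, batch_size, mini_batch_size)  # 488 2048 64
print(linear_lr(3e-4, num_updates // 2, num_updates))
```

Under these assumptions, each of the 488 updates runs 10 PPO epochs (--ppo-epoch 10) over 32 mini-batches of 64 transitions.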
Results
For standard gym environments, I used --eta 0.01.


For sparse gym environments, I used --eta 0.0001.


[The number in parentheses indicates how many experiments were run.]
Note:
Any gym-compatible environment can be run, but the hyperparameters have not been tested for all of them.
However, the parameters used in the InvertedDoublePendulum-v2 example in the Instructions are generally good enough for other MuJoCo environments.
TODO:
- Integrate more args into the command line
Owner
- Name: Pietro Mazzaglia
- Login: mazpie
- Kind: user
- Location: Ghent, Belgium
- Company: UGent
- Website: https://mazpie.github.io/
- Repositories: 1
- Profile: https://github.com/mazpie
Artificial Intelligence student
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Mazzaglia"
    given-names: "Pietro"
    orcid: "https://orcid.org/0000-0003-3319-5986"
title: "VIME: Variational Information Maximizing Exploration, implementation in PyTorch"
date-released: 2020-09-03
url: "https://github.com/mazpie/vime-pytorch"
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Dependencies
- Cython ==0.29.20
- Jinja2 ==2.11.2
- Keras-Applications ==1.0.8
- Keras-Preprocessing ==1.1.2
- Markdown ==3.2.2
- MarkupSafe ==1.1.1
- Pillow ==7.1.2
- Pygments ==2.6.1
- QtPy ==1.9.0
- Send2Trash ==1.5.0
- Werkzeug ==1.0.1
- absl-py ==0.9.0
- astor ==0.8.1
- atari-py ==0.2.6
- attrs ==19.3.0
- backcall ==0.2.0
- bleach ==3.1.4
- cffi ==1.14.0
- click ==7.1.2
- cloudpickle ==1.2.0
- cycler ==0.10.0
- decorator ==4.4.2
- defusedxml ==0.6.0
- entrypoints ==0.3
- enum34 ==1.1.10
- fasteners ==0.15
- future ==0.18.2
- gast ==0.2.2
- glfw ==1.11.2
- google-pasta ==0.2.0
- grpcio ==1.30.0
- gym ==0.15.7
- h5py ==2.10.0
- html5lib ==0.9999999
- imageio ==2.8.0
- importlib-metadata ==1.6.1
- ipykernel ==5.3.0
- ipython ==7.16.0
- ipython-genutils ==0.2.0
- ipywidgets ==7.5.1
- jedi ==0.17.1
- joblib ==0.15.1
- jsonschema ==3.2.0
- jupyter ==1.0.0
- jupyter-client ==6.1.3
- jupyter-console ==6.1.0
- jupyter-core ==4.6.3
- kiwisolver ==1.2.0
- matplotlib ==3.2.2
- mistune ==0.8.4
- monotonic ==1.5
- mujoco-py ==2.0.2.10
- nbconvert ==5.6.1
- nbformat ==5.0.7
- notebook ==6.0.3
- numpy ==1.19.0
- opencv-python ==4.2.0.34
- opt-einsum ==3.2.1
- pandas ==1.0.5
- pandocfilters ==1.4.2
- parso ==0.7.0
- pexpect ==4.8.0
- pickleshare ==0.7.5
- prometheus-client ==0.8.0
- prompt-toolkit ==3.0.5
- protobuf ==3.12.2
- ptyprocess ==0.6.0
- pybullet ==2.8.2
- pycparser ==2.20
- pyglet ==1.5.0
- pyparsing ==2.4.7
- pyrsistent ==0.16.0
- python-dateutil ==2.8.1
- pytz ==2020.1
- pyzmq ==19.0.1
- qtconsole ==4.7.5
- scipy ==1.5.0
- setuptools ==41.0.0
- six ==1.15.0
- tensorboard ==1.14.0
- tensorflow ==1.14.0
- tensorflow-estimator ==1.14.0
- tensorflow-gpu ==1.14.0
- tensorflow-tensorboard ==0.4.0
- termcolor ==1.1.0
- terminado ==0.8.3
- testpath ==0.4.4
- torch ==1.5.1
- torchvision ==0.6.1
- tornado ==6.0.4
- tqdm ==4.46.1
- traitlets ==4.3.3
- wcwidth ==0.2.5
- webencodings ==0.5.1
- widgetsnbextension ==3.5.1
- wrapt ==1.12.1
- zipp ==3.1.0