vime-pytorch

PyTorch implementation of the VIME paper (Variational Information Maximizing Exploration)

https://github.com/mazpie/vime-pytorch

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.1%) to scientific vocabulary

Keywords

ppo pytorch reinforcement-learning vime
Last synced: 6 months ago

Repository

PyTorch implementation of the VIME paper (Variational Information Maximizing Exploration)

Basic Info
  • Host: GitHub
  • Owner: mazpie
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 2.07 MB
Statistics
  • Stars: 7
  • Watchers: 2
  • Forks: 2
  • Open Issues: 18
  • Releases: 0
Topics
ppo pytorch reinforcement-learning vime
Created over 5 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation

README.md

vime-pytorch

This repo contains the PyTorch implementation of two Reinforcement Learning algorithms:

  • PPO (Proximal Policy Optimization) (paper)
  • VIME-PPO (Variational Information Maximizing Exploration) (paper)
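At the core of PPO is the clipped surrogate objective, which limits how far each update can move the policy. As a minimal sketch (not the repo's actual code, which follows ikostrikov's implementation):

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017)."""
    # probability ratio between the current and the old policy
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    # clip the ratio so updates that move the policy too far gain nothing
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # pessimistic bound, negated because optimizers minimize
    return -torch.min(unclipped, clipped).mean()

# With identical old/new log-probs the ratio is 1, so the loss reduces to
# the negated mean advantage.
lp = torch.log(torch.tensor([0.5, 0.5]))
loss = ppo_clip_loss(lp, lp, torch.tensor([1.0, -1.0]))
```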

The code makes use of openai/baselines.

The PPO implementation is mainly taken from ikostrikov/pytorch-a2c-ppo-acktr-gail.

The main novelty in this repository is the implementation of VIME's exploration strategy on top of the PPO algorithm.
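VIME rewards the agent for transitions that are informative about the environment's dynamics: each extrinsic reward is augmented with a bonus proportional to the KL divergence between the dynamics model's parameter posterior after and before observing the transition, scaled by the `--eta` coefficient. A minimal sketch of the reward augmentation, with hypothetical function and argument names (the repo's actual module layout may differ):

```python
import torch

def vime_augmented_rewards(extrinsic, info_gains, eta=0.01, kl_medians=None):
    """Add the VIME exploration bonus to extrinsic rewards.

    extrinsic:  environment rewards, shape (T,)
    info_gains: per-step KL divergence between the Bayesian dynamics
                model's posterior after vs. before each transition, shape (T,)
    eta:        exploration trade-off coefficient (the --eta flag)
    kl_medians: optional list of previous batches' median KL values, used in
                the paper to normalize the bonus so its scale stays stable
    """
    bonus = info_gains
    if kl_medians:
        # normalize by the average of previous medians, as the paper suggests
        denom = sum(kl_medians) / len(kl_medians)
        bonus = bonus / max(denom, 1e-8)
    return extrinsic + eta * bonus

# Sparse task: all learning signal comes from the information-gain bonus.
rewards = torch.zeros(4)
gains = torch.tensor([0.2, 0.1, 0.4, 0.1])
aug = vime_augmented_rewards(rewards, gains, eta=0.01)
```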

Requirements

To install the requirements, run:

```bash
pip install -r requirements.txt
```

If you don't have MuJoCo installed, follow the instructions here.

If you have issues with OpenAI Baselines, try:

```bash
# Baselines for Atari preprocessing
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
```

Instructions

To run InvertedDoublePendulum-v2 with VIME, use the following command:

```bash
python main.py --env-name InvertedDoublePendulum-v2 --algo vime-ppo --use-gae \
    --log-interval 1 --num-steps 2048 --num-processes 1 --lr 3e-4 \
    --entropy-coef 0 --value-loss-coef 0.5 --ppo-epoch 10 --num-mini-batch 32 \
    --gamma 0.99 --num-env-steps 1000000 --use-linear-lr-decay --no-cuda \
    --log-dir /tmp/doublependulum/vimeppo/vimeppo-0 --seed 0 \
    --use-proper-time-limits --eta 0.01
```

To run experiments with plain PPO instead, replace `vime-ppo` with `ppo`.

Results

For standard gym environments, I used `--eta 0.01`:

  • MountainCar-v0 [result plot]
  • InvertedDoublePendulum-v2 [result plot]

For sparse gym environments, I used `--eta 0.0001`:

  • MountainCar-v0-Sparse [result plot]
  • HalfCheetah-v3-Sparse [result plot]

[the number in parentheses represents how many experiments have been run]

Note:

Any gym-compatible environment can be run, but the hyperparameters have not been tuned for all of them.

However, the parameters used in the InvertedDoublePendulum-v2 example in the Instructions are generally good enough for other MuJoCo environments.

TODO:

  • Integrate more args into the command line

Owner

  • Name: Pietro Mazzaglia
  • Login: mazpie
  • Kind: user
  • Location: Ghent, Belgium
  • Company: UGent

Artificial Intelligence student

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Mazzaglia"
  given-names: "Pietro"
  orcid: "https://orcid.org/0000-0003-3319-5986"
title: "VIME: Variational Information Maximizing Exploration, implementation in PyTorch"
date-released: 2020-09-03
url: "https://github.com/mazpie/vime-pytorch"

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Dependencies

requirements.txt pypi
  • Cython ==0.29.20
  • Jinja2 ==2.11.2
  • Keras-Applications ==1.0.8
  • Keras-Preprocessing ==1.1.2
  • Markdown ==3.2.2
  • MarkupSafe ==1.1.1
  • Pillow ==7.1.2
  • Pygments ==2.6.1
  • QtPy ==1.9.0
  • Send2Trash ==1.5.0
  • Werkzeug ==1.0.1
  • absl-py ==0.9.0
  • astor ==0.8.1
  • atari-py ==0.2.6
  • attrs ==19.3.0
  • backcall ==0.2.0
  • bleach ==3.1.4
  • cffi ==1.14.0
  • click ==7.1.2
  • cloudpickle ==1.2.0
  • cycler ==0.10.0
  • decorator ==4.4.2
  • defusedxml ==0.6.0
  • entrypoints ==0.3
  • enum34 ==1.1.10
  • fasteners ==0.15
  • future ==0.18.2
  • gast ==0.2.2
  • glfw ==1.11.2
  • google-pasta ==0.2.0
  • grpcio ==1.30.0
  • gym ==0.15.7
  • h5py ==2.10.0
  • html5lib ==0.9999999
  • imageio ==2.8.0
  • importlib-metadata ==1.6.1
  • ipykernel ==5.3.0
  • ipython ==7.16.0
  • ipython-genutils ==0.2.0
  • ipywidgets ==7.5.1
  • jedi ==0.17.1
  • joblib ==0.15.1
  • jsonschema ==3.2.0
  • jupyter ==1.0.0
  • jupyter-client ==6.1.3
  • jupyter-console ==6.1.0
  • jupyter-core ==4.6.3
  • kiwisolver ==1.2.0
  • matplotlib ==3.2.2
  • mistune ==0.8.4
  • monotonic ==1.5
  • mujoco-py ==2.0.2.10
  • nbconvert ==5.6.1
  • nbformat ==5.0.7
  • notebook ==6.0.3
  • numpy ==1.19.0
  • opencv-python ==4.2.0.34
  • opt-einsum ==3.2.1
  • pandas ==1.0.5
  • pandocfilters ==1.4.2
  • parso ==0.7.0
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • prometheus-client ==0.8.0
  • prompt-toolkit ==3.0.5
  • protobuf ==3.12.2
  • ptyprocess ==0.6.0
  • pybullet ==2.8.2
  • pycparser ==2.20
  • pyglet ==1.5.0
  • pyparsing ==2.4.7
  • pyrsistent ==0.16.0
  • python-dateutil ==2.8.1
  • pytz ==2020.1
  • pyzmq ==19.0.1
  • qtconsole ==4.7.5
  • scipy ==1.5.0
  • setuptools ==41.0.0
  • six ==1.15.0
  • tensorboard ==1.14.0
  • tensorflow ==1.14.0
  • tensorflow-estimator ==1.14.0
  • tensorflow-gpu ==1.14.0
  • tensorflow-tensorboard ==0.4.0
  • termcolor ==1.1.0
  • terminado ==0.8.3
  • testpath ==0.4.4
  • torch ==1.5.1
  • torchvision ==0.6.1
  • tornado ==6.0.4
  • tqdm ==4.46.1
  • traitlets ==4.3.3
  • wcwidth ==0.2.5
  • webencodings ==0.5.1
  • widgetsnbextension ==3.5.1
  • wrapt ==1.12.1
  • zipp ==3.1.0