https://github.com/araffin/srl-zoo

State Representation Learning (SRL) zoo with PyTorch - Part of S-RL Toolbox

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

autoencoder deep-learning forward-model inverse-model neural-network pytorch reinforcement-learning representation-learning srl state-representation-learning vae
Last synced: 5 months ago

Repository

State Representation Learning (SRL) zoo with PyTorch - Part of S-RL Toolbox

Basic Info
Statistics
  • Stars: 163
  • Watchers: 18
  • Forks: 18
  • Open Issues: 1
  • Releases: 0
Topics
autoencoder deep-learning forward-model inverse-model neural-network pytorch reinforcement-learning representation-learning srl state-representation-learning vae
Created over 8 years ago · Last pushed over 6 years ago
Metadata Files
Readme License

README.md

State Representation Learning Zoo with PyTorch (part of S-RL Toolbox)

A collection of State Representation Learning (SRL) methods for Reinforcement Learning, written using PyTorch.

SRL Zoo Documentation: https://srl-zoo.readthedocs.io/

S-RL Toolbox Documentation: https://s-rl-toolbox.readthedocs.io/

S-RL Toolbox Repository: https://github.com/araffin/robotics-rl-srl

Available methods:

  • Autoencoder (reconstruction loss)
  • Denoising Autoencoder (DAE)
  • Forward Dynamics model
  • Inverse Dynamics model
  • Reward prediction loss
  • Variational Autoencoder (VAE) and beta-VAE
  • SRL with Robotic Priors + extensions (stereovision, additional priors)
  • Supervised Learning
  • Principal Component Analysis (PCA)
  • Triplet Network (for stereovision only)
  • Combination and stacking of methods
  • Random Features
  • [experimental] Reward Prior, Episode-prior, Perceptual Similarity loss (DARLA), Mutual Information loss

Related papers:

  • "Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics" (Raffin et al., 2018) https://arxiv.org/abs/1901.08651
  • "S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning" (Raffin et al., 2018) https://arxiv.org/abs/1809.09369
  • "State Representation Learning for Control: An Overview" (Lesort et al., 2018) https://arxiv.org/pdf/1802.04181.pdf

Documentation

Documentation is available online: https://srl-zoo.readthedocs.io/

Installation

Please read the documentation for more details; we provide Anaconda environment files and Docker images.

Learning a State Representation

To learn a state representation, you need to enforce constraints on the representation using one or more losses. For example, to train an autoencoder, you need to use a reconstruction loss. Most losses are not exclusive, which means you can combine them (a minimal sketch of such a combination is shown after the loss list below).

All losses are defined in losses/losses.py. The available losses are:

  • autoencoder: reconstruction loss, using current and next observation
  • denoising autoencoder (dae): same as for the autoencoder, except that the model reconstructs inputs from noisy observations containing a random zero-pixel mask
  • vae: (beta-)VAE loss (reconstruction + Kullback-Leibler divergence loss)
  • inverse: predict the action given current and next state
  • forward: predict the next state given current state and taken action
  • reward: predict the reward (positive or not) given current and next state
  • priors: robotic priors losses (see "Learning State Representations with Robotic Priors")
  • triplet: triplet loss for multi-cam setting (see Multiple Cameras section in the doc)

[Experimental]

  • reward-prior: Maximises the correlation between states and rewards (does not make sense for sparse rewards)
  • episode-prior: Learn an episode-agnostic state space, thanks to a discriminator distinguishing states from same/different episodes
  • perceptual similarity loss (for VAE): Instead of the reconstruction loss in the beta-VAE loss, it uses the distance between the reconstructed input and the real input in the embedding of a pre-trained DAE
  • mutual information loss: Maximises the mutual information between states and rewards
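To make the combination of losses concrete, here is a minimal PyTorch sketch, not the actual code from losses/losses.py: it assumes a hypothetical model object exposing encode/decode/predict_action methods, and shows an autoencoder reconstruction loss and an inverse-dynamics loss merged into a single weighted objective.

```python
import torch.nn.functional as F

def combined_srl_loss(model, obs, next_obs, action,
                      w_reconstruction=1.0, w_inverse=10.0):
    """Weighted sum of a reconstruction loss and an inverse-dynamics loss.

    `model` is a hypothetical SRL model assumed to expose:
      - encode(obs)                       -> state
      - decode(state)                     -> reconstructed observation
      - predict_action(state, next_state) -> action logits
    """
    state = model.encode(obs)
    next_state = model.encode(next_obs)

    # autoencoder loss: reconstruct the current observation from its state
    reconstruction_loss = F.mse_loss(model.decode(state), obs)

    # inverse loss: predict the (discrete) action from current and next state
    inverse_loss = F.cross_entropy(model.predict_action(state, next_state), action)

    # the losses are combined as a weighted sum, mirroring the
    # loss:weight command-line syntax, e.g. --losses autoencoder:1 inverse:10
    return w_reconstruction * reconstruction_loss + w_inverse * inverse_loss
```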

All possible arguments can be displayed using python train.py --help. You can limit the training set size (--training-set-size argument), change the minibatch size (-bs), the number of epochs (--epochs), and so on.
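For instance, a hypothetical invocation combining these options (the dataset path and numeric values are placeholders) could look like: python train.py --data-folder data/path/to/dataset --losses vae --training-set-size 20000 -bs 32 --epochs 30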

Datasets: Simulated Environments and Real Robots

Although the data can be generated easily in simulation using the RL repo (cf. Generating Data), we also provide datasets recorded with a real Baxter robot.

Examples

You can download an example dataset here.

Train an inverse model: python train.py --data-folder data/path/to/dataset --losses inverse

Train an autoencoder: python train.py --data-folder data/path/to/dataset --losses autoencoder

Combining an autoencoder with an inverse model is as easy as: python train.py --data-folder data/path/to/dataset --losses autoencoder inverse

You can also specify the weight of each loss: python train.py --data-folder data/path/to/dataset --losses autoencoder:1 inverse:10

Please read the documentation for more examples.

Running Tests

Download the test datasets kuka_gym_test and kuka_gym_dual_test and put them in the data/ folder, then run: ./run_tests.sh

Troubleshooting

CUDA out of memory error

  1. Running python train.py --data-folder data/staticButtonSimplest fails with:
     RuntimeError: cuda runtime error (2) : out of memory at /b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu:66

SOLUTION 1: Decrease the batch size, e.g. to 32 or 64, on GPUs with little memory.
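For instance (the batch size value is only illustrative): python train.py --data-folder data/staticButtonSimplest -bs 32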

SOLUTION 2: Use a simple 2-layer neural network model: python train.py --data-folder data/staticButtonSimplest --model-type mlp

Owner

  • Name: Antonin RAFFIN
  • Login: araffin
  • Kind: user
  • Location: Munich
  • Company: @DLR-RM

Research Engineer in Robotics and Machine Learning, with a focus on Reinforcement Learning.

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 487
  • Total Committers: 10
  • Avg Commits per committer: 48.7
  • Development Distribution Score (DDS): 0.407
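
Assuming the DDS is computed as one minus the top committer's share of commits, the value is consistent with the committer table below: 1 - 289/487 ≈ 0.407.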
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Antonin RAFFIN a****n@e****r 289
kalifou r****o@g****m 63
ashley a****3@g****m 53
kalifou k****e@e****r 42
NataliaDiaz d****a@g****m 21
kalifou k****e@p****m 8
Kalifou René TR k****u 5
hill-a h****a 3
Natalia Diaz Rodriguez N****z 2
Gaspard QIN g****n@g****m 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 4
  • Total pull requests: 22
  • Average time to close issues: 3 months
  • Average time to close pull requests: 17 days
  • Total issue authors: 3
  • Total pull request authors: 6
  • Average comments per issue: 2.5
  • Average comments per pull request: 1.91
  • Merged pull requests: 20
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • araffin (2)
  • hill-a (1)
  • el-cangrejo (1)
Pull Request Authors
  • araffin (12)
  • hill-a (5)
  • kalifou (2)
  • ncble (1)
  • NataliaDiaz (1)
  • GaspardQin (1)
Top Labels
Issue Labels
question (1) bug (1) enhancement (1)
Pull Request Labels
enhancement (4)

Dependencies

environment.yml pypi
  • backports-abc ==0.5
  • backports.functools-lru-cache ==1.4
  • backports.shutil-get-terminal-size ==1.0.0
  • backports.ssl-match-hostname ==3.5.0.1
  • ipython-genutils ==0.2.0
  • pandas ==0.21.0
  • prompt-toolkit ==1.0.15
  • pytest ==3.5.0
  • pytest-cov ==2.5.1
  • pyzmq ==16.0.2
  • seaborn ==0.8.1
  • termcolor ==1.1.0
  • tqdm ==4.19.4