open-rl

Implementations of a large collection of reinforcement learning algorithms.

https://github.com/natetsang/open-rl

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.4%) to scientific vocabulary

Keywords

artificial-intelligence deep-learning deep-reinforcement-learning gym-environments keras machine-learning neural-networks python reinforcement-learning tensorflow

Keywords from Contributors

mesh sequences interactive hacking network-simulation
Last synced: 6 months ago

Repository

Implementations of a large collection of reinforcement learning algorithms.

Basic Info
  • Host: GitHub
  • Owner: natetsang
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 260 MB
Statistics
  • Stars: 27
  • Watchers: 5
  • Forks: 1
  • Open Issues: 0
  • Releases: 2
Topics
artificial-intelligence deep-learning deep-reinforcement-learning gym-environments keras machine-learning neural-networks python reinforcement-learning tensorflow
Created over 4 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.md

Open-RL Logo


Open RL is a code repository containing minimalistic implementations of a wide collection of reinforcement learning algorithms. The purpose of this repo is to make RL more approachable and easier to learn. As such, the code in this repo is optimized for readability and consistency between algorithms.

Compared to mainstream machine learning, RL is still rather niche, so finding resources for learning it is more difficult. While implementations of two algorithms, Q-networks and vanilla policy gradients, are widely available, it's much harder to find easy-to-follow implementations of the others. For many of the algorithms implemented here, no simple implementations appear to exist at all. Interestingly, it's not just state-of-the-art algorithms that lack easy-to-follow re-implementations; it's also hard to find clear implementations of foundational algorithms like multi-armed bandits. That is why open-rl was created. Happy learning!

Algorithms

In this repo you will find implementations of the following algorithms.

Model-free learning

Policy-based methods

| | Discrete | Continuous |
| --- | :---: | :---: |
| REINFORCE | ✓ | ✗ |
| REINFORCE w/ baseline | ✓ | ✗ |
| VPG | ✓ | ✓ |
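To give a feel for the policy-gradient family, here is a minimal sketch of the REINFORCE update (illustrative only, not code from this repo). It assumes a hypothetical discrete-action Keras policy network `policy` and pre-computed discounted episode returns:

```python
# Minimal REINFORCE update sketch (illustrative only, not code from this repo).
# Assumes `policy` is a Keras model mapping states to action logits, and that
# `states`, `actions`, `returns` are float32/int32 tensors from complete episodes.
import tensorflow as tf

def reinforce_update(policy, optimizer, states, actions, returns):
    """One gradient step that weights log-probabilities of taken actions by episode returns."""
    with tf.GradientTape() as tape:
        logits = policy(states)                                      # [T, num_actions]
        log_probs = tf.nn.log_softmax(logits)
        taken = tf.gather(log_probs, actions, axis=1, batch_dims=1)  # log pi(a_t | s_t)
        loss = -tf.reduce_mean(taken * returns)                      # maximize return-weighted log-prob
    grads = tape.gradient(loss, policy.trainable_variables)
    optimizer.apply_gradients(zip(grads, policy.trainable_variables))
    return loss
```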

Value-based methods

| | Discrete | Continuous |
| --- | :---: | :---: |
| DQN | ✓ | ✗ |
| Double DQN | ✓ | ✗ |
| Dueling DQN | ✓ | ✗ |
| DRQN (for POMDPs) | ✓ | ✗ |
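For the value-based family, the core idea is a bootstrapped temporal-difference target. A minimal DQN loss sketch, assuming hypothetical Keras networks `q_net` and `target_net` that map states to per-action Q-values (illustrative only, not the repo's implementation):

```python
# Minimal DQN temporal-difference loss sketch (illustrative only, not code from this repo).
import tensorflow as tf

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    # Bootstrapped target: r + gamma * max_a' Q_target(s', a'), cut off at episode ends.
    next_q = tf.reduce_max(target_net(next_states), axis=1)
    targets = rewards + gamma * (1.0 - dones) * next_q
    # Q-values of the actions actually taken in the batch.
    q_taken = tf.gather(q_net(states), actions, axis=1, batch_dims=1)
    return tf.reduce_mean(tf.square(tf.stop_gradient(targets) - q_taken))
```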

Actor-critic methods

| | Discrete | Continuous |
| --- | :---: | :---: |
| A2C | ✓ | ✗ |
| A3C | ✓ | ✗ |
| DDPG | ✗ | ✓ |
| TD3 | ✗ | ✓ |
| SAC | ✗ | ✓ |
| PPO | ✗ | ✓ |
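Actor-critic methods combine the two ideas above: a critic estimates state values to reduce the variance of the policy gradient. A one-step advantage actor-critic (A2C-style) loss might look like the following sketch, assuming hypothetical Keras `actor` (state to action logits) and `critic` (state to a scalar value) networks (illustrative only):

```python
# One-step advantage actor-critic loss sketch (illustrative only, not code from this repo).
# Assumes `critic(states)` has shape [B, 1] and `actor(states)` returns action logits.
import tensorflow as tf

def a2c_losses(actor, critic, states, actions, rewards, next_states, dones, gamma=0.99):
    values = tf.squeeze(critic(states), axis=1)
    next_values = tf.squeeze(critic(next_states), axis=1)
    # One-step TD target; the advantage is treated as a constant when updating the actor.
    targets = rewards + gamma * (1.0 - dones) * next_values
    advantages = tf.stop_gradient(targets - values)
    log_probs = tf.nn.log_softmax(actor(states))
    taken = tf.gather(log_probs, actions, axis=1, batch_dims=1)
    actor_loss = -tf.reduce_mean(taken * advantages)
    critic_loss = tf.reduce_mean(tf.square(tf.stop_gradient(targets) - values))
    return actor_loss, critic_loss
```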

Bandits

Multi-armed bandits

| | Discrete | Continuous |
| --- | :---: | :---: |
| Pure Exploration | ✓ | ✗ |
| Epsilon Greedy | ✓ | ✗ |
| Thompson Sampling - Bernoulli | ✓ | ✗ |
| Thompson Sampling - Gaussian | ✓ | ✗ |
| Upper Confidence Bounds (UCB) | ✓ | ✗ |
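As a taste of the bandit setting, here is a small, self-contained epsilon-greedy bandit (illustrative only, not code from this repo); the arm reward means passed to it are made up for the example:

```python
# Epsilon-greedy multi-armed bandit sketch (illustrative only, not code from this repo).
import numpy as np

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    counts = np.zeros(n_arms)        # pulls per arm
    estimates = np.zeros(n_arms)     # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))            # explore a random arm
        else:
            arm = int(np.argmax(estimates))            # exploit current best estimate
        reward = rng.normal(true_means[arm], 1.0)      # Gaussian bandit feedback
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.1, 0.5, 0.9])
```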

Contextual bandits

| | Discrete | Continuous |
| --- | :---: | :---: |
| Linear UCB | ✓ | ✗ |
| Linear Thompson Sampling | ✗ | ✗ |
| Neural-network approach | ✓ | ✗ |
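Linear UCB keeps a per-arm ridge-regression estimate of expected reward and adds an exploration bonus based on the uncertainty of that estimate. A compact sketch (illustrative only, not the repo's implementation):

```python
# Linear UCB contextual bandit sketch (illustrative only, not code from this repo).
import numpy as np

class LinUCB:
    """One ridge-regression model (A, b) per arm; pick the arm with the highest upper confidence bound."""
    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(n_features) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(n_features) for _ in range(n_arms)]  # per-arm reward-weighted contexts

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                                    # ridge estimate of arm weights
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + bonus)               # mean estimate + exploration bonus
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```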

Model-based learning

| | Discrete | Continuous |
| --- | :---: | :---: |
| Dyna-Q | ✓ | ✗ |
| Deep Dyna-Q | ✓ | ✗ |
| Monte-Carlo Tree Search (MCTS) | ✓ | ✗ |
| MB + Model Predictive Control | ✗ | ✓ |
| Model-Based Policy Optimization (MBPO) | ✗ | ✓ |

Offline (batch) learning

| | Discrete | Continuous |
| --- | :---: | :---: |
| Conservative Q-learning (CQL) | ✓ | ✗ |
| Model-Based Offline Reinforcement Learning (MOReL) | ✓ | ✗ |
| Model-Based Offline Policy Optimization (MOPO) | ✗ | ✓ |

Other

| | Discrete | Continuous |
| --- | :---: | :---: |
| Behavioral Cloning | ✓ | ✗ |
| Imitation Learning | ✓ | ✗ |

Installation

  • Make sure you have Python 3.7 or higher installed
  • Clone the repo
```
# Clone repo from github
git clone --depth 1 https://github.com/natetsang/open-rl

# Navigate to root folder
cd open-rl
```
  • Create a virtual environment (Windows 10). Showing instructions for `virtualenv`, but there are other options too!
```
# If not already installed, you might need to run this next line
pip install virtualenv

# Create a virtual environment called 'venv' in the root of the project
virtualenv venv

# Activate the environment
venv\Scripts\activate
```
  • Download requirements
```
pip install -r requirements.txt
```
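Once the requirements are installed, a quick smoke test can confirm that the pinned `gym` and `tensorflow` versions import and run. This snippet is a sketch and not part of the repo; it uses the old gym 0.20 step API, which matches the pinned dependency:

```python
# Quick environment smoke test (not part of the repo).
import gym
import tensorflow as tf

print("TensorFlow:", tf.__version__)   # expected 2.7.1 per requirements.txt
env = gym.make("CartPole-v0")          # a standard gym control environment
obs = env.reset()                      # gym 0.20 reset() returns only the observation
obs, reward, done, info = env.step(env.action_space.sample())
print("CartPole step OK, reward =", reward)
```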

Contributing

If you're interested in contributing to open-rl, please fork the repo and make a pull request. Any support is much appreciated!

Citation

If you use this code, please cite it as follows:

```
@misc{Open-RL,
  author = {Tsang, Nate},
  title = {{Open-RL: Minimalistic implementations of reinforcement learning algorithms}},
  url = {https://github.com/natetsang/open-rl},
  year = {2021}
}
```

Acknowledgements

This repo would not be possible without the following (tremendous) resources, which were relied upon heavily when learning RL. I highly recommend going through these to learn more.

  • CS285 @ UC Berkeley - taught by Sergey Levine
  • Grokking Deep RL book by @mimoralea
  • More to be added soon!

Owner

  • Name: Nate T
  • Login: natetsang
  • Kind: user
  • Location: San Francisco, CA

software engineer & reinforcement learning researcher

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Tsang"
  given-names: "Nate"
title: "Open-RL: Minimalistic implementations of reinforcment learning algorithms"
version: 1.0.0
date-released: 2021-08-20
url: "https://github.com/natetsang/open-rl"
repository-code: "https://github.com/natetsang/open-rl"
license: "MIT"

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 111
  • Total Committers: 4
  • Avg Commits per committer: 27.75
  • Development Distribution Score (DDS): 0.477
Past Year
  • Commits: 48
  • Committers: 2
  • Avg Commits per committer: 24.0
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
Nate Tsang n****g@g****m 58
Nate T 3****g 36
Nathan Tsang n****g@r****m 16
dependabot[bot] 4****] 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 5 minutes
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.5
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (2)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (2)

Dependencies

requirements.txt pypi
  • gym ==0.20.0
  • matplotlib ==3.4.3
  • numpy ==1.21.2
  • tensorflow ==2.7.1
  • tensorflow-probability ==0.13.0