offlinerl

https://github.com/shyamal-anadkat/offlinerl

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: shyamal-anadkat
License: mit
Language: Python
Default Branch: master
Size: 44.5 MB

Statistics

Stars: 2
Watchers: 1
Forks: 3
Open Issues: 0
Releases: 0

Created over 4 years ago · Last pushed over 4 years ago

Metadata Files

Readme, License, Citation

Deep Reinforcement Learning

Shyamal H Anadkat | Fall '21

Background

Hello! This is the repository for my AIPI530 DeepRL final project. The goal is to build a pipeline for offline RL. The starter code was forked from d3rlpy (see the citation at the bottom). Offline reinforcement learning (RL) is the task of learning from a fixed batch of data.

Before diving in, I recommend getting familiar with basic reinforcement learning. My blog post on reinforcement learning is a good place to start: RL Primer

The blog post briefly covers the following:

- What is reinforcement learning?
- What are the pros and cons of reinforcement learning?
- When should we consider applying reinforcement learning (and when should we not)?
- What's the difference between supervised learning and reinforcement learning?
- What is offline reinforcement learning? What are the pros and cons of offline reinforcement learning?
- When should we consider applying offline reinforcement learning (and when should we not)?
- An example of offline reinforcement learning in the real world

source: https://bair.berkeley.edu/blog/2020/12/07/offline/
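To make "learning from a fixed batch of data" concrete, here is a minimal tabular sketch. It is not code from this repository or from d3rlpy: the tiny dataset, the state and action names, and the function names are all made up for illustration. The point it shows is that every update reads only logged transitions and never queries an environment.

```python
from collections import defaultdict

# Hypothetical logged transitions: (state, action, reward, next_state, done).
# A 3-state chain where moving "right" from state 1 reaches the goal (state 2).
BATCH = [
    (0, "right", 0.0, 1, False),
    (0, "left", 0.0, 0, False),
    (1, "right", 1.0, 2, True),
    (1, "left", 0.0, 0, False),
]
ACTIONS = ("left", "right")


def offline_q_learning(batch, gamma=0.9, lr=0.5, sweeps=200):
    """Run Q-learning updates over a fixed batch, with no environment access."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for state, action, reward, next_state, done in batch:
            # Bootstrap from the best next action unless the episode ended.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += lr * (target - q[(state, action)])
    return q


def greedy_action(q, state):
    """Pick the action with the highest learned Q value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q_table = offline_q_learning(BATCH)
print(greedy_action(q_table, 0), greedy_action(q_table, 1))
```

Plain Q-learning like this can overestimate actions that the batch covers poorly, which is one motivation for conservative methods such as CQL.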

Getting Started

(please read carefully)

This project is customized to train CQL (Conservative Q-Learning) on a custom dataset in d3rlpy, and to train an off-policy evaluation (OPE) model, FQE (Fitted Q Evaluation), to evaluate the trained policy. Important scripts:

- cql_train.py: the main script, at the root of the project, used to train CQL and get evaluation scores
- plot_helper.py: utility script that helps produce the required plots
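As a rough illustration of what evaluating a trained policy with FQE means, here is a tabular sketch of fitted Q evaluation. It is an assumption-laden toy, not the d3rlpy FQE implementation (which fits a neural network) and not code from `cql_train.py`. The transitions, the `policy` stand-in, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical logged transitions: (state, action, reward, next_state, done).
TRANSITIONS = [
    (0, "right", 0.0, 1, False),
    (1, "right", 1.0, 2, True),
    (0, "left", 0.0, 0, False),
]


def policy(state):
    """Stand-in for the trained policy under evaluation (always moves right)."""
    return "right"


def fitted_q_evaluation(transitions, policy, gamma=0.9, iterations=50):
    """Estimate Q for a fixed policy using only logged transitions."""
    q = defaultdict(float)
    for _ in range(iterations):
        targets = defaultdict(list)
        for state, action, reward, next_state, done in transitions:
            # Bootstrap with the evaluated policy's action, not a max over actions.
            bootstrap = 0.0 if done else q[(next_state, policy(next_state))]
            targets[(state, action)].append(reward + gamma * bootstrap)
        # "Fit" step: in the tabular case this is the mean target per (state, action).
        q = defaultdict(float, {k: sum(v) / len(v) for k, v in targets.items()})
    return q


q_eval = fitted_q_evaluation(TRANSITIONS, policy)
initial_value = q_eval[(0, policy(0))]
print(initial_value)
```

The difference from Q-learning is the bootstrap term: FQE follows the policy being evaluated, so the result estimates that policy's value rather than the value of an optimal policy.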

How do I install & run this project?

1. Clone this repository: `git clone https://github.com/shyamal-anadkat/offlinerl`

2. Install **pybullet from source:** `pip install git+https://github.com/takuseno/d4rl-pybullet`

3. Install requirements: `pip install Cython numpy`, then `pip install -e .`

4. Execute **cql_train.py found at the root of the project**
- Default dataset is `hopper-bullet-mixed-v0`
- Default number of epochs is 10. You can change this via the custom args `--epochs_cql` and `--epochs_fqe`
- For example, to run for 10 epochs: `python cql_train.py --epochs_cql 10 --epochs_fqe 10` (see the Colab example below for more clarity)
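The flags described above could be parsed along these lines. This is only a sketch of one way to wire them up with `argparse`, and the actual `cql_train.py` may do it differently. The `--epochs_cql` and `--epochs_fqe` names and their default of 10 come from the text above, while the `--dataset` flag and the `build_parser` helper are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the training flags (illustrative sketch only)."""
    parser = argparse.ArgumentParser(
        description="Train CQL, then evaluate the policy with FQE."
    )
    # Hypothetical flag: the README only states the default dataset name.
    parser.add_argument("--dataset", default="hopper-bullet-mixed-v0")
    parser.add_argument("--epochs_cql", type=int, default=10,
                        help="number of CQL training epochs")
    parser.add_argument("--epochs_fqe", type=int, default=10,
                        help="number of FQE training epochs")
    return parser


# Equivalent to: python cql_train.py --epochs_cql 10 --epochs_fqe 10
args = build_parser().parse_args(["--epochs_cql", "10", "--epochs_fqe", "10"])
print(args.dataset, args.epochs_cql, args.epochs_fqe)
```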
Important Logs:
- Estimated Q values vs training steps (CQL): d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/init_value.csv
- Average reward vs training steps (CQL): d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/environment.csv
- True Q values vs training steps (CQL): d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/true_q_value.csv
- True Q & Estimated Q values vs training steps (FQE): d3rlpy_logs/FQE_hopper-bullet-mixed-v0_1/..
- Note: I created my own scorer to calculate the true Q values. See scorer.py (true_q_value_scorer) for implementation details.
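For intuition about the "true Q value" mentioned in the note, one common reading is the discounted return actually observed from each step of a logged episode. The sketch below computes that quantity. It is an assumption about the idea, not the implementation in scorer.py, and the function name and the sample rewards are made up.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted sum of future rewards from every step of an episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


episode_rewards = [1.0, 1.0, 1.0]  # hypothetical rewards from one logged episode
print(discounted_returns(episode_rewards, gamma=0.5))  # [1.75, 1.5, 1.0]
```

Comparing these observed returns with the Q values the model estimates for the same state-action pairs gives a sense of how much the learned Q function over- or underestimates.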
For plotting, I wrote a utility script (at the root of the project), which can be executed like so: `python plot_helper.py`. Note: you can provide arguments that correspond to the paths to the logs; otherwise it uses the defaults.

If you're curious here's the benchmark/reproduction

Other scripts:

Format: ./scripts/format
Linting: ./scripts/lint

Sample Plots (with 100 epochs):

Note: logs can be found in /d3rlpy_logs
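If you want to inspect the logs without `plot_helper.py`, a small reader like the one below may be enough. It assumes each metric CSV holds headerless rows of `epoch,step,value`. That layout is an assumption about the log files, so check one of them first. The helper name and the sample rows are hypothetical.

```python
import csv
import io


def read_metric_log(handle):
    """Parse (epoch, step, value) rows from an open CSV file with no header."""
    epochs, steps, values = [], [], []
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        epochs.append(int(row[0]))
        steps.append(int(row[1]))
        values.append(float(row[2]))
    return epochs, steps, values


# Hypothetical sample standing in for a file such as environment.csv.
sample_log = io.StringIO("1,1000,12.5\n2,2000,14.25\n")
epochs, steps, values = read_metric_log(sample_log)
print(steps, values)
```

For a real run, pass an open file instead of the sample, for example `open("d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/environment.csv", newline="")`.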

Examples speak more:

Walkthrough:

Background on d3rlpy

d3rlpy is an offline deep reinforcement learning library for practitioners and researchers.

Documentation: https://d3rlpy.readthedocs.io
Paper: https://arxiv.org/abs/2111.03788

How do I install d3rlpy?

d3rlpy supports Linux, macOS and Windows. d3rlpy is not only easy to use, but also fully compatible with the scikit-learn API, so you can stay productive with scikit-learn's utilities.

PyPI (recommended)


$ pip install d3rlpy

More examples around d3rlpy usage

```py
import d3rlpy

# prepare dataset
dataset, env = d3rlpy.datasets.get_dataset("hopper-medium-v0")

# prepare algorithm
sac = d3rlpy.algos.SAC()

# train offline
sac.fit(dataset, n_steps=1000000)

# train online
sac.fit_online(env, n_steps=1000000)

# ready to control (x is a batch of observations)
actions = sac.predict(x)
```

MuJoCo

```py
import d3rlpy

# prepare dataset
dataset, env = d3rlpy.datasets.get_d4rl('hopper-medium-v0')

# prepare algorithm
cql = d3rlpy.algos.CQL(use_gpu=True)

# train
cql.fit(dataset,
        eval_episodes=dataset,
        n_epochs=100,
        scorers={
            'environment': d3rlpy.metrics.evaluate_on_environment(env),
            'td_error': d3rlpy.metrics.td_error_scorer
        })
```

See more datasets at d4rl.

Atari 2600

```py
import d3rlpy
from sklearn.model_selection import train_test_split

# prepare dataset
dataset, env = d3rlpy.datasets.get_atari('breakout-expert-v0')

# split dataset
train_episodes, test_episodes = train_test_split(dataset, test_size=0.1)

# prepare algorithm
cql = d3rlpy.algos.DiscreteCQL(n_frames=4, q_func_factory='qr', scaler='pixel', use_gpu=True)

# start training
cql.fit(train_episodes,
        eval_episodes=test_episodes,
        n_epochs=100,
        scorers={
            'environment': d3rlpy.metrics.evaluate_on_environment(env),
            'td_error': d3rlpy.metrics.td_error_scorer
        })
```

See more Atari datasets at d4rl-atari.

PyBullet

```py
import d3rlpy

# prepare dataset
dataset, env = d3rlpy.datasets.get_pybullet('hopper-bullet-mixed-v0')

# prepare algorithm
cql = d3rlpy.algos.CQL(use_gpu=True)

# start training
cql.fit(dataset,
        eval_episodes=dataset,
        n_epochs=100,
        scorers={
            'environment': d3rlpy.metrics.evaluate_on_environment(env),
            'td_error': d3rlpy.metrics.td_error_scorer
        })
```

See more PyBullet datasets at d4rl-pybullet.

How about some tutorials?

Try a cartpole example on Google Colaboratory:

official offline RL tutorial:

Citation

Thanks to Takuma Seno and his work on d3rlpy. This wouldn't have been possible without it.

Seno, T., & Imai, M. (2021). d3rlpy: An Offline Deep Reinforcement Learning Library. Conference paper, 35th Conference on Neural Information Processing Systems, Offline Reinforcement Learning Workshop, 2021.

@InProceedings{seno2021d3rlpy,
  author = {Takuma Seno and Michita Imai},
  title = {d3rlpy: An Offline Deep Reinforcement Learning Library},
  booktitle = {NeurIPS 2021 Offline Reinforcement Learning Workshop},
  month = {December},
  year = {2021}
}

Owner

Name: Shyamal H Anadkat
Login: shyamal-anadkat
Kind: user
Location: San Francisco Bay Area
Company: OpenAI

Website: shyamal.me
Repositories: 104
Profile: https://github.com/shyamal-anadkat

building experiences that impact lives

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Seno"
  given-names: "Takuma"
title: "d3rlpy: An offline deep reinforcement learning library"
version: 0.91
date-released: 2020-08-01
url: "https://github.com/takuseno/d3rlpy"
preferred-citation:
  type: conference-paper
  authors:
  - family-names: "Seno"
    given-names: "Takuma"
  - family-names: "Imai"
    given-names: "Michita"
  journal: "NeurIPS 2021 Offline Reinforcement Learning Workshop"
  conference:
    name: "NeurIPS 2021 Offline Reinforcement Learning Workshop"
  collection-title: "35th Conference on Neural Information Processing Systems, Offline Reinforcement Learning Workshop, 2021"
  month: 12
  title: "d3rlpy: An Offline Deep Reinforcement Learning Library"
  year: 2021

GitHub Events

Total

Issues event: 1

Last Year

Issues event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 1
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 1
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
