Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: shyamal-anadkat
- License: mit
- Language: Python
- Default Branch: master
- Size: 44.5 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 3
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Deep Reinforcement Learning
Shyamal H Anadkat | Fall '21
Background
Hello! This is the repository for the AIPI530 DeepRL final project. The goal is to build a pipeline for offline RL. The starter code has been forked from d3rlpy (see the citation at the bottom). Offline reinforcement learning (RL) is the task of learning a policy from a fixed batch of previously collected data, without further interaction with the environment.
Before diving in, I recommend familiarizing yourself with basic reinforcement learning. Here is a link to my blog post on reinforcement learning to get you started: RL Primer
The blog post briefly covers the following:
- What is reinforcement learning?
- What are the pros and cons of reinforcement learning?
- When should we consider applying reinforcement learning (and when should we not)?
- What's the difference between supervised learning and reinforcement learning?
- What is offline reinforcement learning? What are the pros and cons of offline reinforcement learning?
- When should we consider applying offline reinforcement learning (and when should we not)?
- An example of offline reinforcement learning in the real world
source: https://bair.berkeley.edu/blog/2020/12/07/offline/
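To make "learning from a fixed batch of data" concrete, here is a minimal sketch in plain Python. It is not part of this project or of d3rlpy: the function name, the toy transitions, and the hyperparameters are all made up for illustration. It runs tabular Q-learning that only replays a logged list of transitions and never queries an environment.

```python
from collections import defaultdict


def offline_q_learning(transitions, n_actions, gamma=0.99, lr=0.5, sweeps=200):
    """Tabular Q-learning that only replays a fixed list of transitions."""
    q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(sweeps):
        for state, action, reward, next_state, done in transitions:
            # Bootstrap from the best next action unless the episode ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += lr * (target - q[state][action])
    return q


# Hypothetical logged data: (state, action, reward, next_state, done).
# Action 1 moves towards the goal; action 0 moves back to state 0.
batch = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 0, False),
    (1, 1, 1.0, 1, True),
]

q_table = offline_q_learning(batch, n_actions=2)
greedy_policy = {s: max(range(2), key=lambda a: q_table[s][a]) for s in (0, 1)}
print(greedy_policy)
```

Because the learner never collects new data, it can only be as good as the coverage of the batch allows. That limitation is what conservative methods such as CQL are designed to handle.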
Getting Started
(please read carefully)
This project is customized for training CQL on a custom dataset in d3rlpy and for training OPE (FQE) to evaluate the trained policy. Important scripts:
- `cql_train.py`: the main script, at the root of the project, used to train CQL and get evaluation scores
- `plot_helper.py`: utility script to help produce the required plots
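For intuition about the OPE step, the following is a tabular sketch of fitted Q evaluation (FQE). It is an illustration under simplifying assumptions (small discrete states and actions, a deterministic policy given as a dict), not the neural-network FQE that d3rlpy provides and not the code in `cql_train.py`. All names and data here are hypothetical.

```python
from collections import defaultdict


def fitted_q_evaluation(transitions, policy, gamma=0.99, iterations=50):
    """Estimate Q^pi for a fixed policy using logged transitions only."""
    q = defaultdict(float)
    for _ in range(iterations):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next, done in transitions:
            # Bootstrap with the action the evaluated policy would take,
            # not with the max over actions as in Q-learning.
            target = r if done else r + gamma * q[(s_next, policy[s_next])]
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # Regression step: in the tabular case the least-squares fit is the mean.
        q = defaultdict(float, {key: sums[key] / counts[key] for key in sums})
    return q


# Hypothetical logged data: (state, action, reward, next_state, done).
logged = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 0, False),
    (1, 1, 1.0, 1, True),
]

go_right = {0: 1, 1: 1}   # always heads for the rewarding transition
stay_left = {0: 0, 1: 0}  # never reaches the reward

q_right = fitted_q_evaluation(logged, go_right)
q_left = fitted_q_evaluation(logged, stay_left)
print(q_right[(0, go_right[0])], q_left[(0, stay_left[0])])
```

The value at the initial state ranks the two policies without running either of them in the environment, which is the role FQE plays after CQL training. State-action pairs that never appear in the log default to 0 here, so the estimate is only meaningful where the data covers the evaluated policy.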
How do I install & run this project ?
1. Clone this repository
git clone https://github.com/shyamal-anadkat/offlinerl
2. Install **pybullet from source:**
pip install git+https://github.com/takuseno/d4rl-pybullet
3. Install requirements:
pip install Cython numpy
pip install -e .
4. Execute **`cql_train.py` found at the root of the project**
- Default dataset is `hopper-bullet-mixed-v0`
- Default no. of `epochs` is `10`. You can change this via custom args `--epochs_cql` & `--epochs_fqe`
- For example, if we want to run for 10 epochs:
python cql_train.py --epochs_cql 10 --epochs_fqe 10
(see colab example below for more clarity)
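A command line like the one above could be parsed roughly as follows. This is only a sketch of how such arguments are typically declared with `argparse`; the actual parser in `cql_train.py` may differ. The `--dataset` flag in particular is a hypothetical addition, since the README only states the default dataset and does not say whether it can be overridden.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Train CQL, then evaluate with FQE")
    # Hypothetical flag: the README names only the default dataset.
    parser.add_argument("--dataset", type=str, default="hopper-bullet-mixed-v0")
    # These two flags and their default of 10 come from the README.
    parser.add_argument("--epochs_cql", type=int, default=10)
    parser.add_argument("--epochs_fqe", type=int, default=10)
    return parser


# Parse an explicit list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--epochs_cql", "10", "--epochs_fqe", "10"])
print(args.dataset, args.epochs_cql, args.epochs_fqe)
```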
Important Logs:
- Estimated Q values vs training steps (CQL): `d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/init_value.csv`
- Average reward vs training steps (CQL): `d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/environment.csv`
- True Q values vs training steps (CQL): `d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/true_q_value.csv`
- True Q & Estimated Q values vs training steps (FQE): `d3rlpy_logs/FQE_hopper-bullet-mixed-v0_1/..`
- Note: I created my own scorer to calculate the true Q values. See `scorer.py` (`true_q_value_scorer`) for implementation details.
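The README does not show how `true_q_value_scorer` works, so the following is not its implementation. One common way to obtain a "true" Q value for comparison with an estimate is the discounted return actually observed in the logged episode from a given step onward. The sketch below computes that quantity; the function name and the reward list are made up for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


episode_rewards = [1.0, 1.0, 1.0]  # hypothetical rewards from one logged episode
print(discounted_returns(episode_rewards, gamma=0.5))
```

Comparing these returns with the Q values a trained model predicts for the same state-action pairs gives a rough sense of how much the model over- or underestimates.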
For plotting, I wrote a utility script (at the root of the project) which can be executed like so:
python plot_helper.py
Note: you can provide arguments that correspond to the path to the logs, or it will use the default.
- If you're curious here's the benchmark/reproduction
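If you want to inspect a log file yourself before plotting, a small loader like the one below may help. It assumes each row of a d3rlpy CSV log is `epoch,step,value` with no header line. That layout is an assumption you should check against your own files, and `load_metric` is a hypothetical helper, not a function from `plot_helper.py`.

```python
import csv
import io


def load_metric(csv_file):
    """Parse rows of (epoch, step, value) into three parallel lists."""
    epochs, steps, values = [], [], []
    for row in csv.reader(csv_file):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        epochs.append(int(row[0]))
        steps.append(int(row[1]))
        values.append(float(row[2]))
    return epochs, steps, values


# Stand-in for open("d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/init_value.csv").
# The numbers below are made up.
sample = io.StringIO("1,1000,0.5\n2,2000,0.75\n")
epochs, steps, values = load_metric(sample)
print(steps, values)
```

The `steps` and `values` lists can then be passed to any plotting library as the x and y series.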
Other scripts:
- Format: `./scripts/format`
- Linting: `./scripts/lint`
Sample Plots (with 100 epochs):
Note: logs can be found in /d3rlpy_logs
Examples speak more: 
Walkthrough:

Background on d3rlpy
d3rlpy is an offline deep reinforcement learning library for practitioners and researchers.
- Documentation: https://d3rlpy.readthedocs.io
- Paper: https://arxiv.org/abs/2111.03788
How do I install d3rlpy?
d3rlpy supports Linux, macOS and Windows. d3rlpy is not only easy to use but also fully compatible with the scikit-learn API, which means you can use scikit-learn's utilities alongside it.
PyPI (recommended)
$ pip install d3rlpy
More examples around d3rlpy usage
```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_dataset("hopper-medium-v0")

# prepare algorithm
sac = d3rlpy.algos.SAC()

# train offline
sac.fit(dataset, n_steps=1000000)

# train online
sac.fit_online(env, n_steps=1000000)

# ready to control (x is a batch of observations)
actions = sac.predict(x)
```
MuJoCo
```py
import d3rlpy

# prepare dataset
dataset, env = d3rlpy.datasets.get_d4rl('hopper-medium-v0')

# prepare algorithm
cql = d3rlpy.algos.CQL(use_gpu=True)

# train
cql.fit(dataset,
        eval_episodes=dataset,
        n_epochs=100,
        scorers={
            'environment': d3rlpy.metrics.evaluate_on_environment(env),
            'td_error': d3rlpy.metrics.td_error_scorer
        })
```
See more datasets at d4rl.
Atari 2600
```py
import d3rlpy
from sklearn.model_selection import train_test_split

# prepare dataset
dataset, env = d3rlpy.datasets.get_atari('breakout-expert-v0')

# split dataset
train_episodes, test_episodes = train_test_split(dataset, test_size=0.1)

# prepare algorithm
cql = d3rlpy.algos.DiscreteCQL(n_frames=4, q_func_factory='qr', scaler='pixel', use_gpu=True)

# start training
cql.fit(train_episodes,
        eval_episodes=test_episodes,
        n_epochs=100,
        scorers={
            'environment': d3rlpy.metrics.evaluate_on_environment(env),
            'td_error': d3rlpy.metrics.td_error_scorer
        })
```
See more Atari datasets at d4rl-atari.
PyBullet
```py
import d3rlpy

# prepare dataset
dataset, env = d3rlpy.datasets.get_pybullet('hopper-bullet-mixed-v0')

# prepare algorithm
cql = d3rlpy.algos.CQL(use_gpu=True)

# start training
cql.fit(dataset,
        eval_episodes=dataset,
        n_epochs=100,
        scorers={
            'environment': d3rlpy.metrics.evaluate_on_environment(env),
            'td_error': d3rlpy.metrics.td_error_scorer
        })
```
See more PyBullet datasets at d4rl-pybullet.
How about some tutorials?
Try a cartpole example on Google Colaboratory:
Citation
Thanks to Takuma Seno and his work on d3rlpy. This wouldn't have been possible without it.
Seno, T., & Imai, M. (2021). d3rlpy: An Offline Deep Reinforcement Learning Library [Conference paper]. 35th Conference on Neural Information Processing Systems, Offline Reinforcement Learning Workshop, 2021.
@InProceedings{seno2021d3rlpy,
author = {Takuma Seno and Michita Imai},
title = {d3rlpy: An Offline Deep Reinforcement Learning Library},
booktitle = {NeurIPS 2021 Offline Reinforcement Learning Workshop},
month = {December},
year = {2021}
}
Owner
- Name: Shyamal H Anadkat
- Login: shyamal-anadkat
- Kind: user
- Location: San Francisco Bay Area
- Company: OpenAI
- Website: shyamal.me
- Repositories: 104
- Profile: https://github.com/shyamal-anadkat
building experiences that impact lives
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Seno"
  given-names: "Takuma"
title: "d3rlpy: An offline deep reinforcement learning library"
version: 0.91
date-released: 2020-08-01
url: "https://github.com/takuseno/d3rlpy"
preferred-citation:
  type: conference-paper
  authors:
  - family-names: "Seno"
    given-names: "Takuma"
  - family-names: "Imai"
    given-names: "Michita"
  journal: "NeurIPS 2021 Offline Reinforcement Learning Workshop"
  conference:
    name: "NeurIPS 2021 Offline Reinforcement Learning Workshop"
  collection-title: "35th Conference on Neural Information Processing Systems, Offline Reinforcement Learning Workshop, 2021"
  month: 12
  title: "d3rlpy: An Offline Deep Reinforcement Learning Library"
  year: 2021
GitHub Events
Total
- Issues event: 1
Last Year
- Issues event: 1
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 1
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- sonatype-depshield[bot] (1)