Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: shyamal-anadkat
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 44.5 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Created over 4 years ago · Last pushed over 4 years ago
Metadata Files
Readme License Citation

README.md

Deep Reinforcement Learning

Shyamal H Anadkat | Fall '21

Background

Hello! This is the repository for my AIPI530 Deep RL final project. The goal is to build a pipeline for offline RL. The starter code was forked from d3rlpy (see the citation at the bottom). Offline reinforcement learning (RL) is the task of learning a policy from a fixed batch of previously collected data, without further interaction with the environment.
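To make "learning from a fixed batch of data" concrete, here is a minimal sketch of tabular Q-learning that only replays logged transitions and never queries an environment. The 3-state chain, the `fit_q_from_batch` helper, and all hyperparameters are hypothetical and chosen purely for illustration; they are not part of this project or of d3rlpy.

```python
def fit_q_from_batch(transitions, n_states, n_actions, gamma=0.9, lr=0.5, sweeps=200):
    """Tabular Q-learning over a fixed batch of (s, a, r, s_next, done) tuples."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s, a, r, s_next, done in transitions:
            # Bootstrapped target; terminal transitions use the reward alone.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q


# Hypothetical logged data from a 3-state chain:
# action 0 stays in place, action 1 moves right, reaching state 2 pays 1.0 and ends.
batch = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
]

q_table = fit_q_from_batch(batch, n_states=3, n_actions=2)

# Greedy policy for the two non-terminal states.
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(2)]
print(greedy_policy)
```

In this toy batch every state-action pair is covered. With real datasets many actions are never observed, and plain Q-learning can overestimate their values; CQL, used in this project, adds a conservative penalty to counter that.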

Before diving in, I recommend familiarizing yourself with the basics of reinforcement learning. My blog post on reinforcement learning is a good place to start: RL Primer

The blog post briefly covers the following:

  • What is reinforcement learning?
  • What are the pros and cons of reinforcement learning?
  • When should we consider applying reinforcement learning (and when should we not)?
  • What's the difference between supervised learning and reinforcement learning?
  • What is offline reinforcement learning? What are its pros and cons?
  • When should we consider applying offline reinforcement learning (and when should we not)?
  • An example of offline reinforcement learning in the real world

Figure source: https://bair.berkeley.edu/blog/2020/12/07/offline/

Getting Started

(please read carefully)

This project is customized for training CQL on a custom dataset in d3rlpy, and for training OPE (FQE) to evaluate the trained policy. Important scripts:

  1. cql_train.py: the main script, at the root of the project; it trains CQL and computes the evaluation scores
  2. plot_helper.py: utility script that produces the required plots
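As a rough illustration of what a CQL-then-FQE pipeline can look like, here is a minimal sketch. It is not the contents of cql_train.py. It assumes the d3rlpy 0.9x-era API (`d3rlpy.algos.CQL`, `d3rlpy.ope.FQE`, and the scorers in `d3rlpy.metrics.scorer`), the 80/20 split is arbitrary, and the `--dataset` flag is hypothetical. The `--epochs_cql` and `--epochs_fqe` arguments and their defaults come from the run instructions in this README.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train CQL offline, then evaluate it with FQE")
    # --dataset is a hypothetical flag; the default dataset name is the one this README states.
    parser.add_argument("--dataset", default="hopper-bullet-mixed-v0")
    parser.add_argument("--epochs_cql", type=int, default=10)
    parser.add_argument("--epochs_fqe", type=int, default=10)
    return parser.parse_args(argv)


def run(args):
    # Heavy imports are deferred so that argument parsing works without d3rlpy installed.
    import d3rlpy
    from d3rlpy.metrics.scorer import (
        evaluate_on_environment,
        initial_state_value_estimation_scorer,
    )
    from d3rlpy.ope import FQE
    from sklearn.model_selection import train_test_split

    dataset, env = d3rlpy.datasets.get_pybullet(args.dataset)
    # Assumption: an arbitrary 80/20 split of episodes.
    train_episodes, test_episodes = train_test_split(dataset, test_size=0.2)

    # Step 1: learn a policy offline with CQL.
    cql = d3rlpy.algos.CQL(use_gpu=False)
    cql.fit(
        train_episodes,
        eval_episodes=test_episodes,
        n_epochs=args.epochs_cql,
        scorers={
            "environment": evaluate_on_environment(env),
            "init_value": initial_state_value_estimation_scorer,
        },
    )

    # Step 2: off-policy evaluation of the learned policy with FQE.
    fqe = FQE(algo=cql, use_gpu=False)
    fqe.fit(
        train_episodes,
        eval_episodes=test_episodes,
        n_epochs=args.epochs_fqe,
        scorers={"init_value": initial_state_value_estimation_scorer},
    )
    return cql, fqe


# Parsing alone needs no d3rlpy install; call run(args) to actually train.
args = parse_args(["--epochs_cql", "3"])
print(args.dataset, args.epochs_cql, args.epochs_fqe)
```

In d3rlpy of that era, each scorer key becomes a CSV file of the same name under d3rlpy_logs/, which is why the keys here mirror the init_value.csv and environment.csv logs listed in this README.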

How do I install & run this project?


1. Clone this repository: `git clone https://github.com/shyamal-anadkat/offlinerl`

2. Install **pybullet from source:** `pip install git+https://github.com/takuseno/d4rl-pybullet`

3. Install requirements: `pip install Cython numpy`, then `pip install -e .`

4. Execute **cql_train.py found at the root of the project**

    • Default dataset is hopper-bullet-mixed-v0
    • Default no. of epochs is 10. You can change this via the custom args --epochs_cql & --epochs_fqe
    • For example, to run for 10 epochs: `python cql_train.py --epochs_cql 10 --epochs_fqe 10` (see the Colab example below for more clarity)

5. Important logs:

    • Estimated Q values vs training steps (CQL): d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/init_value.csv
    • Average reward vs training steps (CQL): d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/environment.csv
    • True Q values vs training steps (CQL): d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/true_q_value.csv
    • True Q & Estimated Q values vs training steps (FQE): d3rlpy_logs/FQE_hopper-bullet-mixed-v0_1/..
    • Note: I created my own scorer to calculate the true Q values. See scorer.py (true_q_value_scorer) for implementation details.

6. For plotting, I wrote a utility script (at the root of the project), which can be executed like so: `python plot_helper.py`. Note: you can provide arguments that correspond to the paths to the logs; otherwise it uses the defaults.
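The true Q values mentioned in the logs above come from a custom scorer whose implementation lives in scorer.py and is not reproduced here. One common way to obtain such a reference value, shown below as an assumption rather than the project's actual method, is the discounted return observed from each step of a logged episode. The `discounted_returns` helper and the sample rewards are hypothetical.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical rewards from one logged episode.
episode_rewards = [1.0, 0.0, 2.0]
returns = discounted_returns(episode_rewards, gamma=0.5)
print(returns)  # [1.5, 1.0, 2.0]
```

Comparing such returns with the Q values a trained critic predicts for the same state-action pairs gives a rough picture of how much the critic over- or underestimates.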

Other scripts:

  • Format: ./scripts/format
  • Linting: ./scripts/lint

Sample Plots (with 100 epochs):

img.png

Note: logs can be found in /d3rlpy_logs
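For inspecting these logs without plot_helper.py, a small reader along the following lines may be enough. It assumes each metric CSV holds header-less rows of `epoch,step,value`, which is how d3rlpy of that era wrote them as far as I know; verify against your own files. The `load_metric` helper and the synthetic demo file are hypothetical.

```python
import csv
import os
import tempfile


def load_metric(path):
    """Read a metric CSV; assumed row layout is epoch, step, value with no header."""
    steps, values = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip blank or malformed rows
            steps.append(int(row[1]))
            values.append(float(row[2]))
    return steps, values


# Demo on a synthetic file so the sketch runs without any training logs.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "init_value.csv")
    with open(demo_path, "w") as f:
        f.write("1,100,0.5\n2,200,0.75\n")
    steps, values = load_metric(demo_path)

print(steps, values)  # [100, 200] [0.5, 0.75]
```

The returned lists can be passed straight to a plotting call such as matplotlib's `plt.plot(steps, values)`.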

Examples speak more: Open In Colab

Walkthrough: walkthrough.gif


Background on d3rlpy

d3rlpy is an offline deep reinforcement learning library for practitioners and researchers.

  • Documentation: https://d3rlpy.readthedocs.io
  • Paper: https://arxiv.org/abs/2111.03788

How do I install d3rlpy?

d3rlpy supports Linux, macOS and Windows. It is easy to use and compatible with the scikit-learn API, so you can use scikit-learn utilities such as train_test_split directly with d3rlpy datasets.

PyPI (recommended)


$ pip install d3rlpy

More examples around d3rlpy usage

```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_dataset("hopper-medium-v0")

# prepare algorithm
sac = d3rlpy.algos.SAC()

# train offline
sac.fit(dataset, n_steps=1000000)

# train online
sac.fit_online(env, n_steps=1000000)

# ready to control (x is a batch of observations)
actions = sac.predict(x)
```

MuJoCo

```py
import d3rlpy

# prepare dataset
dataset, env = d3rlpy.datasets.get_d4rl('hopper-medium-v0')

# prepare algorithm
cql = d3rlpy.algos.CQL(use_gpu=True)

# train
cql.fit(dataset,
        eval_episodes=dataset,
        n_epochs=100,
        scorers={
            'environment': d3rlpy.metrics.evaluate_on_environment(env),
            'td_error': d3rlpy.metrics.td_error_scorer
        })
```

See more datasets at d4rl.

Atari 2600

```py
import d3rlpy
from sklearn.model_selection import train_test_split

# prepare dataset
dataset, env = d3rlpy.datasets.get_atari('breakout-expert-v0')

# split dataset
train_episodes, test_episodes = train_test_split(dataset, test_size=0.1)

# prepare algorithm
cql = d3rlpy.algos.DiscreteCQL(n_frames=4, q_func_factory='qr', scaler='pixel', use_gpu=True)

# start training
cql.fit(train_episodes,
        eval_episodes=test_episodes,
        n_epochs=100,
        scorers={
            'environment': d3rlpy.metrics.evaluate_on_environment(env),
            'td_error': d3rlpy.metrics.td_error_scorer
        })
```

See more Atari datasets at d4rl-atari.

PyBullet

```py
import d3rlpy

# prepare dataset
dataset, env = d3rlpy.datasets.get_pybullet('hopper-bullet-mixed-v0')

# prepare algorithm
cql = d3rlpy.algos.CQL(use_gpu=True)

# start training
cql.fit(dataset,
        eval_episodes=dataset,
        n_epochs=100,
        scorers={
            'environment': d3rlpy.metrics.evaluate_on_environment(env),
            'td_error': d3rlpy.metrics.td_error_scorer
        })
```

See more PyBullet datasets at d4rl-pybullet.

How about some tutorials?

Try a cartpole example on Google Colaboratory:

  • official offline RL tutorial: Open In Colab

Citation

Thanks to Takuma Seno for his work on d3rlpy. This project wouldn't have been possible without it.

Seno, T., & Imai, M. (2021). d3rlpy: An Offline Deep Reinforcement Learning Library. Conference paper, 35th Conference on Neural Information Processing Systems, Offline Reinforcement Learning Workshop, 2021.

@InProceedings{seno2021d3rlpy,
  author = {Takuma Seno and Michita Imai},
  title = {d3rlpy: An Offline Deep Reinforcement Learning Library},
  booktitle = {NeurIPS 2021 Offline Reinforcement Learning Workshop},
  month = {December},
  year = {2021}
}

Owner

  • Name: Shyamal H Anadkat
  • Login: shyamal-anadkat
  • Kind: user
  • Location: San Francisco Bay Area
  • Company: OpenAI

building experiences that impact lives

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Seno"
  given-names: "Takuma"
title: "d3rlpy: An offline deep reinforcement learning library"
version: 0.91
date-released: 2020-08-01
url: "https://github.com/takuseno/d3rlpy"
preferred-citation:
  type: conference-paper
  authors:
  - family-names: "Seno"
    given-names: "Takuma"
  - family-names: "Imai"
    given-names: "Michita"
  journal: "NeurIPS 2021 Offline Reinforcement Learning Workshop"
  conference:
    name: "NeurIPS 2021 Offline Reinforcement Learning Workshop"
  collection-title: "35th Conference on Neural Information Processing Systems, Offline Reinforcement Learning Workshop, 2021"
  month: 12
  title: "d3rlpy: An Offline Deep Reinforcement Learning Library"
  year: 2021

GitHub Events

Total
  • Issues event: 1
Last Year
  • Issues event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 1
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • sonatype-depshield[bot] (1)