rlor

Reinforcement learning for operation research problems with OpenAI Gym and CleanRL

https://github.com/cpwan/rlor

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

attention cvrp operation-research ppo pytorch reinforcement-learning tsp
Last synced: 6 months ago

Repository

Reinforcement learning for operation research problems with OpenAI Gym and CleanRL

Basic Info
Statistics
  • Stars: 98
  • Watchers: 3
  • Forks: 9
  • Open Issues: 1
  • Releases: 0
Topics
attention cvrp operation-research ppo pytorch reinforcement-learning tsp
Created almost 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation

README.md

RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research

:one: First work to incorporate an end-to-end vehicle routing model into a modern RL platform (CleanRL)

:zap: Speeds up training of the Attention Model by 8 times (25 hours $\to$ 3 hours)

:mag_right: A flexible framework for developing models, algorithms, environments, and search methods for operation research

News

  • 13/04/2023: We release a web demo on Hugging Face 🤗!
  • 24/03/2023: We release our paper on arXiv!
  • 20/03/2023: We release a Jupyter Lab demo and pretrained checkpoints!
  • 10/03/2023: We release our codebase!

Demo

We provide inference demos as Colab notebooks:

| Environment | Search       | Demo          |
| ----------- | ------------ | ------------- |
| TSP         | Greedy       | Open In Colab |
| CVRP        | Multi-Greedy | Open In Colab |
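
For intuition, here is a minimal sketch of what greedy decoding means in a gym-style routing environment: at each step, take the policy's highest-scoring node instead of sampling. The `env` and `policy` objects and the gym 0.23 step signature are assumptions for illustration, not the repo's exact API; the POMO-style "Multi-Greedy" search in the CVRP demo additionally rolls out one greedy trajectory per starting node and keeps the best.

```python
import numpy as np

# Illustrative sketch only (hypothetical names): greedy decoding of a policy
# in a gym 0.23-style environment, where step() returns (obs, reward, done, info).
def greedy_rollout(env, policy):
    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        scores = policy(obs)              # per-node scores from the model
        action = int(np.argmax(scores))   # greedy: always take the best-scoring node
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward
```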

Installation

Conda

```shell
conda env create -n <env-name> -f environment.yml
```

It can take a few minutes.

The `environment.yml` was generated from

```shell
conda env export --no-builds > environment.yml
```

Optional dependency

wandb

Refer to their quick start guide for installation.

File structures

All the major implementations are under the `rlor` folder.

```shell
./rlor
├── envs
│   ├── tsp_data.py           # load pre-generated data for evaluation
│   ├── tsp_vector_env.py     # define the (vectorized) gym environment
│   ├── cvrp_data.py
│   └── cvrp_vector_env.py
├── models
│   ├── attention_model_wrapper.py  # wrap refactored attention model for CleanRL
│   └── nets                        # contains refactored attention model
└── ppo_or.py                 # implementation of PPO with attention model for operation research problems
```

The `ppo_or.py` was modified from cleanrl/ppo.py. To see what's changed, use diff:

```shell
apt install diffutils   # provides the diff utility
diff --color ppo.py ppo_or.py
```

Training OR model with PPO

TSP

```shell
python ppo_or.py --num-steps 51 --env-id tsp-v0 --env-entry-point envs.tsp_vector_env:TSPVectorEnv --problem tsp
```

CVRP

```shell
python ppo_or.py --num-steps 60 --env-id cvrp-v0 --env-entry-point envs.cvrp_vector_env:CVRPVectorEnv --problem cvrp
```
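
The `--env-id` / `--env-entry-point` pair follows gym's standard custom-environment registration. As a rough sketch (the exact wiring inside `ppo_or.py` may differ), registering the entry point is what makes the id resolvable by `gym.make`:

```python
import gym  # gym==0.23.1, as pinned in environment.yml

# Sketch of standard gym registration for a custom environment id;
# ppo_or.py builds the id and entry point from its CLI arguments.
gym.envs.registration.register(
    id="tsp-v0",
    entry_point="envs.tsp_vector_env:TSPVectorEnv",
)
env = gym.make("tsp-v0")   # assumes the rlor/ folder is on the Python path
print(env)
```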

Enable WandB

```shell
python ppo_or.py ... --track
```

Add the `--track` argument to enable tracking with WandB.
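
In CleanRL-style scripts, `--track` typically initialises a WandB run and mirrors the TensorBoard scalars. The project name and config below are placeholders for illustration, not the exact values used by `ppo_or.py`:

```python
import wandb

# Placeholder values; ppo_or.py would build these from its parsed CLI args.
wandb.init(
    project="rlor",                                # hypothetical project name
    config={"env_id": "tsp-v0", "num_steps": 51},  # normally the parsed args
    sync_tensorboard=True,                         # mirror TensorBoard scalars
)
# ... the training loop then logs metrics, e.g.
# wandb.log({"charts/episodic_return": episodic_return, "global_step": step})
wandb.finish()
```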

Where is the tsp data?

It can be generated with the official repo of the attention-learn-to-route paper. You may modify ./envs/tsp_data.py to update the path to the data accordingly.
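
As a rough sketch of what such evaluation data looks like: the official generator stores instances as pickled lists of uniform node coordinates in the unit square. The file name, instance count, and problem size below are illustrative placeholders only.

```python
import pickle
import numpy as np

# Illustrative only: TSP instances as uniform 2D points, pickled as a list
# of instances, mimicking the attention-learn-to-route data format.
rng = np.random.default_rng(1234)
dataset = rng.uniform(size=(1000, 50, 2)).tolist()   # 1000 instances, 50 nodes each
with open("tsp50_test_seed1234.pkl", "wb") as f:     # hypothetical file name
    pickle.dump(dataset, f)
```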

Acknowledgements

The neural network model is refactored and developed from Attention, Learn to Solve Routing Problems!.

The idea of multiple-trajectory training/inference is from POMO: Policy Optimization with Multiple Optima for Reinforcement Learning.

The RL environments are defined with OpenAI Gym.

The PPO algorithm implementation is based on CleanRL.

Owner

  • Login: cpwan
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "WAN"
  given-names: "Ching Pui"
  orcid: "https://orcid.org/0000-0002-6217-5418"
- family-names: "LI"
  given-names: "Tung"
- family-names: "WANG"
  given-names: "Jason Min"
title: "RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2023-03-23
url: "https://github.com/cpwan/RLOR"
preferred-citation:
  type: misc
  authors:
  - family-names: "WAN"
    given-names: "Ching Pui"
    orcid: "https://orcid.org/0000-0002-6217-5418"
  - family-names: "LI"
    given-names: "Tung"
  - family-names: "WANG"
    given-names: "Jason Min"
  doi: 10.48550/arXiv.2303.13117
  title: "RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research"
  year: 2023
  eprint: "arXiv:2303.13117"
  url : "http://arxiv.org/abs/2303.13117"

GitHub Events

Total
  • Issues event: 2
  • Watch event: 30
  • Issue comment event: 1
  • Fork event: 3
Last Year
  • Issues event: 2
  • Watch event: 30
  • Issue comment event: 1
  • Fork event: 3

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 10
  • Total Committers: 3
  • Avg Commits per committer: 3.333
  • Development Distribution Score (DDS): 0.2
Past Year
  • Commits: 10
  • Committers: 3
  • Avg Commits per committer: 3.333
  • Development Distribution Score (DDS): 0.2
Top Committers
Name Email Commits
cpwan c****n@c****k 8
Patrick WAN c****5@l****m 1
TonyLiHK 1****K 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • WaitDumplings (1)
Pull Request Authors
  • cpwan (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

environment.yml pypi
  • absl-py ==1.3.0
  • cachetools ==5.2.0
  • certifi ==2022.9.24
  • cfgv ==3.3.1
  • charset-normalizer ==2.1.1
  • cloudpickle ==2.2.0
  • distlib ==0.3.6
  • filelock ==3.8.0
  • google-auth ==2.14.1
  • google-auth-oauthlib ==0.4.6
  • grpcio ==1.50.0
  • gym ==0.23.1
  • gym-notices ==0.0.8
  • identify ==2.5.8
  • idna ==3.4
  • importlib-metadata ==5.0.0
  • llvmlite ==0.39.1
  • markdown ==3.4.1
  • markupsafe ==2.1.1
  • nodeenv ==1.7.0
  • numba ==0.56.4
  • numpy ==1.23.4
  • nvidia-cublas-cu11 ==11.10.3.66
  • nvidia-cuda-nvrtc-cu11 ==11.7.99
  • nvidia-cuda-runtime-cu11 ==11.7.99
  • nvidia-cudnn-cu11 ==8.5.0.96
  • oauthlib ==3.2.2
  • pillow ==9.3.0
  • platformdirs ==2.5.3
  • pre-commit ==2.20.0
  • protobuf ==3.20.3
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • pygame ==2.1.0
  • pyyaml ==6.0
  • requests ==2.28.1
  • requests-oauthlib ==1.3.1
  • rsa ==4.9
  • tensorboard ==2.11.0
  • tensorboard-data-server ==0.6.1
  • tensorboard-plugin-wit ==1.8.1
  • toml ==0.10.2
  • torch ==1.13.0
  • torchvision ==0.14.0
  • typing-extensions ==4.4.0
  • urllib3 ==1.26.12
  • virtualenv ==20.16.6
  • werkzeug ==2.2.2
  • zipp ==3.10.0