rlor

Reinforcement learning for operation research problems with OpenAI Gym and CleanRL

https://github.com/cpwan/rlor

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

attention cvrp operation-research ppo pytorch reinforcement-learning tsp
Last synced: 6 months ago

Repository

Reinforcement learning for operation research problems with OpenAI Gym and CleanRL

Basic Info
Statistics
  • Stars: 98
  • Watchers: 3
  • Forks: 9
  • Open Issues: 1
  • Releases: 0
Topics
attention cvrp operation-research ppo pytorch reinforcement-learning tsp
Created almost 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation

README.md

RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research

:one: First work to incorporate an end-to-end vehicle routing model into a modern RL platform (CleanRL)

:zap: Speeds up training of the Attention Model by 8 times (25 hours $\to$ 3 hours)

:mag_right: A flexible framework for developing models, algorithms, environments, and search methods for operation research

News

  • 13/04/2023: We release a web demo on Hugging Face 🤗!
  • 24/03/2023: We release our paper on arXiv!
  • 20/03/2023: We release a Jupyter Lab demo and pretrained checkpoints!
  • 10/03/2023: We release our codebase!

Demo

We provide inference demos as Colab notebooks:

| Environment | Search       | Demo          |
| ----------- | ------------ | ------------- |
| TSP         | Greedy       | Open In Colab |
| CVRP        | Multi-Greedy | Open In Colab |
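
For intuition, here is a minimal sketch of what greedy decoding means in a gym-style routing environment: at each step, take the policy's highest-scoring node instead of sampling. The `env` and `policy` objects and the gym 0.23 step signature are assumptions for illustration, not the repo's exact API; the POMO-style "Multi-Greedy" search in the CVRP demo additionally rolls out one greedy trajectory per starting node and keeps the best.

```python
import numpy as np

# Illustrative sketch only (hypothetical names): greedy decoding of a policy
# in a gym 0.23-style environment, where step() returns (obs, reward, done, info).
def greedy_rollout(env, policy):
    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        scores = policy(obs)              # per-node scores from the model
        action = int(np.argmax(scores))   # greedy: always take the best-scoring node
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward
```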

Installation

Conda

```shell
conda env create -n <env-name> -f environment.yml
```

It can take a few minutes.

The `environment.yml` was generated from

```shell
conda env export --no-builds > environment.yml
```

Optional dependency

wandb

Refer to their quick start guide for installation.

File structures

All the major implementations are under the `rlor` folder.

```shell
./rlor
├── envs
│   ├── tsp_data.py           # load pre-generated data for evaluation
│   ├── tsp_vector_env.py     # define the (vectorized) gym environment
│   ├── cvrp_data.py
│   └── cvrp_vector_env.py
├── models
│   ├── attention_model_wrapper.py  # wrap refactored attention model for CleanRL
│   └── nets                        # contains refactored attention model
└── ppo_or.py                 # implementation of PPO with attention model for operation research problems
```

The `ppo_or.py` was modified from cleanrl/ppo.py. To see what's changed, use diff:

```shell
apt install diffutils   # provides the diff utility
diff --color ppo.py ppo_or.py
```

Training OR model with PPO

TSP

```shell
python ppo_or.py --num-steps 51 --env-id tsp-v0 --env-entry-point envs.tsp_vector_env:TSPVectorEnv --problem tsp
```

CVRP

```shell
python ppo_or.py --num-steps 60 --env-id cvrp-v0 --env-entry-point envs.cvrp_vector_env:CVRPVectorEnv --problem cvrp
```
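
The `--env-id` / `--env-entry-point` pair follows gym's standard custom-environment registration. As a rough sketch (the exact wiring inside `ppo_or.py` may differ), registering the entry point is what makes the id resolvable by `gym.make`:

```python
import gym  # gym==0.23.1, as pinned in environment.yml

# Sketch of standard gym registration for a custom environment id;
# ppo_or.py builds the id and entry point from its CLI arguments.
gym.envs.registration.register(
    id="tsp-v0",
    entry_point="envs.tsp_vector_env:TSPVectorEnv",
)
env = gym.make("tsp-v0")   # assumes the rlor/ folder is on the Python path
print(env)
```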

Enable WandB

```shell
python ppo_or.py ... --track
```

Add the `--track` argument to enable tracking with WandB.
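
In CleanRL-style scripts, `--track` typically initialises a WandB run and mirrors the TensorBoard scalars. The project name and config below are placeholders for illustration, not the exact values used by `ppo_or.py`:

```python
import wandb

# Placeholder values; ppo_or.py would build these from its parsed CLI args.
wandb.init(
    project="rlor",                                # hypothetical project name
    config={"env_id": "tsp-v0", "num_steps": 51},  # normally the parsed args
    sync_tensorboard=True,                         # mirror TensorBoard scalars
)
# ... the training loop then logs metrics, e.g.
# wandb.log({"charts/episodic_return": episodic_return, "global_step": step})
wandb.finish()
```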

Where is the tsp data?

It can be generated with the official repo of the attention-learn-to-route paper. You may modify ./envs/tsp_data.py to update the path to the data accordingly.
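
As a rough sketch of what such evaluation data looks like: the official generator stores instances as pickled lists of uniform node coordinates in the unit square. The file name, instance count, and problem size below are illustrative placeholders only.

```python
import pickle
import numpy as np

# Illustrative only: TSP instances as uniform 2D points, pickled as a list
# of instances, mimicking the attention-learn-to-route data format.
rng = np.random.default_rng(1234)
dataset = rng.uniform(size=(1000, 50, 2)).tolist()   # 1000 instances, 50 nodes each
with open("tsp50_test_seed1234.pkl", "wb") as f:     # hypothetical file name
    pickle.dump(dataset, f)
```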

Acknowledgements

The neural network model is refactored and developed from Attention, Learn to Solve Routing Problems!.

The idea of multiple-trajectory training/inference is from POMO: Policy Optimization with Multiple Optima for Reinforcement Learning.

The RL environments are defined with OpenAI Gym.

The PPO algorithm implementation is based on CleanRL.

Owner

  • Login: cpwan
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "WAN"
  given-names: "Ching Pui"
  orcid: "https://orcid.org/0000-0002-6217-5418"
- family-names: "LI"
  given-names: "Tung"
- family-names: "WANG"
  given-names: "Jason Min"
title: "RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2023-03-23
url: "https://github.com/cpwan/RLOR"
preferred-citation:
  type: misc
  authors:
  - family-names: "WAN"
    given-names: "Ching Pui"
    orcid: "https://orcid.org/0000-0002-6217-5418"
  - family-names: "LI"
    given-names: "Tung"
  - family-names: "WANG"
    given-names: "Jason Min"
  doi: 10.48550/arXiv.2303.13117
  title: "RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research"
  year: 2023
  eprint: "arXiv:2303.13117"
  url : "http://arxiv.org/abs/2303.13117"

GitHub Events

Total
  • Issues event: 2
  • Watch event: 30
  • Issue comment event: 1
  • Fork event: 3
Last Year
  • Issues event: 2
  • Watch event: 30
  • Issue comment event: 1
  • Fork event: 3

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 10
  • Total Committers: 3
  • Avg Commits per committer: 3.333
  • Development Distribution Score (DDS): 0.2
Past Year
  • Commits: 10
  • Committers: 3
  • Avg Commits per committer: 3.333
  • Development Distribution Score (DDS): 0.2
Top Committers
Name Email Commits
cpwan c****n@c****k 8
Patrick WAN c****5@l****m 1
TonyLiHK 1****K 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • WaitDumplings (1)
Pull Request Authors
  • cpwan (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

environment.yml pypi
  • absl-py ==1.3.0
  • cachetools ==5.2.0
  • certifi ==2022.9.24
  • cfgv ==3.3.1
  • charset-normalizer ==2.1.1
  • cloudpickle ==2.2.0
  • distlib ==0.3.6
  • filelock ==3.8.0
  • google-auth ==2.14.1
  • google-auth-oauthlib ==0.4.6
  • grpcio ==1.50.0
  • gym ==0.23.1
  • gym-notices ==0.0.8
  • identify ==2.5.8
  • idna ==3.4
  • importlib-metadata ==5.0.0
  • llvmlite ==0.39.1
  • markdown ==3.4.1
  • markupsafe ==2.1.1
  • nodeenv ==1.7.0
  • numba ==0.56.4
  • numpy ==1.23.4
  • nvidia-cublas-cu11 ==11.10.3.66
  • nvidia-cuda-nvrtc-cu11 ==11.7.99
  • nvidia-cuda-runtime-cu11 ==11.7.99
  • nvidia-cudnn-cu11 ==8.5.0.96
  • oauthlib ==3.2.2
  • pillow ==9.3.0
  • platformdirs ==2.5.3
  • pre-commit ==2.20.0
  • protobuf ==3.20.3
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • pygame ==2.1.0
  • pyyaml ==6.0
  • requests ==2.28.1
  • requests-oauthlib ==1.3.1
  • rsa ==4.9
  • tensorboard ==2.11.0
  • tensorboard-data-server ==0.6.1
  • tensorboard-plugin-wit ==1.8.1
  • toml ==0.10.2
  • torch ==1.13.0
  • torchvision ==0.14.0
  • typing-extensions ==4.4.0
  • urllib3 ==1.26.12
  • virtualenv ==20.16.6
  • werkzeug ==2.2.2
  • zipp ==3.10.0