sumo-rl-mobicharger
OpenAI-gym-like Reinforcement Learning environment for Dispatching of Mobile Chargers with SUMO. Compatible with Gym and popular RL libraries such as stable-baselines3.
Science Score: 41.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to scholar.google
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.8%) to scientific vocabulary
Statistics
- Stars: 14
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
SUMO-RL-MobiCharger
SUMO-RL-MobiCharger provides an OpenAI-gym-like environment for implementing RL-based mobile charger dispatching methods on the SUMO simulator. The features of this environment are four-fold:
- A simple and customizable interface for Reinforcement Learning research on dispatching mobile chargers over city-scale transportation networks with SUMO
- Compatibility with OpenAI-gym and popular RL libraries such as stable-baselines3 and RL Baselines3 Zoo
- Easy modification of state and reward functions for research on vehicle routing or scheduling problems
- Support for parallel training of multiple environments via `SubprocVecEnv` in stable-baselines3
*Demo: blue vehicles are mobile chargers, yellow vehicles are electric vehicles, a green highlight marks charging between a mobile charger and an EV, and a blue highlight marks charging between a mobile charger and a charging station.*
Install
Install SUMO >= 1.16.0:
Install SUMO as described in their doc.
Note that this environment uses Libsumo by default to speed up simulation, but sumo-gui does not work with Libsumo on Windows (more details). If you need to fall back to TraCI, uncomment `import traci` and modify the code in `reset()` of `SumoEnv`.
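A minimal sketch of the usual drop-in switch between the two backends (an illustration of the common pattern, not the exact code in `SumoEnv`):

```python
# Libsumo mirrors TraCI's Python API, so it is commonly imported under the
# same name. This try/except fallback is an assumption, not the repo's code.
try:
    import libsumo as traci  # faster, but sumo-gui is unavailable on Windows
except ImportError:
    import traci  # standard TraCI client; works with sumo-gui everywhere
```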
Install the Necessary Packages
Install the necessary packages listed in `requirements.txt` (e.g., with `pip install -r requirements.txt`).
Install SUMO-RL-MobiCharger
Clone the latest version and install it as a Gym environment:

```bash
git clone https://github.com/liyan2015/SUMO-RL-MobiCharger.git
cd SUMO-RL-MobiCharger/source
pip install -e .
```
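Once installed, the environment can be used like any other Gym environment. A minimal smoke test (a sketch only: the importable package name is an assumption, and the `gui_f`/`label` keyword arguments are taken from the registration snippet further down):

```python
import gym
import sumo_rl_mobicharger  # hypothetical package name; importing it should register SumoEnv-v0

env = gym.make('SumoEnv-v0', gui_f=False, label='demo')
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random policy, purely for illustration
    obs, reward, done, info = env.step(action)
env.close()
```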
Training & Testing
If the environment is not compatible with the latest rl-baselines3-zoo, use this old cloned copy for tuning.
Register SUMO-RL-MobiCharger in RL Baselines3 Zoo
The main class is SumoEnv. To train with RL Baselines3 Zoo, you need to register the environment as described in their doc.
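Registration itself follows the standard Gym pattern; a minimal sketch, assuming a placeholder `entry_point` module path (adjust it to wherever `SumoEnv` actually lives):

```python
from gym.envs.registration import register

# The id must match the 'SumoEnv-v0' string used throughout this README;
# the entry_point module path below is a placeholder.
register(
    id='SumoEnv-v0',
    entry_point='sumo_rl_mobicharger.envs:SumoEnv',
)
```

Then add the following code to exp_manager.py: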
```python
# On most env, SubprocVecEnv does not help and is quite memory hungry,
# therefore we use DummyVecEnv by default
if "SumoEnv" not in self.env_name.gym_id:
    env = make_vec_env(
        make_env,
        n_envs=n_envs,
        seed=self.seed,
        env_kwargs=self.env_kwargs,
        monitor_dir=log_dir,
        wrapper_class=self.env_wrapper,
        vec_env_cls=self.vec_env_class,
        vec_env_kwargs=self.vec_env_kwargs,
        monitor_kwargs=self.monitor_kwargs,
    )
else:
    def make_env(
        env_config={
            'gui_f': False,
            'label': 'evaluate'
        },
        rank: int = 0,
        seed: int = 0
    ):
        def _init():
            env = gym.make('SumoEnv-v0', **env_config)
            env = Monitor(env, log_dir)
            env.seed(seed + rank)
            env.action_space.seed(seed + rank)
            return env

        set_random_seed(seed)
        return _init

    if eval_env:
        if self.verbose > 0:
            print("Creating evaluate environment.")
        env = SubprocVecEnv([make_env() for i in range(n_envs)])
    else:
        env = SubprocVecEnv([make_env(
            {
                'gui_f': False,
                'label': 'train' + str(i + 1)
            }, rank=i * 2) for i in range(n_envs)])
```
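Note that `make_env` returns the inner `_init` closure rather than an environment instance: `SubprocVecEnv` expects a list of callables, each of which builds its environment inside its own worker process. The distinct per-rank `label` values presumably keep the parallel SUMO instances separate.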
Training
For training, use the following command line:
```bash
python train.py --algo ppo --env SumoEnv-v0 --num-threads 1 --progress --conf-file hyperparams/python/sumoenv_config.py --save-freq 500000 --log-folder /usr/data2/canaltrain_log/ --tensorboard-log /usr/data2/canaltrain_tensorboard/ --verbose 2 --eval-freq 2000000 --eval-episodes 10 --n-eval-envs 10 --vec-env subproc
```
Resume Training
To resume training with different EV route files, use the following command line or check the RL Baselines3 Zoo doc:

```bash
python train.py --algo ppo --env SumoEnv-v0 --num-threads 1 --progress --conf-file hyperparams/python/sumoenv_config.py --save-freq 500000 --log-folder /usr/data2/canaltrain_log/ --tensorboard-log /usr/data2/canaltrain_tensorboard/ --verbose 2 --eval-freq 2000000 --eval-episodes 10 --n-eval-envs 10 --vec-env subproc -i /usr/data2/canaltrain_log/ppo/SumoEnv-v0_16/rl_model_12999532_steps.zip
```
Testing
Change the `model_path` and `stats_path` in `canal_test.py` and run:

```bash
python canal_test.py
```
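For reference, a hedged sketch of what such a test script typically looks like with stable-baselines3 (the paths are placeholders, and the use of `VecNormalize` statistics is an assumption based on the `stats_path` name):

```python
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

model_path = "path/to/rl_model.zip"      # placeholder
stats_path = "path/to/vecnormalize.pkl"  # placeholder

# Wrap a single evaluation env and restore normalization statistics.
env = DummyVecEnv([lambda: gym.make('SumoEnv-v0', gui_f=True, label='evaluate')])
env = VecNormalize.load(stats_path, env)
env.training = False      # do not update running statistics at test time
env.norm_reward = False   # report raw rewards

model = PPO.load(model_path, env=env)
obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
```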
MDP - Observation, Action and Reward
Observation
The default observation for the agent is a vector:
```python
obs = [SOC_state, charger_state, elig_act_state, dir_state, charge_station_state]
```
- `SOC_state` indicates the amount of SOC on the road network pending to be refilled by mobile chargers
- `charger_state` indicates the current road segment, staying time, charging_others bit, charge_self bit, SOC, distance to target vehicle, and neighbor_vehicle bit of each mobile charger
- `elig_act_state` indicates the eligible actions that each mobile charger can take at its current road segment
- `dir_state` indicates the best action of each mobile charger given its current road segment
- `charge_station_state` indicates the remaining SOCs that the mobile chargers will have if they go to the charging stations for a recharge
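A toy illustration of how such a flat observation vector could be assembled (all sizes are made up; the real dimensions depend on the road network and the charger fleet):

```python
import numpy as np

# Toy sizes, for illustration only.
n_segments, n_chargers, n_actions = 100, 3, 6

SOC_state = np.zeros(n_segments)                   # SOC pending per road segment
charger_state = np.zeros(n_chargers * 7)           # 7 status features per charger (see list above)
elig_act_state = np.zeros(n_chargers * n_actions)  # eligible-action mask per charger
dir_state = np.zeros(n_chargers)                   # best action per charger
charge_station_state = np.zeros(n_chargers)        # remaining SOC after a station recharge

obs = np.concatenate([SOC_state, charger_state, elig_act_state,
                      dir_state, charge_station_state])
print(obs.shape)  # (145,) with the toy sizes above
```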
Action
The action space is discrete. Each edge in the SUMO network is partitioned into several road segments, and the actions available to the agent depend on the segment it currently occupies. Throughout the road network, a mobile charger can take at most 6 actions: stay (0), charge vehicles (1), or go to one of up to four downstream road segments (2-5).
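A hypothetical encoding of this discrete action set (what indices 2-5 resolve to depends on the downstream connections of the current segment):

```python
# Hypothetical action table; the environment defines the real mapping.
ACTIONS = {
    0: "stay",
    1: "charge vehicle",
    2: "go to downstream segment 1",
    3: "go to downstream segment 2",
    4: "go to downstream segment 3",
    5: "go to downstream segment 4",
}
```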
Reward
The default reward function is defined as:
- `+2` if a mobile charger charges an EV with `step_charged_SOC`
- `+3 * charger.step_charged_SOC + 0.5 * (1 - before_SOC)` if a mobile charger charges itself with `step_charged_SOC`
- `+8e-2` if a mobile charger takes the best action
- `-8e-2` if a mobile charger takes an action different from the best one
- `-8e-1` if a mobile charger takes an ineligible action given its current road segment
- `-300` if a mobile charger exhausts its SOC
- `+250` if the agent succeeds in charging all the EVs and supporting the completion of their trips
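Putting the terms together, a sketch of the per-step reward under assumed flag names (the real computation lives inside `SumoEnv`):

```python
def step_reward(step_charged_SOC, before_SOC, charged_ev, charged_self,
                took_best_action, action_eligible, soc_exhausted,
                all_evs_completed):
    """Sketch of the default reward terms listed above.

    All argument names are hypothetical; the environment computes these
    quantities internally.
    """
    r = 0.0
    if charged_ev:                       # charged an EV this step
        r += 2
    if charged_self:                     # recharged at a charging station
        r += 3 * step_charged_SOC + 0.5 * (1 - before_SOC)
    r += 8e-2 if took_best_action else -8e-2
    if not action_eligible:              # action invalid at this segment
        r -= 8e-1
    if soc_exhausted:                    # charger ran out of SOC
        r -= 300
    if all_evs_completed:                # episode-level success bonus
        r += 250
    return r
```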
Citing
If you use this repository, please cite:
```bibtex
@article{yan2022mobicharger,
  title={MobiCharger: Optimal Scheduling for Cooperative EV-to-EV Dynamic Wireless Charging},
  author={Yan, Li and Shen, Haiying and Kang, Liuwang and Zhao, Juanjuan and Zhang, Zhe and Xu, Chengzhong},
  journal={IEEE Transactions on Mobile Computing},
  volume={22},
  number={12},
  pages={6889--6906},
  year={2023},
}
```
List of publications that cite this work: Google Scholar
Owner
- Login: liyan2015
- Kind: user
- Repositories: 2
- Profile: https://github.com/liyan2015
GitHub Events
Total
- Watch event: 3
- Push event: 3
Last Year
- Watch event: 3
- Push event: 3