reward_shaping
Reward Shaping Experiments with Temporal Logic for Hierarchical Objectives
Statistics
- Stars: 2
- Watchers: 1
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Hierarchical Potential-based Reward Shaping
Experiments on automatic reward shaping from formal task specifications.
Preprint available on arXiv: https://arxiv.org/abs/2110.02792
If you find this code useful, please cite it in your paper:
@article{berducci2021hierarchical,
title={Hierarchical Potential-based Reward Shaping from Task Specifications},
author={Berducci, Luigi and Aguilar, Edgar A and Ni{\v{c}}kovi{\'c}, Dejan and Grosu, Radu},
journal={arXiv e-prints},
pages={arXiv--2110},
year={2021}
}
Installation
We tested this implementation with Python 3.8 under Ubuntu 20.04.
To install the dependencies:
pip install -r requirements.txt
We assume you run the code from the project directory and that it is included in the PYTHONPATH.
Docker image

As an alternative to the previous step,
we provide a Docker image with a working environment to reproduce this work.
You can either:
- pull the image from Docker Hub: docker pull luigiberducci/reward_shaping:latest
- or, build the image from scratch: docker build -t reward_shaping .
To run the container:
docker run --rm -it -u $(id -u):$(id -g) \
-v $(pwd):/src --gpus all \
<image-name> /bin/bash
Then, you can use any of the following scripts from within the container.
Run training
To train on the safe driving task in the racecar environment (for all the command line options, see --help):
python run_training.py --env racecar --task delta_drive \
--reward hprs --steps 1000000 --expdir my_exp
This command will start the training for 1M steps
using the reward hprs (Hierarchical Potential-based Reward Shaping).
The results will be stored in the directory logs/racecar/my_exp.
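For readers unfamiliar with the term, the following is a minimal sketch of generic potential-based reward shaping, the idea the hprs name refers to. It is not the implementation in this repository: the function names, the one-dimensional state, and the distance-based potential are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

State = Sequence[float]


def shaped_reward(
    base_reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Generic potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return base_reward + gamma * potential(next_state) - potential(state)


def distance_potential(state: State) -> float:
    # Hypothetical potential: state[0] is a position and the goal sits at 1.0,
    # so the potential grows (towards 0) as the agent approaches the goal.
    return -abs(1.0 - state[0])


if __name__ == "__main__":
    # Moving from position 0.0 to 0.5 with gamma = 1 yields a shaping bonus of 0.5.
    print(shaped_reward(0.0, [0.0], [0.5], distance_potential, gamma=1.0))
```

With gamma = 1 the shaping terms telescope along a trajectory, so their sum depends only on the potentials of the first and last states.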
Play with trained agents
The directory checkpoints contains a collection of trained agents for various environments.
For each environment, we provide one agent trained with our hprs reward and one agent trained with the default shaped reward.
The performance of the two agents is comparable,
even though hprs is an automatic shaping method,
while default is, in most environments, the result of hand-engineered shaping.
We provide the script eval_trained_models.py for playing with these agents.
To run:
python eval_trained_models.py --checkpoint checkpoints/bipedal_walker_hardcore_hrs.zip --n_episodes 10
This command will evaluate the given model for 10 episodes,
and report mean and std dev of the Policy Assessment Metric described in the paper.
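As a rough illustration of that summary step, the sketch below aggregates a list of per-episode scores into a mean and a standard deviation. The scores are made-up numbers, the function name is hypothetical, and the use of the population standard deviation is an assumption; the script may compute these differently.

```python
from statistics import fmean, pstdev
from typing import Sequence, Tuple


def summarize_episodes(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population std dev) of per-episode scores."""
    # Assumption: population std dev (divide by N), as NumPy's np.std does by default.
    return fmean(scores), pstdev(scores)


if __name__ == "__main__":
    # Hypothetical scores from 10 evaluation episodes.
    episode_scores = [1.2, 1.4, 1.1, 1.5, 1.3, 1.2, 1.6, 1.4, 1.3, 1.0]
    mean, std = summarize_episodes(episode_scores)
    print(f"mean={mean:.3f} std={std:.3f}")
```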
Reproduce learning curve plots
Assuming you have reproduced the experiments and stored the logs in logs/<env> for each <env> of interest,
you can reproduce the learning-curve figure with the script plot_learning_curves.py.
To run:
python plot_learning_curves.py --logdir logs --gby env \
--regex **/*default* **/*tltl* \
**/*bhnr* **/*morl_uni* \
**/*morl_dec* **/*hprs_sac* \
--binning 100000 --hlines 1.5 --clipminy 0 -legend
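To give an intuition for the --binning 100000 option, here is a small sketch of one common way to bin a learning curve: group logged points into fixed-width windows of training steps and average within each window. This is an assumption about what binning means here, not a description of plot_learning_curves.py, and the function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import List, Sequence, Tuple


def bin_curve(
    steps: Sequence[int], values: Sequence[float], bin_width: int
) -> List[Tuple[float, float]]:
    """Average values inside fixed-width step windows.

    Returns a list of (bin center, mean value) pairs sorted by step.
    """
    buckets = defaultdict(list)
    for step, value in zip(steps, values):
        # Integer division assigns each logged point to its window.
        buckets[step // bin_width].append(value)
    return [((b + 0.5) * bin_width, fmean(vs)) for b, vs in sorted(buckets.items())]


if __name__ == "__main__":
    # Hypothetical log: four evaluation points over 200k steps.
    logged_steps = [0, 50_000, 100_000, 150_000]
    logged_values = [1.0, 3.0, 5.0, 7.0]
    print(bin_curve(logged_steps, logged_values, bin_width=100_000))
```

Larger windows smooth the curve at the cost of temporal resolution, which is usually the trade-off to tune when comparing several reward variants on one plot.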
Request logs
If you do not have the compute resources to reproduce the experiments, you can use the logs of our experiments, stored on Zenodo.
For any issue or request, feel free to contact us.
Owner
- Name: EA Aguilar
- Login: EdAlexAguilar
- Kind: user
- Location: Vienna
- Company: AIT Austrian Institute of Technology
- Twitter: EdAlexAguilar
- Repositories: 3
- Profile: https://github.com/EdAlexAguilar
Theoretical Physicist
Citation (CITATION.cff)
cff-version: 1.2.0
title: Hierarchical Potential-based Reward Shaping from Task Specifications
message: If you use this software, please cite it using the metadata from this file.
authors:
- given-names: Luigi
family-names: Berducci
email: luigi.berducci@tuwien.ac.at
affiliation: TU Wien
orcid: 'https://orcid.org/0000-0002-3497-6007'
- given-names: Edgar
family-names: Aguilar
email: edgar.aguilar@ait.ac.at
affiliation: AIT Austrian Institute for Technology Gmbh
orcid: 'https://orcid.org/0000-0002-1177-9246'
- given-names: Dejan
family-names: Nickovic
email: dejan.nickovic@ait.ac.at
affiliation: AIT Austrian Institute for Technology Gmbh
orcid: 'https://orcid.org/0000-0001-5468-0396'
- given-names: Radu
family-names: Grosu
email: radu.grosu@tuwien.ac.at
orcid: 'https://orcid.org/0000-0001-5715-2142'
affiliation: TU Wien
preferred-citation:
type: article
authors:
- given-names: Luigi
family-names: Berducci
email: luigi.berducci@tuwien.ac.at
affiliation: TU Wien
orcid: 'https://orcid.org/0000-0002-3497-6007'
- given-names: Edgar
family-names: Aguilar
email: edgar.aguilar@ait.ac.at
affiliation: AIT Austrian Institute for Technology Gmbh
orcid: 'https://orcid.org/0000-0002-1177-9246'
- given-names: Dejan
family-names: Nickovic
email: dejan.nickovic@ait.ac.at
affiliation: AIT Austrian Institute for Technology Gmbh
orcid: 'https://orcid.org/0000-0001-5468-0396'
- given-names: Radu
family-names: Grosu
email: radu.grosu@tuwien.ac.at
orcid: 'https://orcid.org/0000-0001-5715-2142'
affiliation: TU Wien
url: "https://arxiv.org/abs/2110.02792"
journal: 'arXiv'
year: 2022
title: "Hierarchical Potential-based Reward Shaping from Task Specifications"
identifiers:
- type: url
value: "https://arxiv.org/abs/2110.02792"
description: arXiv
repository-code: 'https://github.com/EdAlexAguilar/reward_shaping'
GitHub Events
Total
- Delete event: 1
- Push event: 4
- Create event: 1
Last Year
- Delete event: 1
- Push event: 4
- Create event: 1
Dependencies
- PyYAML *
- gym <0.22.0
- h5py *
- matplotlib *
- moviepy *
- numpy *
- pyglet *
- stable-baselines3 *
- tensorboard *
- yamldataclassconfig *
- pytorch/pytorch 1.11.0-cuda11.3-cudnn8-runtime build
- cloudpickle ==2.1.0
