reward_shaping

Reward Shaping Experiments with Temporal Logic for Hierarchical Objectives

https://github.com/edalexaguilar/reward_shaping

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.1%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: EdAlexAguilar
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 21.5 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created over 5 years ago · Last pushed over 1 year ago

README.md

Hierarchical Potential-based Reward Shaping

Experiments on automatic reward shaping from formal task specifications.

Watch the video

Preprint available on arXiv: https://arxiv.org/abs/2110.02792

If you find this code useful, please cite our paper:

@article{berducci2021hierarchical,
  title={Hierarchical Potential-based Reward Shaping from Task Specifications},
  author={Berducci, Luigi and Aguilar, Edgar A and Ni{\v{c}}kovi{\'c}, Dejan and Grosu, Radu},
  journal={arXiv e-prints},
  pages={arXiv--2110},
  year={2021}
}

Installation

We tested this implementation with Python 3.8 on Ubuntu 20.04. To install the dependencies:

pip install -r requirements.txt

We assume you run the code from the project directory and that it is included in the PYTHONPATH.
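If the project directory is not already on the PYTHONPATH, one option is to add it at the top of your own script. The snippet below is a minimal sketch of that idea; the helper name `ensure_on_path` is hypothetical and is not part of this repository.

```python
import os
import sys


def ensure_on_path(project_dir: str) -> str:
    """Prepend project_dir to sys.path if it is missing; return its absolute path.

    Hypothetical helper for illustration, not part of the repository.
    """
    abs_dir = os.path.abspath(project_dir)
    if abs_dir not in sys.path:
        sys.path.insert(0, abs_dir)
    return abs_dir


# Assumes the current working directory is the project directory.
project_root = ensure_on_path(os.getcwd())
```

Setting the PYTHONPATH environment variable in the shell before launching the scripts achieves the same effect without touching any code.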

Docker image


As an alternative to the previous step, we provide a Docker image with a working environment to reproduce this work. You can either:
- pull the image from Docker Hub: docker pull luigiberducci/reward_shaping:latest
- or build the image from scratch: docker build -t reward_shaping .

To run the container:

docker run --rm -it -u $(id -u):$(id -g) \
    -v $(pwd):/src --gpus all \
    <image-name> /bin/bash

Then, you can use any of the following scripts from within the container.

Run training

To train on the safe driving task in the racecar environment (for all the command line options, see --help):

python run_training.py --env racecar --task delta_drive \
    --reward hprs --steps 1000000 --expdir my_exp

This command will start the training for 1M steps using the reward hprs (Hierarchical Potential-based Reward Shaping). The results will be stored in the directory logs/racecar/my_exp.
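For readers unfamiliar with the term, the following is a minimal sketch of the classical potential-based shaping form, in which the shaped reward is the environment reward plus the discounted change in a potential function. It is not the hierarchical potential defined in the paper or the code in this repository; the function names, the toy potential, and the zero-potential convention at terminal states are all assumptions made for illustration.

```python
from typing import Callable


def shaped_reward(
    reward: float,
    state: float,
    next_state: float,
    potential: Callable[[float], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return reward + gamma * phi(next_state) - phi(state).

    Assumption: the potential of a terminal state is treated as zero.
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


def toy_potential(state: float) -> float:
    """Hypothetical potential: negative distance to a goal located at x = 10."""
    return -abs(10.0 - state)


# Moving from x = 2 to x = 3 brings the agent one unit closer to the goal,
# so with gamma = 1 the shaping term adds exactly that progress.
r = shaped_reward(0.0, state=2.0, next_state=3.0, potential=toy_potential, gamma=1.0)
```

In this toy setting, steps toward the goal receive a positive shaping bonus and steps away from it receive a penalty, while the underlying environment reward is left unchanged.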

Play with trained agents

The directory checkpoints contains trained agents for various environments. For each environment, we provide one agent trained with our hprs reward and one trained with the default shaped reward. The two agents perform comparably, even though hprs is an automatic shaping method, whereas default is, in most environments, the result of hand-engineered shaping.

We provide the script eval_trained_models.py for playing with them. To run:

python eval_trained_models.py --checkpoint checkpoints/bipedal_walker_hardcore_hrs.zip --n_episodes 10

This command will evaluate the given model for 10 episodes and report the mean and standard deviation of the Policy Assessment Metric described in the paper.
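As a rough illustration of the reported summary, the sketch below aggregates per-episode scores into a mean and a standard deviation. The scores are made-up placeholder values, the function name `summarize` is hypothetical, and the use of the population standard deviation is an assumption; the evaluation script may compute it differently.

```python
import statistics
from typing import Sequence, Tuple


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of per-episode scores."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)  # assumption: population, not sample, std
    return mean, std


# Hypothetical per-episode metric values, for illustration only.
episode_scores = [1.2, 1.4, 1.0, 1.6]
mean, std = summarize(episode_scores)
print(f"mean={mean:.3f} std={std:.3f}")
```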

Reproduce the learning curves

Assuming you have reproduced the experiments and stored the logs in logs/<env> for each <env> of interest, you can reproduce the learning-curve figure with the script plot_learning_curves.py. To run:

python plot_learning_curves.py --logdir logs --gby env \
    --regex **/*default* **/*tltl* \
    **/*bhnr* **/*morl_uni* \
    **/*morl_dec* **/*hprs_sac* \
    --binning 100000 --hlines 1.5 --clipminy 0 -legend
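The exact behavior of the --binning option is not documented here. One plausible reading is that logged values are averaged within fixed-width windows of training steps before plotting. The sketch below illustrates that interpretation only; the function name `bin_curve` and the sample data are hypothetical, and the script itself may bin differently.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def bin_curve(
    steps: Sequence[int], values: Sequence[float], bin_size: int
) -> List[Tuple[int, float]]:
    """Average values whose step falls in the same window of width bin_size.

    Returns (right edge of the window, mean value) pairs in step order.
    """
    buckets = defaultdict(list)
    for step, value in zip(steps, values):
        buckets[step // bin_size].append(value)
    return [
        ((index + 1) * bin_size, sum(bucket) / len(bucket))
        for index, bucket in sorted(buckets.items())
    ]


# Hypothetical logged data: four evaluations over 200k training steps.
curve = bin_curve(
    steps=[10_000, 60_000, 120_000, 180_000],
    values=[0.0, 1.0, 2.0, 4.0],
    bin_size=100_000,
)
```

Larger windows give smoother curves at the cost of temporal resolution, which is the usual trade-off when choosing a bin size.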

Request logs

If you do not have the compute resources to reproduce the experiments, you can use the logs of our experiments, stored on Zenodo.

For any issue or request, feel free to contact us.


Owner

  • Name: EA Aguilar
  • Login: EdAlexAguilar
  • Kind: user
  • Location: Vienna
  • Company: AIT Austrian Institute of Technology

Theoretical Physicist

Citation (CITATION.cff)

cff-version: 1.2.0
title: Hierarchical Potential-based Reward Shaping from Task Specifications
message: If you use this software, please cite it using the metadata from this file.
authors:
  - given-names: Luigi
    family-names: Berducci
    email: luigi.berducci@tuwien.ac.at
    affiliation: TU Wien
    orcid: 'https://orcid.org/0000-0002-3497-6007'
  - given-names: Edgar
    family-names: Aguilar
    email: edgar.aguilar@ait.ac.at
    affiliation: AIT Austrian Institute for Technology Gmbh
    orcid: 'https://orcid.org/0000-0002-1177-9246'
  - given-names: Dejan
    family-names: Nickovic
    email: dejan.nickovic@ait.ac.at
    affiliation: AIT Austrian Institute for Technology Gmbh
    orcid: 'https://orcid.org/0000-0001-5468-0396'
  - given-names: Radu
    family-names: Grosu
    email: radu.grosu@tuwien.ac.at
    orcid: 'https://orcid.org/0000-0001-5715-2142'
    affiliation: TU Wien

preferred-citation:
  type: article
  authors:
      - given-names: Luigi
        family-names: Berducci
        email: luigi.berducci@tuwien.ac.at
        affiliation: TU Wien
        orcid: 'https://orcid.org/0000-0002-3497-6007'
      - given-names: Edgar
        family-names: Aguilar
        email: edgar.aguilar@ait.ac.at
        affiliation: AIT Austrian Institute for Technology Gmbh
        orcid: 'https://orcid.org/0000-0002-1177-9246'
      - given-names: Dejan
        family-names: Nickovic
        email: dejan.nickovic@ait.ac.at
        affiliation: AIT Austrian Institute for Technology Gmbh
        orcid: 'https://orcid.org/0000-0001-5468-0396'
      - given-names: Radu
        family-names: Grosu
        email: radu.grosu@tuwien.ac.at
        orcid: 'https://orcid.org/0000-0001-5715-2142'
        affiliation: TU Wien
  url: "https://arxiv.org/abs/2110.02792"
  journal: 'arXiv'
  year: 2022
  title: "Hierarchical Potential-based Reward Shaping from Task Specifications"
identifiers:
  - type: url
    value: "https://arxiv.org/abs/2110.02792"
    description: arXiv
repository-code: 'https://github.com/EdAlexAguilar/reward_shaping'

GitHub Events

Total
  • Delete event: 1
  • Push event: 4
  • Create event: 1
Last Year
  • Delete event: 1
  • Push event: 4
  • Create event: 1

Dependencies

requirements.txt pypi
  • PyYAML *
  • gym <0.22.0
  • h5py *
  • matplotlib *
  • moviepy *
  • numpy *
  • pyglet *
  • stable-baselines3 *
  • tensorboard *
  • yamldataclassconfig *
Dockerfile docker
  • pytorch/pytorch 1.11.0-cuda11.3-cudnn8-runtime build
reward_shaping/envs/racecar/requirements.txt pypi
  • cloudpickle ==2.1.0
reward_shaping/envs/racecar2/requirements.txt pypi
  • cloudpickle ==2.1.0