https://github.com/data-science-in-mechanical-engineering/entropy_robustness

Code for the paper "Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization" (ECML 2025)

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.1%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Data-Science-in-Mechanical-Engineering
  • Language: Python
  • Default Branch: main
  • Size: 10.5 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme

README.md

Instructions for Reproducing Results

First, install the required packages.

```bash
pip install -r requirements.txt
```
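Because every dependency is pinned to an exact version, it can help to confirm that the active environment actually matches the pins before starting long runs. The following is a minimal sketch, not part of the repository: the helper names `parse_pins` and `find_mismatches` are hypothetical, and it assumes `requirements.txt` contains plain `name==version` lines.

```python
import re
from importlib import metadata


def parse_pins(text):
    """Return {package: version} for every `name==version` line in the text."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.fullmatch(r"([A-Za-z0-9._-]+)\s*==\s*(\S+)", line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed)} where the installed version differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Small demonstration on two pins taken from the dependency list.
sample = "numpy==1.26.4\ngymnasium==0.29.1\n"
print(parse_pins(sample))
```

To check a real environment, pass the contents of `requirements.txt` to `parse_pins` and the result to `find_mismatches`. Builds with a local version suffix (for example a CUDA-specific `torch` build) will be reported as mismatches by this simple string comparison.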

Gridworld Experiments

We train and evaluate the gridworld agent in both (fenced) cliff environments by running the following experiments. The trailing `# n` comments number the runs; later sections refer to a run's result by this number.

```bash
python -m src.gridworld.evaluation constrained_gridworld_value configured default  # 1
python -m src.gridworld.evaluation constrained_gridworld_entropy configured default  # 2
python -m src.gridworld.evaluation unconstrained_gridworld_value configured default  # 3
python -m src.gridworld.evaluation delta_as_function_of_failure_penalty configured default  # 4
python -m src.gridworld.evaluation safety_as_function_of_failure_alpha configured default  # 5
```

Pendulum Experiments

To train a constraint-penalized SAC Pendulum model, run the following command.

```bash
python -m src.pendulum.training --alpha={alpha} --seed={seed}
```

where {alpha} is the temperature parameter and {seed} is the random seed. Extract the best-performing model and store it at checkpoints_pendulum/PenalizedPendulumEnvironment__{seed}__{alpha}__{timestamp}/best-model.pth.
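Before evaluating, it can be useful to check which (seed, alpha) pairs already have a stored `best-model.pth`. The sketch below is illustrative only: the helpers `find_best_models` and `missing_checkpoints` are hypothetical, and it assumes that `alpha` appears in the directory name exactly as passed on the command line.

```python
from pathlib import Path


def find_best_models(root, environment, seed, alpha):
    """Return all best-model.pth files for one (seed, alpha) pair, sorted by path.

    Assumes the layout
    {root}/{environment}__{seed}__{alpha}__{timestamp}/best-model.pth
    """
    pattern = f"{environment}__{seed}__{alpha}__*/best-model.pth"
    return sorted(Path(root).glob(pattern))


def missing_checkpoints(root, environment, seeds, alphas):
    """List the (seed, alpha) pairs that have no best-model.pth yet."""
    return [
        (seed, alpha)
        for alpha in alphas
        for seed in seeds
        if not find_best_models(root, environment, seed, alpha)
    ]
```

For example, `missing_checkpoints("checkpoints_pendulum", "PenalizedPendulumEnvironment", range(1, 26), ["0.1", "0.2"])` would list the pairs that still need a stored model for those two temperatures.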

In our experiments, we used alpha={0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and seeds ranging from 1 to 25. We evaluate the trained models by running the following experiments.

```bash
python -m src.pendulum.evaluation evaluate_mode_policies configured default  # 6
python -m src.pendulum.evaluation evaluate_disturbed_mode_policies configured default  # 7
```

The results are automatically stored in results/pendulum.
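The training sweep described above (ten temperatures, seeds 1 to 25) amounts to 250 training runs per environment. A minimal sketch of how the commands could be generated is shown below; the function `training_commands` is hypothetical and only builds argument lists, it does not start any training.

```python
import sys

ALPHAS = [i / 10 for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0
SEEDS = range(1, 26)  # seeds 1 to 25 inclusive


def training_commands(module="src.pendulum.training"):
    """Build one training command per (alpha, seed) pair."""
    return [
        [sys.executable, "-m", module, f"--alpha={alpha}", f"--seed={seed}"]
        for alpha in ALPHAS
        for seed in SEEDS
    ]


commands = training_commands()
print(len(commands))  # 10 alphas x 25 seeds
print(" ".join(commands[0]))
```

Each entry can be passed to `subprocess.run(command, check=True)` from the repository root, or written to a job file for a cluster scheduler. Passing `module="src.hopper.training"` produces the corresponding Hopper sweep.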

Hopper Experiments

To train a constraint-penalized SAC Hopper model, run the following command.

```bash
python -m src.hopper.training --alpha={alpha} --seed={seed}
```

where {alpha} is the temperature parameter and {seed} is the random seed. Extract the best-performing model and store it at checkpoints_pendulum/PenalizedHopperEnvironment__{seed}__{alpha}__{timestamp}/best-model.pth.

In our experiments, we used alpha={0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and seeds ranging from 1 to 25. We evaluate the trained models by running the following experiments.

```bash
python -m src.hopper.evaluation evaluate_mode_policies configured default  # 8
python -m src.hopper.evaluation evaluate_disturbed_mode_policies configured default  # 9
```

The results are automatically stored in results/hopper.

Figures

Figure 1

Path. results/gridworld/constrained_gridworld_value/{timestamp}/plots/plot_values_constrained.pdf

Figure 2

Path. results/gridworld/unconstrained_gridworld_entropy/{timestamp}/plots/plot_value_grid_unconstrained.pdf

Figure 3

```bash
python -m src.pendulum.analyze_disturbed_mode_success_rates configured small  # 10
python -m src.hopper.analyze_disturbed_mode_success_rates configured small  # 11
python -m src.figures.figure_3 --run_pendulum_success_rate={timestamp result 10} --run_hopper_success_rate={timestamp result 11}
```

Path. results/figure_3/{timestamp}/plots/plot_figure_3.pdf

Figure 4

Path. results/gridworld/constrained_gridworld_entropy/{timestamp}/plots/plot_entropy_constrained/alpha_4_small.pdf

Figure 5

```bash
python -m src.figures.figure_5 --run_name_delta={timestamp result 4} --run_name_safety={timestamp result 5}
```

Path. results/figure_5/{timestamp}/plots/plot_figure_5.pdf
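The figure commands take the `{timestamp}` of earlier runs as arguments. The sketch below shows one way to look up the most recent run and assemble the Figure 5 command. It is an assumption that experiments 4 and 5 write to `results/gridworld/{experiment name}/{timestamp}/`, following the pattern of the gridworld plot paths; the helpers `latest_run` and `figure_5_command` are hypothetical.

```python
from pathlib import Path


def latest_run(experiment_dir):
    """Return the name of the most recently modified run directory.

    Assumes each run writes to {experiment_dir}/{timestamp}/.
    """
    runs = [p for p in Path(experiment_dir).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no runs found in {experiment_dir}")
    return max(runs, key=lambda p: p.stat().st_mtime).name


def figure_5_command(results_root="results/gridworld"):
    """Assemble the Figure 5 command from the latest runs of experiments 4 and 5."""
    root = Path(results_root)
    delta = latest_run(root / "delta_as_function_of_failure_penalty")
    safety = latest_run(root / "safety_as_function_of_failure_alpha")
    return [
        "python", "-m", "src.figures.figure_5",
        f"--run_name_delta={delta}",
        f"--run_name_safety={safety}",
    ]
```

Selecting by modification time avoids depending on the exact timestamp format. If several runs of the same experiment exist, passing the intended timestamp explicitly is the safer choice.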

Figure 6

```bash
python -m src.pendulum.evaluation analyze_mode_environment_returns --run_name={timestamp result 6} --height=2 --width=2.75  # 12
python -m src.hopper.evaluation analyze_mode_environment_returns --run_name={timestamp result 8} --height=2 --width=2.75  # 13
python -m src.figures.figure_6 --run_pendulum_environment_return={timestamp result 12} --run_hopper_environment_return={timestamp result 13}
```

Path. results/figure_6/{timestamp}/plots/plot_figure_6.pdf

Figure 7

```bash
python -m src.pendulum.evaluation analyze_mode_environment_returns configured full  # 14
```

Path. results/pendulum/analyze_mode_environment_returns/{timestamp}/plots/heatmap_disturbed_success_rate.pdf

Figure 8

```bash
python -m src.hopper.evaluation analyze_mode_environment_returns configured full  # 15
```

Path. results/hopper/analyze_mode_environment_returns/{timestamp}/plots/heatmap_disturbed_success_rate.pdf

Owner

  • Name: Data Science in Mechanical Engineering (DSME)
  • Login: Data-Science-in-Mechanical-Engineering
  • Kind: organization
  • Location: Aachen, Germany

Public code repository of the Institute for Data Science in Mechanical Engineering at RWTH Aachen University

GitHub Events

Total
  • Member event: 1
Last Year
  • Member event: 1

Dependencies

requirements.txt pypi
  • Farama-Notifications ==0.0.4
  • GitPython ==3.1.43
  • Jinja2 ==3.1.4
  • Markdown ==3.6
  • MarkupSafe ==2.1.5
  • PyOpenGL ==3.1.7
  • PyYAML ==6.0.1
  • Pygments ==2.18.0
  • Werkzeug ==3.0.3
  • absl-py ==2.1.0
  • certifi ==2024.7.4
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • cloudpickle ==3.0.0
  • contourpy ==1.2.1
  • cycler ==0.12.1
  • decorator ==4.4.2
  • docker-pycreds ==0.4.0
  • docstring_parser ==0.16
  • etils ==1.9.2
  • filelock ==3.15.4
  • fonttools ==4.53.1
  • fsspec ==2024.6.1
  • gitdb ==4.0.11
  • glfw ==2.7.0
  • grpcio ==1.65.1
  • gym-notices ==0.0.8
  • gymnasium ==0.29.1
  • idna ==3.7
  • imageio ==2.34.2
  • imageio-ffmpeg ==0.5.1
  • importlib_resources ==6.4.0
  • kiwisolver ==1.4.5
  • markdown-it-py ==3.0.0
  • matplotlib ==3.9.1
  • mdurl ==0.1.2
  • moviepy ==1.0.3
  • mpmath ==1.3.0
  • mujoco ==3.2.0
  • networkx ==3.3
  • numpy ==1.26.4
  • packaging ==24.1
  • pandas ==2.2.2
  • pillow ==10.4.0
  • platformdirs ==4.2.2
  • proglog ==0.1.10
  • protobuf ==4.25.3
  • psutil ==6.0.0
  • pygame ==2.6.0
  • pyparsing ==3.1.2
  • python-dateutil ==2.9.0.post0
  • pytz ==2024.1
  • requests ==2.32.3
  • rich ==13.7.1
  • scipy ==1.14.0
  • seaborn ==0.13.2
  • sentry-sdk ==2.10.0
  • setproctitle ==1.3.3
  • shtab ==1.7.1
  • six ==1.16.0
  • smmap ==5.0.1
  • stable_baselines3 ==2.3.2
  • sympy ==1.13.1
  • tensorboard ==2.17.0
  • tensorboard-data-server ==0.7.2
  • torch ==2.3.1
  • torchaudio ==2.3.1
  • torchvision ==0.18.1
  • tqdm ==4.66.4
  • typing_extensions ==4.12.2
  • tyro ==0.8.5
  • tzdata ==2024.1
  • urllib3 ==2.2.2
  • wandb ==0.17.5
  • zipp ==3.19.2