https://github.com/data-science-in-mechanical-engineering/entropy_robustness

Code for the paper "Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization" (ECML 2025)

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
✓ .zenodo.json file (found .zenodo.json file)
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (6.1%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: Data-Science-in-Mechanical-Engineering
Language: Python
Default Branch: main
Size: 10.5 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme

Instructions for Reproducing Results

First, install the required packages.

    pip install -r requirements.txt

Gridworld Experiments

We train and evaluate the gridworld agent in both (fenced) cliff environments by running the following experiments. The trailing numbers identify each run so that the Figures section can refer to its results.

    python -m src.gridworld.evaluation constrained_gridworld_value configured default # 1
    python -m src.gridworld.evaluation constrained_gridworld_entropy configured default # 2
    python -m src.gridworld.evaluation unconstrained_gridworld_value configured default # 3
    python -m src.gridworld.evaluation delta_as_function_of_failure_penalty configured default # 4
    python -m src.gridworld.evaluation safety_as_function_of_failure_alpha configured default # 5

Pendulum Experiments

To train a constraint-penalized SAC Pendulum model, run the following command.

    python -m src.pendulum.training --alpha={alpha} --seed={seed}

where {alpha} is the temperature parameter and {seed} is the random seed. Extract the best-performing model and store it at checkpoints_pendulum/PenalizedPendulumEnvironment__{seed}__{alpha}__{timestamp}/best-model.pth.

In our experiments, we used alpha={0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and seeds ranging from 1 to 25. We evaluate the trained models by running the following experiments.

    python -m src.pendulum.evaluation evaluate_mode_policies configured default # 6
    python -m src.pendulum.evaluation evaluate_disturbed_mode_policies configured default # 7

The results are automatically stored in results/pendulum.
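The alpha and seed sweep described above amounts to 250 training runs (10 temperatures times 25 seeds). The following is a minimal sketch of how those command lines could be generated; the helper name `training_commands` is hypothetical and not part of the repository, and the script only prints the commands rather than launching them.

```python
import itertools
import shlex
import sys

# Sweep values as stated in the README.
ALPHAS = [round(0.1 * k, 1) for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0
SEEDS = range(1, 26)  # seeds 1 to 25 inclusive


def training_commands(module="src.pendulum.training"):
    """Return one argv list per (alpha, seed) pair (hypothetical helper)."""
    return [
        [sys.executable, "-m", module, f"--alpha={alpha}", f"--seed={seed}"]
        for alpha, seed in itertools.product(ALPHAS, SEEDS)
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of starting 250 training jobs.
    for argv in training_commands():
        print(shlex.join(argv))
```

Passing `module="src.hopper.training"` yields the corresponding Hopper commands. Each argv list could be handed to `subprocess.run(argv, check=True)` or to a cluster scheduler, depending on the available compute.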

Hopper Experiments

To train a constraint-penalized SAC Hopper model, run the following command.

    python -m src.hopper.training --alpha={alpha} --seed={seed}

where {alpha} is the temperature parameter and {seed} is the random seed. Extract the best-performing model and store it at checkpoints_pendulum/PenalizedHopperEnvironment__{seed}__{alpha}__{timestamp}/best-model.pth.

In our experiments, we used alpha={0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and seeds ranging from 1 to 25. We evaluate the trained models by running the following experiments.

    python -m src.hopper.evaluation evaluate_mode_policies configured default # 8
    python -m src.hopper.evaluation evaluate_disturbed_mode_policies configured default # 9

The results are automatically stored in results/hopper.
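Before running the evaluation, it can help to check that a best-model.pth exists for every (seed, alpha) pair. The sketch below indexes checkpoints by parsing the directory naming scheme given above; `find_best_models` is a hypothetical helper, and it assumes that the seed field parses as an integer and the alpha field as a float, which the README does not state explicitly.

```python
from pathlib import Path


def find_best_models(root="checkpoints_pendulum",
                     env_name="PenalizedHopperEnvironment"):
    """Map (seed, alpha) to the path of best-model.pth (hypothetical helper).

    Assumes directories are named {env_name}__{seed}__{alpha}__{timestamp}.
    If several runs share a (seed, alpha) pair, the directory that sorts
    last by name wins.
    """
    found = {}
    pattern = f"{env_name}__*__*__*/best-model.pth"
    for model in sorted(Path(root).glob(pattern)):
        # Split into at most four fields so the timestamp stays intact.
        _, seed, alpha, _timestamp = model.parent.name.split("__", 3)
        found[(int(seed), float(alpha))] = model
    return found


if __name__ == "__main__":
    models = find_best_models()
    print(f"found {len(models)} checkpoints")
```

With the sweep used in the experiments, a complete set would contain 250 entries per environment. For the Pendulum models, pass `env_name="PenalizedPendulumEnvironment"`.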

Figures

Figure 1

Path. results/gridworld/constrained_gridworld_value/{timestamp}/plots/plot_values_constrained.pdf

Figure 2

Path. results/gridworld/unconstrained_gridworld_entropy/{timestamp}/plots/plot_value_grid_unconstrained.pdf

Figure 3

    python -m src.pendulum.analyze_disturbed_mode_success_rates configured small # 10
    python -m src.hopper.analyze_disturbed_mode_success_rates configured small # 11
    python -m src.figures.figure_3 --run_pendulum_success_rate={timestamp result 10} --run_hopper_success_rate={timestamp result 11}

Path. results/figure_3/{timestamp}/plots/plot_figure_3.pdf

Figure 4

Path. results/gridworld/constrained_gridworld_entropy/{timestamp}/plots/plot_entropy_constrained/alpha_4_small.pdf

Figure 5

    python -m src.figures.figure_5 --run_name_delta={timestamp result 4} --run_name_safety={timestamp result 5}

Path. results/figure_5/{timestamp}/plots/plot_figure_5.pdf

Figure 6

    python -m src.pendulum.evaluation analyze_mode_environment_returns --run_name={timestamp result 6} --height=2 --width=2.75 # 12
    python -m src.hopper.evaluation analyze_mode_environment_returns --run_name={timestamp result 8} --height=2 --width=2.75 # 13
    python -m src.figures.figure_6 --run_pendulum_environment_return={timestamp result 12} --run_hopper_environment_return={timestamp result 13}

Path. results/figure_6/{timestamp}/plots/plot_figure_6.pdf

Figure 7

    python -m src.pendulum.evaluation analyze_mode_environment_returns configured full # 14

Path. results/pendulum/analyze_mode_environment_returns/{timestamp}/plots/heatmap_disturbed_success_rate.pdf

Figure 8

    python -m src.hopper.evaluation analyze_mode_environment_returns configured full # 15

Path. results/hopper/analyze_mode_environment_returns/{timestamp}/plots/heatmap_disturbed_success_rate.pdf
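Several figure commands take a `{timestamp result N}` placeholder, meaning the timestamped directory name produced by run N. A small helper along the following lines could look that name up instead of copying it by hand. This is a sketch under assumptions: `latest_run` is a hypothetical name, and it picks the most recently modified subdirectory because the format of the timestamp is not specified in the README.

```python
from pathlib import Path


def latest_run(experiment_dir):
    """Return the name of the most recently modified run directory.

    Hypothetical helper: uses the modification time rather than parsing
    the directory name, since the timestamp format is not documented.
    """
    runs = [p for p in Path(experiment_dir).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no run directories in {experiment_dir}")
    return max(runs, key=lambda p: p.stat().st_mtime).name


if __name__ == "__main__":
    # Example with a directory layout stated in the README (Figure 7 path).
    print(latest_run("results/pendulum/analyze_mode_environment_returns"))
```

The returned name can then be substituted into an argument such as `--run_pendulum_environment_return=...`. If several runs of the same experiment exist, check that the most recent one is actually the run you intend to plot.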

Owner

Name: Data Science in Mechanical Engineering (DSME)
Login: Data-Science-in-Mechanical-Engineering
Kind: organization
Location: Aachen, Germany

Website: https://www.dsme.rwth-aachen.de
Repositories: 3
Profile: https://github.com/Data-Science-in-Mechanical-Engineering

Public code repository of the Institute for Data Science in Mechanical Engineering at the RWTH Aachen University

GitHub Events

Total

Member event: 1

Last Year

Member event: 1

Dependencies

requirements.txt pypi

Farama-Notifications ==0.0.4
GitPython ==3.1.43
Jinja2 ==3.1.4
Markdown ==3.6
MarkupSafe ==2.1.5
PyOpenGL ==3.1.7
PyYAML ==6.0.1
Pygments ==2.18.0
Werkzeug ==3.0.3
absl-py ==2.1.0
certifi ==2024.7.4
charset-normalizer ==3.3.2
click ==8.1.7
cloudpickle ==3.0.0
contourpy ==1.2.1
cycler ==0.12.1
decorator ==4.4.2
docker-pycreds ==0.4.0
docstring_parser ==0.16
etils ==1.9.2
filelock ==3.15.4
fonttools ==4.53.1
fsspec ==2024.6.1
gitdb ==4.0.11
glfw ==2.7.0
grpcio ==1.65.1
gym-notices ==0.0.8
gymnasium ==0.29.1
idna ==3.7
imageio ==2.34.2
imageio-ffmpeg ==0.5.1
importlib_resources ==6.4.0
kiwisolver ==1.4.5
markdown-it-py ==3.0.0
matplotlib ==3.9.1
mdurl ==0.1.2
moviepy ==1.0.3
mpmath ==1.3.0
mujoco ==3.2.0
networkx ==3.3
numpy ==1.26.4
packaging ==24.1
pandas ==2.2.2
pillow ==10.4.0
platformdirs ==4.2.2
proglog ==0.1.10
protobuf ==4.25.3
psutil ==6.0.0
pygame ==2.6.0
pyparsing ==3.1.2
python-dateutil ==2.9.0.post0
pytz ==2024.1
requests ==2.32.3
rich ==13.7.1
scipy ==1.14.0
seaborn ==0.13.2
sentry-sdk ==2.10.0
setproctitle ==1.3.3
shtab ==1.7.1
six ==1.16.0
smmap ==5.0.1
stable_baselines3 ==2.3.2
sympy ==1.13.1
tensorboard ==2.17.0
tensorboard-data-server ==0.7.2
torch ==2.3.1
torchaudio ==2.3.1
torchvision ==0.18.1
tqdm ==4.66.4
typing_extensions ==4.12.2
tyro ==0.8.5
tzdata ==2024.1
urllib3 ==2.2.2
wandb ==0.17.5
zipp ==3.19.2
