https://github.com/data-science-in-mechanical-engineering/entropy_robustness
Code for the paper "Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization" (ECML 2025)
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Data-Science-in-Mechanical-Engineering
- Language: Python
- Default Branch: main
- Size: 10.5 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Instructions for Reproducing Results
First, install the required packages.
bash
pip install -r requirements.txt
Gridworld Experiments
We train and evaluate the gridworld agent in both the cliff and the fenced cliff environments by running the following experiments.
bash
python -m src.gridworld.evaluation constrained_gridworld_value configured default # 1
python -m src.gridworld.evaluation constrained_gridworld_entropy configured default # 2
python -m src.gridworld.evaluation unconstrained_gridworld_value configured default # 3
python -m src.gridworld.evaluation delta_as_function_of_failure_penalty configured default # 4
python -m src.gridworld.evaluation safety_as_function_of_failure_alpha configured default # 5
Pendulum Experiments
To train a constraint-penalized SAC model on the Pendulum environment, run the following command.
bash
python -m src.pendulum.training --alpha={alpha} --seed={seed}
where {alpha} is the temperature parameter and {seed} is the random seed.
Extract the best-performing model and store it at
checkpoints_pendulum/PenalizedPendulumEnvironment__{seed}__{alpha}__{timestamp}/best-model.pth.
In our experiments, we used alpha={0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and seeds ranging from 1 to 25.
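The full sweep described above covers 10 temperatures and 25 seeds, i.e. 250 training runs per environment. The following is a minimal sketch of how those command lines could be generated; `build_training_commands` and `run_sweep` are hypothetical helpers that are not part of the repository, and only the `--alpha` and `--seed` flags and the module name are taken from the README.

```python
import itertools
import subprocess
import sys

# Values reported in the README: alpha = 0.1, ..., 1.0 and seeds 1, ..., 25.
ALPHAS = [round(0.1 * i, 1) for i in range(1, 11)]
SEEDS = range(1, 26)


def build_training_commands(module="src.pendulum.training"):
    """Return one argument list per (alpha, seed) pair (hypothetical helper)."""
    return [
        [sys.executable, "-m", module, f"--alpha={alpha}", f"--seed={seed}"]
        for alpha, seed in itertools.product(ALPHAS, SEEDS)
    ]


def run_sweep(commands, dry_run=True):
    """Print each command; execute it sequentially only if dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Calling `run_sweep(build_training_commands())` only prints the commands; passing `dry_run=False` would launch the runs one after another from the repository root. Passing `module="src.hopper.training"` yields the corresponding Hopper sweep.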
We evaluate the trained models by running the following experiments.
bash
python -m src.pendulum.evaluation evaluate_mode_policies configured default # 6
python -m src.pendulum.evaluation evaluate_disturbed_mode_policies configured default # 7
The results are stored automatically in results/pendulum.
Hopper Experiments
To train a constraint-penalized SAC model on the Hopper environment, run the following command.
bash
python -m src.hopper.training --alpha={alpha} --seed={seed}
where {alpha} is the temperature parameter and {seed} is the random seed.
Extract the best-performing model and store it at
checkpoints_pendulum/PenalizedHopperEnvironment__{seed}__{alpha}__{timestamp}/best-model.pth.
In our experiments, we used alpha={0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} and seeds ranging from 1 to 25.
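With 250 checkpoints per environment, it can help to verify that every (seed, alpha) pair has a stored model before starting the evaluation. The sketch below indexes checkpoints by parsing the directory naming scheme given above. `find_best_models` is a hypothetical helper, not part of the repository, and it assumes that seeds are integers, that alpha parses as a float, and that neither the environment name nor those two fields contain a double underscore.

```python
from pathlib import Path


def find_best_models(root, environment):
    """Map (seed, alpha) to the best-model.pth path found under root.

    Expects directories named {environment}__{seed}__{alpha}__{timestamp}.
    If several timestamps exist for the same pair, one of them is kept.
    """
    models = {}
    pattern = f"{environment}__*__*__*/best-model.pth"
    for path in Path(root).glob(pattern):
        # Split into environment, seed, alpha, and the remaining timestamp.
        _, seed, alpha, _timestamp = path.parent.name.split("__", 3)
        models[(int(seed), float(alpha))] = path
    return models
```

For example, `find_best_models("checkpoints_pendulum", "PenalizedHopperEnvironment")` would return a dictionary whose keys can be compared against the expected seeds and temperatures to spot missing runs.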
We evaluate the trained models by running the following experiments.
bash
python -m src.hopper.evaluation evaluate_mode_policies configured default # 8
python -m src.hopper.evaluation evaluate_disturbed_mode_policies configured default # 9
The results are stored automatically in results/hopper.
Figures
Figure 1
Path. results/gridworld/constrained_gridworld_value/{timestamp}/plots/plot_values_constrained.pdf
Figure 2
Path. results/gridworld/unconstrained_gridworld_value/{timestamp}/plots/plot_value_grid_unconstrained.pdf
Figure 3
bash
python -m src.pendulum.analyze_disturbed_mode_success_rates configured small # 10
python -m src.hopper.analyze_disturbed_mode_success_rates configured small # 11
python -m src.figures.figure_3 --run_pendulum_success_rate={timestamp result 10} --run_hopper_success_rate={timestamp result 11}
Path. results/figure_3/{timestamp}/plots/plot_figure_3.pdf
Figure 4
Path. results/gridworld/constrained_gridworld_entropy/{timestamp}/plots/plot_entropy_constrained/alpha_4_small.pdf
Figure 5
bash
python -m src.figures.figure_5 --run_name_delta={timestamp result 4} --run_name_safety={timestamp result 5}
Path. results/figure_5/{timestamp}/plots/plot_figure_5.pdf
Figure 6
bash
python -m src.pendulum.evaluation analyze_mode_environment_returns --run_name={timestamp result 6} --height=2 --width=2.75 # 12
python -m src.hopper.evaluation analyze_mode_environment_returns --run_name={timestamp result 8} --height=2 --width=2.75 # 13
python -m src.figures.figure_6 --run_pendulum_environment_return={timestamp result 12} --run_hopper_environment_return={timestamp result 13}
Path. results/figure_6/{timestamp}/plots/plot_figure_6.pdf
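Several figure commands take the timestamp of an earlier run as an argument. As a minimal sketch, the most recent run of an experiment could be looked up as follows; `latest_run` is a hypothetical helper, not part of the repository, and it assumes that each run creates its own subdirectory named after the timestamp (as in the paths above) and that the newest run is the one modified most recently.

```python
from pathlib import Path


def latest_run(experiment_dir):
    """Return the name of the most recently modified run directory."""
    runs = [p for p in Path(experiment_dir).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no runs found in {experiment_dir}")
    # Use the modification time, since the timestamp format is not specified.
    return max(runs, key=lambda p: p.stat().st_mtime).name
```

For instance, `latest_run("results/gridworld/constrained_gridworld_value")` would return the directory name to substitute for `{timestamp}` in the Figure 1 path.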
Figure 7
bash
python -m src.pendulum.evaluation analyze_mode_environment_returns configured full # 14
Path. results/pendulum/analyze_mode_environment_returns/{timestamp}/plots/heatmap_disturbed_success_rate.pdf
Figure 8
bash
python -m src.hopper.evaluation analyze_mode_environment_returns configured full # 15
Path. results/hopper/analyze_mode_environment_returns/{timestamp}/plots/heatmap_disturbed_success_rate.pdf
Owner
- Name: Data Science in Mechanical Engineering (DSME)
- Login: Data-Science-in-Mechanical-Engineering
- Kind: organization
- Location: Aachen, Germany
- Website: https://www.dsme.rwth-aachen.de
- Repositories: 3
- Profile: https://github.com/Data-Science-in-Mechanical-Engineering
Public code repository of the Institute for Data Science in Mechanical Engineering at RWTH Aachen University
GitHub Events
Total
- Member event: 1
Last Year
- Member event: 1
Dependencies
- Farama-Notifications ==0.0.4
- GitPython ==3.1.43
- Jinja2 ==3.1.4
- Markdown ==3.6
- MarkupSafe ==2.1.5
- PyOpenGL ==3.1.7
- PyYAML ==6.0.1
- Pygments ==2.18.0
- Werkzeug ==3.0.3
- absl-py ==2.1.0
- certifi ==2024.7.4
- charset-normalizer ==3.3.2
- click ==8.1.7
- cloudpickle ==3.0.0
- contourpy ==1.2.1
- cycler ==0.12.1
- decorator ==4.4.2
- docker-pycreds ==0.4.0
- docstring_parser ==0.16
- etils ==1.9.2
- filelock ==3.15.4
- fonttools ==4.53.1
- fsspec ==2024.6.1
- gitdb ==4.0.11
- glfw ==2.7.0
- grpcio ==1.65.1
- gym-notices ==0.0.8
- gymnasium ==0.29.1
- idna ==3.7
- imageio ==2.34.2
- imageio-ffmpeg ==0.5.1
- importlib_resources ==6.4.0
- kiwisolver ==1.4.5
- markdown-it-py ==3.0.0
- matplotlib ==3.9.1
- mdurl ==0.1.2
- moviepy ==1.0.3
- mpmath ==1.3.0
- mujoco ==3.2.0
- networkx ==3.3
- numpy ==1.26.4
- packaging ==24.1
- pandas ==2.2.2
- pillow ==10.4.0
- platformdirs ==4.2.2
- proglog ==0.1.10
- protobuf ==4.25.3
- psutil ==6.0.0
- pygame ==2.6.0
- pyparsing ==3.1.2
- python-dateutil ==2.9.0.post0
- pytz ==2024.1
- requests ==2.32.3
- rich ==13.7.1
- scipy ==1.14.0
- seaborn ==0.13.2
- sentry-sdk ==2.10.0
- setproctitle ==1.3.3
- shtab ==1.7.1
- six ==1.16.0
- smmap ==5.0.1
- stable_baselines3 ==2.3.2
- sympy ==1.13.1
- tensorboard ==2.17.0
- tensorboard-data-server ==0.7.2
- torch ==2.3.1
- torchaudio ==2.3.1
- torchvision ==0.18.1
- tqdm ==4.66.4
- typing_extensions ==4.12.2
- tyro ==0.8.5
- tzdata ==2024.1
- urllib3 ==2.2.2
- wandb ==0.17.5
- zipp ==3.19.2