Recent Releases of openrlbenchmark

openrlbenchmark - v0.2.0 `rliable` integration, offline mode, multi metrics

This release brings exciting new features. I am excited to share that openrlbenchmark now includes direct integration with https://github.com/google-research/rliable. The new release also supports offline mode and plotting with multi metrics.

rliable integration

python -m openrlbenchmark.rlops \ --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' \ 'ppo_continuous_action?tag=v1.0.0-27-gde3f410&cl=CleanRL PPO' \ --filters '?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return' \ 'baselines-ppo2-mlp?cl=openai/baselines PPO2' \ --env-ids HalfCheetah-v2 Hopper-v2 Walker2d-v2 \ --env-ids HalfCheetah-v2 Hopper-v2 Walker2d-v2 \ --no-check-empty-runs \ --pc.ncols 3 \ --pc.ncols-legend 3 \ --rliable \ --rc.score_normalization_method maxmin \ --rc.normalized_score_threshold 1.0 \ --rc.sample_efficiency_plots \ --rc.sample_efficiency_and_walltime_efficiency_method Median \ --rc.performance_profile_plots \ --rc.aggregate_metrics_plots \ --rc.sample_efficiency_num_bootstrap_reps 10 \ --rc.performance_profile_num_bootstrap_reps 10 \ --rc.interval_estimates_num_bootstrap_reps 10 \ --output-filename compare \ --scan-history

now yields

CC rliable contributors @agarwl @qgallouedec @DennisSoemers @lkevinzc.

Offline mode

We introduced an experimental offline mode. Sometimes even with caching --scan-history the script can still take a long time if there are too many environments or experiments. This is because we are still calling many wandb.Api().runs(..., filters) under the hood.

No worries though. When running with --scan-history, we also automatically build a local sqlite database to store the metadata of runs. Then, you can run python -m openrlbenchmark.rlops ... --scan-history --offline to generate the plots without having access to the internet. It should considerably speed up the plotting process as well. We are still working on improving the offline mode, so please let us know if you encounter any issues.

Multi metrics

Experimental! API may change.

Thanks to @ffelten, we can now plot multi metrics in the same plot now.

shell python -m openrlbenchmark.rlops_multi_metrics \ --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metrics=charts/episodic_return&metrics=charts/episodic_length&metrics=charts/SPS&metrics=losses/actor_loss&metrics=losses/qf1_values&metrics=losses/qf1_loss' \ 'ddpg_continuous_action?tag=pr-371' \ 'ddpg_continuous_action?tag=pr-299' \ 'ddpg_continuous_action?tag=rlops-pilot' \ 'ddpg_continuous_action_jax?tag=pr-371-jax' \ 'ddpg_continuous_action_jax?tag=pr-298' \ --env-ids HalfCheetah-v2 Hopper-v2 Walker2d-v2 \ --no-check-empty-runs \ --pc.ncols 3 \ --pc.ncols-legend 2 \ --output-filename static/multi-metrics \ --scan-history --offline

What's Changed

  • Fix MuJoCo plots by @vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/4
  • New plotting API by @vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/5
  • Rlops API by @vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/6
  • Glow up README by @qgallouedec in https://github.com/openrlbenchmark/openrlbenchmark/pull/8
  • Add MORL Baselines to README by @ffelten in https://github.com/openrlbenchmark/openrlbenchmark/pull/10
  • Add rlops for plotting human normalized scores by @vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/11
  • use tyro for argparse by @vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/12
  • Various refactor and features by @vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/13
  • Remove unused tyro.conf.OmitSubcommandPrefixes config by @vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/16
  • Refactor by @vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/21
  • Create citation by @vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/17
  • General rliable support by @vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/22
  • [WIP] Multi metric support by @ffelten in https://github.com/openrlbenchmark/openrlbenchmark/pull/23

New Contributors

  • @qgallouedec made their first contribution in https://github.com/openrlbenchmark/openrlbenchmark/pull/8
  • @ffelten made their first contribution in https://github.com/openrlbenchmark/openrlbenchmark/pull/10

Full Changelog: https://github.com/openrlbenchmark/openrlbenchmark/compare/v0.0.1...v0.2.0

- Python
Published by vwxyzjn about 3 years ago

openrlbenchmark - v0.1.1b3 Better Table Printing

This release brings better table printing:

python -m openrlbenchmark.rlops \ --filters '?we=openrlbenchmark&wpn=baselines&ceik=env&cen=exp_name&metric=charts/episodic_return' 'baselines-ppo2-mlp' \ --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' 'ppo_continuous_action?tag=v1.0.0-27-gde3f410' \ --filters '?we=openrlbenchmark&wpn=jaxrl&ceik=env_name&cen=algo&metric=training/return' 'sac' \ --env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \ --check-empty-runs False \ --ncols 3 \ --ncols-legend 3 \ --output-filename compare \ --scan-history

image

- Python
Published by vwxyzjn over 3 years ago

openrlbenchmark - v0.1.1b2 custom legend

What's Changed

The v0.1.1b2 supports customing the legend via the cl query string, customizing the figure labels with --xlabel and --ylabel. Additionally, the default line name in the legend now prefixes with the wandb project and entity. For example, a2c from SB3 will be prefixed with openrlbenchmark/sb3/a2c.

python -m openrlbenchmark.rlops \ --filters '?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean' \ 'a2c' \ 'ddpg' \ 'ppo_lstm?cl=PPO w/ LSTM' \ 'sac' \ 'td3' \ 'ppo' \ 'trpo' \ --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' \ 'sac_continuous_action?tag=rlops-pilot&cl=SAC' \ --env-ids HalfCheetahBulletEnv-v0 \ --ncols 1 \ --ncols-legend 2 \ --xlabel 'Training Steps' \ --ylabel 'Episodic Return' \ --output-filename compare

generates

| cleanrl vs. Stable Baselines 3 | cleanrl vs. Stable Baselines 3 (Time) | |:----------------------------------:|:----------------------------------------:| | | |

  • Glow up README by @qgallouedec in https://github.com/openrlbenchmark/openrlbenchmark/pull/8

New Contributors

  • @qgallouedec made their first contribution in https://github.com/openrlbenchmark/openrlbenchmark/pull/8

Full Changelog: https://github.com/openrlbenchmark/openrlbenchmark/compare/v0.1.1b0...v0.1.1b2

- Python
Published by vwxyzjn over 3 years ago

openrlbenchmark - 🔥 v0.1.1b0 Beta Release

Excited to announce our first beta release of openrlbenchmark, a tool to help you grab metrics from popular RL libraries, such as SB3, CleanRL, baselines, Tianshou, etc.

Here is an example snippet. You can open it at Open In Colab

bash pip install openrlbenchmark python -m openrlbenchmark.rlops \ --filters '?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean' \ 'a2c' \ 'ddpg' \ 'ppo_lstm' \ 'sac' \ 'td3' \ 'ppo' \ 'trpo' \ --filters '?we=openrlbenchmark&wpn=cleanrl&ceik=env_id&cen=exp_name&metric=charts/episodic_return' \ 'sac_continuous_action?tag=rlops-pilot' \ --env-ids HalfCheetahBulletEnv-v0 \ --ncols 1 \ --ncols-legend 2 \ --output-filename compare.png \ --report

which generates

What happened?

The idea is to use filters-like syntax to grab the metrics of interest. Here, we created multiple filters. The first string in the first filter is '?we=openrlbenchmark&wpn=sb3&ceik=env&cen=algo&metric=rollout/ep_rew_mean', which is a query string that specifies the following:

  • we: the W&B entity name
  • wpn: the W&B project name
  • ceik: the custom key for the environment id
  • cen: the custom key for the experiment name
  • metric: the metric we are interested in

So we are fetching metrics from https://wandb.ai/openrlbenchmark/sb3. The environment id is stored in the env key, and the experiment name is stored in the algo key. The metric we are interested in is rollout/ep_rew_mean.

Similarly, we are fetching metrics from https://wandb.ai/openrlbenchmark/cleanrl. The environment id is stored in the env_id key, and the experiment name is stored in the exp_name key. The metric we are interested in is charts/episodic_return.

More docs and examples

See more examples and docs at https://github.com/openrlbenchmark/openrlbenchmark

Notably, https://github.com/openrlbenchmark/openrlbenchmark#currently-supported-libraries has the list of supported libraries, and below are some examples of the plots.

image

image

More info

Please check out the following links for more info.

What's going on right now?

This is a project we are slowly working on. There is no specific timeline or roadmap, but if you want to get involved. Feel free to reach out to me or open an issue. We are looking for volunteers to help us with the following:

  • Add experiments from other libraries
  • Run more experiments from currently supported libraries
  • Documentation and designing standards
  • Download the tensorboard metrics from the tracked experiments and load them locally to save time

- Python
Published by vwxyzjn over 3 years ago

openrlbenchmark - Initial v0.0.1 release

What's Changed

  • [WIP] Prototype plotting utility by @vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/2
  • Plot by @vwxyzjn in https://github.com/openrlbenchmark/openrlbenchmark/pull/3

New Contributors

  • @vwxyzjn made their first contribution in https://github.com/openrlbenchmark/openrlbenchmark/pull/2

Full Changelog: https://github.com/openrlbenchmark/openrlbenchmark/commits/v0.0.1

- Python
Published by vwxyzjn over 3 years ago