Recent Releases of sb3-contrib

sb3-contrib - v2.7.0: Added support for n-step returns for off-policy algorithms

Breaking Changes

  • Upgraded to SB3 >= 2.7.0

New features

  • Add n-step returns support with n_steps parameter
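
As a rough illustration of what an n-step return is, the sketch below sums n discounted rewards and then bootstraps from a value estimate. The function name and arguments are hypothetical, and the sketch ignores episode termination inside the n-step window; it is not the library's implementation.

```python
from typing import Sequence


def n_step_return(
    rewards: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> float:
    """Discounted sum of n rewards plus the discounted bootstrap value.

    `rewards` holds r_t, ..., r_{t+n-1}; `bootstrap_value` is an estimate
    of the value of the state reached after those n steps.
    """
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    # Bootstrap from the value estimate n steps ahead.
    total += (gamma ** len(rewards)) * bootstrap_value
    return total


# Three rewards of 1.0, gamma = 0.5, bootstrap value of 8.0:
# 1 + 0.5 + 0.25 + 0.125 * 8 = 2.75
three_step = n_step_return([1.0, 1.0, 1.0], bootstrap_value=8.0, gamma=0.5)
```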

Bug fixes

  • Use the FloatSchedule and LinearSchedule classes instead of lambdas in the ARS, PPO, and QRDQN implementations to improve model portability across different operating systems
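
One way to see why schedule objects travel better than lambdas: a callable class keeps its state as plain attributes, which can be written out and rebuilt as data. The `LinearDecay` class below is a hypothetical stand-in written for this illustration, not the FloatSchedule or LinearSchedule classes themselves, and JSON is used here only as a simple example of a portable format.

```python
import json


class LinearDecay:
    """Hypothetical schedule: interpolates from `start` to `end`.

    `progress_remaining` goes from 1.0 (start of training) to 0.0 (end).
    """

    def __init__(self, start: float, end: float) -> None:
        self.start = start
        self.end = end

    def __call__(self, progress_remaining: float) -> float:
        return self.end + progress_remaining * (self.start - self.end)

    def to_dict(self) -> dict:
        # The whole state is two numbers, with no code object attached.
        return {"start": self.start, "end": self.end}


schedule = LinearDecay(start=3e-4, end=1e-5)

# Round trip through text: nothing here depends on the Python version
# or operating system that created the schedule.
payload = json.dumps(schedule.to_dict())
restored = LinearDecay(**json.loads(payload))
```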

New Contributors

  • @akanto made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/294

Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.6.0...v2.7.0

- Python
Published by araffin 7 months ago

sb3-contrib - v2.6.0: Fix for `MaskablePPO` with `SubprocVecEnv`, add Gymnasium v1.1 support

Breaking Changes:

  • Upgraded to Stable-Baselines3 >= 2.6.0
  • Renamed _dump_logs() to dump_logs()

New Features:

  • Added support for Gymnasium v1.1.0

Bug Fixes:

  • Fixed issues with SubprocVecEnv and MaskablePPO by using vec_env.has_attr() (pickling issues, mask function not present)

Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.5.0...v2.6.0

- Python
Published by araffin 11 months ago

sb3-contrib - SB3-Contrib v2.5.0: NumPy v2.0 support

Breaking changes:

  • Upgraded to PyTorch 2.3.0
  • Dropped Python 3.8 support
  • Upgraded to Stable-Baselines3 >= 2.5.0

New Contributors

  • @kplers made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/266

Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.4.0...v2.5.0

- Python
Published by araffin about 1 year ago

sb3-contrib - SB3-Contrib v2.4.0: New algorithm (CrossQ), Gymnasium v1.0 support

Breaking Changes:

  • Upgraded to Stable-Baselines3 >= 2.4.0

New Features:

  • Added CrossQ algorithm, from "Batch Normalization in Deep Reinforcement Learning" paper (@danielpalen)
  • Added BatchRenorm PyTorch layer used in CrossQ (@danielpalen)
  • Added support for Gymnasium v1.0

Bug Fixes:

  • Updated QR-DQN optimizer input to only include quantile_net parameters (@corentinlger)
  • Updated QR-DQN paper link in docs (@corentinlger)
  • Fixed a warning with PyTorch 2.4 when loading a RecurrentPPO model (You are using torch.load with weights_only=False)
  • Fixed a bug where loading a QRDQN model changed target_update_interval (@jak3122)

Others:

  • Updated PyTorch version on CI to 2.3.1
  • Remove unnecessary SDE noise resampling in PPO/TRPO update
  • Switched to uv to download packages on GitHub CI

New Contributors

  • @corentinlger made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/252
  • @jak3122 made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/259
  • @danielpalen made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/243

Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.3.0...v2.4.0

- Python
Published by araffin over 1 year ago

sb3-contrib - SB3-Contrib v2.3.0: New default hyperparameters for QR-DQN

Breaking Changes:

  • Upgraded to Stable-Baselines3 >= 2.3.0
  • The default learning_starts parameter of QRDQN has been changed to be consistent with the other off-policy algorithms

```python
from sb3_contrib import QRDQN

env = "CartPole-v1"  # illustrative choice; any discrete-action env works

# SB3 < 2.3.0 default hyperparameters, 50_000 corresponded to Atari default hyperparameters
model = QRDQN("MlpPolicy", env, learning_starts=50_000)

# SB3 >= 2.3.0:
model = QRDQN("MlpPolicy", env, learning_starts=100)
```

New Features:

  • Added rollout_buffer_class and rollout_buffer_kwargs arguments to MaskablePPO
  • Log success rate rollout/success_rate when available for on policy algorithms

Others:

  • Fixed train_freq type annotation for tqc and qrdqn (@Armandpl)
  • Fixed sb3_contrib/common/maskable/*.py type annotations
  • Fixed sb3_contrib/ppo_mask/ppo_mask.py type annotations
  • Fixed sb3_contrib/common/vec_env/async_eval.py type annotations

Documentation:

  • Add some additional notes about MaskablePPO (evaluation and multi-process) (@icheered)

Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.2.1...v2.3.0

- Python
Published by araffin almost 2 years ago

sb3-contrib - SB3-Contrib v2.2.1

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

Breaking Changes:

  • Upgraded to Stable-Baselines3 >= 2.2.1
  • Switched to ruff for sorting imports (isort is no longer needed), black and ruff now require a minimum version
  • Dropped x is False in favor of not x, which means that callbacks that wrongly returned None (instead of a boolean) will cause the training to stop (@iwishiwasaneagle)
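
The difference between the two checks can be seen with a callback that forgets to return a value. The function below is a hypothetical example, not code from the library:

```python
def buggy_callback():
    """A callback that forgets to return True (hypothetical example)."""
    return None


result = buggy_callback()

# Old check: only an explicit False stops training, so None slips through.
stops_with_identity_check = result is False

# New check: any falsy value, including None, stops training.
stops_with_truthiness_check = not result
```

Callbacks that should let training continue therefore need an explicit `return True`.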

New Features:

  • Added set_options for AsyncEval
  • Added rollout_buffer_class and rollout_buffer_kwargs arguments to TRPO

Others:

  • Fixed ActorCriticPolicy.extract_features() signature by adding an optional features_extractor argument
  • Update dependencies (accept newer Shimmy/Sphinx version and remove sphinx_autodoc_typehints)

- Python
Published by araffin over 2 years ago

sb3-contrib - SB3-Contrib v2.1.0

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

Breaking Changes:

  • Removed Python 3.7 support
  • SB3 now requires PyTorch >= 1.13
  • Upgraded to Stable-Baselines3 >= 2.1.0

New Features:

  • Added Python 3.11 support

Bug Fixes:

  • Fixed MaskablePPO ignoring stats_window_size argument

Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.0.0...v2.1.0

- Python
Published by araffin over 2 years ago

sb3-contrib - SB3-Contrib v2.0.0: Gymnasium Support

Warning: Stable-Baselines3 (SB3) v2.0 will be the last version supporting Python 3.7 (end of life in June 2023). We highly recommend upgrading to Python >= 3.8.

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade: pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade or simply (rl zoo depends on SB3 and SB3 contrib): pip install rl_zoo3 --upgrade

Breaking Changes

  • Switched to Gymnasium as primary backend, Gym 0.21 and 0.26 are still supported via the shimmy package (@carlosluis, @arjun-kg, @tlpss)
  • Upgraded to Stable-Baselines3 >= 2.0.0

Bug fixes

  • Fixed QRDQN update interval for multi envs

Others

  • Fixed sb3_contrib/tqc/*.py type hints
  • Fixed sb3_contrib/trpo/*.py type hints
  • Fixed sb3_contrib/common/envs/invalid_actions_env.py type hints

Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v1.8.0...v2.0.0

- Python
Published by araffin over 2 years ago

sb3-contrib - SB3-Contrib v1.8.0

Warning Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend. Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs). You can find a migration guide here. If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.

RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo

To upgrade: pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade or simply (rl zoo depends on SB3 and SB3 contrib): pip install rl_zoo3 --upgrade

Breaking Changes:

  • Removed shared layers in mlp_extractor (@AlexPasqua)
  • Upgraded to Stable-Baselines3 >= 1.8.0

New Features:

  • Added stats_window_size argument to control smoothing in rollout logging (@jonasreiher)
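
To give a feel for what a statistics window does, the sketch below averages episode rewards over the most recent episodes only. The helper name is hypothetical and the code is a simplified illustration of windowed smoothing, not the logging code itself.

```python
from collections import deque


def windowed_mean_rewards(episode_rewards, window_size):
    """Mean of the last `window_size` episode rewards after each episode."""
    window = deque(maxlen=window_size)  # old entries fall off automatically
    means = []
    for reward in episode_rewards:
        window.append(reward)
        means.append(sum(window) / len(window))
    return means


rewards = [0.0, 10.0, 20.0, 30.0]
smooth = windowed_mean_rewards(rewards, window_size=4)    # lags behind
reactive = windowed_mean_rewards(rewards, window_size=1)  # no smoothing
```

A larger window gives a steadier curve that reacts more slowly; a window of 1 simply reports the latest episode.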

Others:

  • Moved to pyproject.toml
  • Added github issue forms
  • Fixed Atari Roms download in CI
  • Fixed sb3_contrib/qrdqn/*.py type hints
  • Switched from flake8 to ruff

Documentation:

  • Added warning about potential crashes caused by check_env in the MaskablePPO docs (@AlexPasqua)

- Python
Published by araffin almost 3 years ago

sb3-contrib - SB3-Contrib v1.7.0 : Bug fixes for PPO LSTM and quality of life improvements

Warning: Shared layers in MLP policy (mlp_extractor) are now deprecated for PPO, A2C and TRPO. This feature will be removed in SB3 v1.8.0, after which net_arch=[64, 64] will create separate networks with the same architecture, to be consistent with the off-policy algorithms.

Note: TRPO models saved with SB3 < 1.7.0 will show a warning about missing keys in the state dict when loaded with SB3 >= 1.7.0. To suppress the warning, simply save the model again. You can find more info in issue #1233.

Breaking Changes:

  • Removed deprecated create_eval_env, eval_env, eval_log_path, n_eval_episodes and eval_freq parameters, please use an EvalCallback instead
  • Removed deprecated sde_net_arch parameter
  • Upgraded to Stable-Baselines3 >= 1.7.0

New Features:

  • Introduced mypy type checking
  • Added support for Python 3.10
  • Added with_bias parameter to ARSPolicy
  • Added option to have non-shared features extractor between actor and critic in on-policy algorithms (@AlexPasqua)
  • Features extractors now properly support unnormalized image-like observations (3D tensor) when passing normalize_images=False

Bug Fixes:

  • Fixed a bug in RecurrentPPO where the LSTM states were incorrectly reshaped for n_lstm_layers > 1 (thanks @kolbytn)
  • Fixed RuntimeError: rnn: hx is not contiguous while predicting terminal values for RecurrentPPO when n_lstm_layers > 1

Deprecations:

  • You should now explicitly pass a features_extractor parameter when calling extract_features()
  • Deprecated shared layers in MlpExtractor (@AlexPasqua)

Others:

  • Fixed flake8 config
  • Fixed sb3_contrib/common/utils.py type hint
  • Fixed sb3_contrib/common/recurrent/type_aliases.py type hint
  • Fixed sb3_contrib/ars/policies.py type hint
  • Exposed modules in __init__.py with __all__ attribute (@ZikangXiong)
  • Removed ignores on Flake8 F401 (@ZikangXiong)
  • Upgraded GitHub CI/setup-python to v4 and checkout to v3
  • Set tensors construction directly on the device
  • Standardized the use of from gym import spaces

- Python
Published by araffin about 3 years ago

sb3-contrib - SB3-Contrib v1.6.2: Progress bar

Breaking Changes:

  • Upgraded to Stable-Baselines3 >= 1.6.2

New Features:

  • Added progress_bar argument in the learn() method, displayed using TQDM and rich packages

Deprecations:

  • Deprecate parameters eval_env, eval_freq and create_eval_env

Others:

  • Fixed the return type of .load() methods so that they now use TypeVar

- Python
Published by araffin over 3 years ago

sb3-contrib - SB3-Contrib v1.6.1: Bug fix release

Breaking Changes:

  • Fixed the issue that predict does not always return action as np.ndarray (@qgallouedec)
  • Upgraded to Stable-Baselines3 >= 1.6.1

Bug Fixes:

  • Fixed the issue of wrongly passing policy arguments when using CnnLstmPolicy or MultiInputLstmPolicy with RecurrentPPO (@mlodel)
  • Fixed division by zero error when computing FPS when only a small amount of time has elapsed on operating systems with low-precision timers
  • Fixed calling child callbacks in MaskableEvalCallback (@CppMaster)
  • Fixed missing verbose parameter passing in the MaskableEvalCallback constructor (@burakdmb)
  • Fixed the issue that when updating the target network in QRDQN, TQC, the running_mean and running_var properties of batch norm layers are not updated (@honglu2875)
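
For the FPS fix above, a common guard is to clamp the elapsed time to a tiny positive value before dividing. The sketch below shows that idea with a hypothetical helper; treat it as an assumption about the general approach rather than the exact patch.

```python
import sys


def frames_per_second(num_steps: int, elapsed_seconds: float) -> int:
    """Steps per second, clamping the elapsed time away from zero."""
    safe_elapsed = max(elapsed_seconds, sys.float_info.epsilon)
    return int(num_steps / safe_elapsed)


# A coarse timer can report zero elapsed time for a very short interval.
fps_zero_elapsed = frames_per_second(100, 0.0)  # very large, but no crash
fps_normal = frames_per_second(100, 2.0)
```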

Others:

  • Changed the default buffer device from "cpu" to "auto"

- Python
Published by araffin over 3 years ago

sb3-contrib - sb3-contrib v1.6.0: RecurrentPPO (aka PPO LSTM) and better defaults for learning from pixels with offpolicy algos

Breaking changes:

  • Upgraded to Stable-Baselines3 >= 1.6.0
  • Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...), removing the former register_policy helper, policy_base parameter and using policy_aliases static attributes instead (@Gregwar)
  • Renamed rollout/exploration rate key to rollout/exploration_rate for QRDQN (to be consistent with SB3 DQN)
  • Upgraded to python 3.7+ syntax using pyupgrade
  • SB3 now requires PyTorch >= 1.11
  • Changed the default network architecture when using CnnPolicy or MultiInputPolicy with TQC: share_features_extractor is now set to False by default and net_arch defaults to [256, 256] (instead of the previous net_arch=[])

New Features

  • Added RecurrentPPO (aka PPO LSTM)

Bug Fixes:

  • Fixed a bug in RecurrentPPO when calculating the masked loss functions (@rnederstigt)
  • Fixed a bug in TRPO where kl divergence was not implemented for MultiDiscrete space

- Python
Published by araffin over 3 years ago

sb3-contrib - sb3-contrib v1.5.0: Bug fixes and newer gym version

Breaking Changes:

  • Switched minimum Gym version to 0.21.0.
  • Upgraded to Stable-Baselines3 >= 1.5.0

New Features:

  • Allow PPO to turn off advantage normalization (see PR #61) @vwxyzjn
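
Advantage normalization rescales a batch of advantages to zero mean and unit standard deviation before the policy update. The sketch below shows the operation and an on/off switch; the function and its `normalize` flag are hypothetical names for illustration, not the library's parameter.

```python
import statistics


def maybe_normalize(advantages, normalize=True, eps=1e-8):
    """Return advantages, optionally rescaled to zero mean and unit std."""
    if not normalize or len(advantages) < 2:
        # The sample standard deviation needs at least two values.
        return list(advantages)
    mean = statistics.fmean(advantages)
    std = statistics.stdev(advantages)  # sample standard deviation
    return [(a - mean) / (std + eps) for a in advantages]


raw = [1.0, 2.0, 3.0]
normalized = maybe_normalize(raw, normalize=True)   # roughly [-1, 0, 1]
untouched = maybe_normalize(raw, normalize=False)   # same values as raw
```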

Bug Fixes:

  • Removed explicit calls to forward() method as per PyTorch guidelines

- Python
Published by araffin almost 4 years ago

sb3-contrib - sb3-contrib v1.4.0: Trust Region Policy Optimization (TRPO) and Augmented Random Search (ARS) algorithms

Breaking Changes:

  • Dropped python 3.6 support
  • Upgraded to Stable-Baselines3 >= 1.4.0
  • MaskablePPO was updated to match latest SB3 PPO version (timeout handling and new method for the policy object)

New Features:

  • Added TRPO (@cyprienc)
  • Added experimental support to train off-policy algorithms with multiple envs (note: HerReplayBuffer currently not supported)
  • Added Augmented Random Search (ARS) (@sgillen)

Others:

  • Improve test coverage for MaskablePPO

- Python
Published by araffin about 4 years ago

sb3-contrib - sb3-contrib v1.3.0 : PPO with invalid action masking

WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend upgrading to Python >= 3.7.

Breaking Changes:

  • Removed sde_net_arch
  • Upgraded to Stable-Baselines3 >= 1.3.0

New Features:

  • Added MaskablePPO algorithm (@kronion)
  • MaskablePPO Dictionary Observation support (@glmcdona)
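
The core idea of invalid action masking can be sketched without any RL library: logits of invalid actions are replaced by negative infinity, so they receive zero probability after the softmax. The helper below is a hypothetical, simplified illustration that assumes at least one action is valid; it is not the MaskablePPO code.

```python
import math


def masked_softmax(logits, action_mask):
    """Softmax over valid actions only; invalid actions get probability 0.

    Assumes at least one entry of `action_mask` is True.
    """
    masked = [
        logit if valid else -math.inf
        for logit, valid in zip(logits, action_mask)
    ]
    max_logit = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(logit - max_logit) for logit in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Action 1 is invalid, so the two valid actions share the probability mass.
probs = masked_softmax([0.0, 0.0, 0.0], [True, False, True])
```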

- Python
Published by araffin over 4 years ago

sb3-contrib - sb3-contrib v1.2.0 : Train/Eval mode support

Breaking Changes:

  • Upgraded to Stable-Baselines3 >= 1.2.0

Bug Fixes:

  • QR-DQN and TQC updated so that their policies are switched between train and eval mode at the correct time (@ayeright)

Others:

  • Fixed type annotation
  • Added python 3.9 to CI

- Python
Published by araffin over 4 years ago

sb3-contrib - SB3 v1.1.0: dictionary observation support and timeout handling

Breaking Changes

  • Added support for Dictionary observation spaces (cf. SB3 doc)
  • Upgraded to Stable-Baselines3 >= 1.1.0
  • Added proper handling of timeouts for off-policy algorithms (cf. SB3 doc)
  • Updated usage of logger (cf. SB3 doc)
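
Timeout handling matters because an episode cut off by a time limit is not a true terminal state, so the value of the next state should still be bootstrapped. The sketch below shows this for a one-step TD target; the function and its flags are hypothetical and simplified, so refer to the SB3 doc for the actual behavior.

```python
def td_target(reward, next_value, done, timeout, gamma=0.99):
    """One-step TD target that keeps bootstrapping when an episode timed out."""
    # A real termination means there is no future return to estimate.
    # A timeout only cuts the episode short, so the next state still has value.
    bootstrap = (not done) or timeout
    return reward + (gamma * next_value if bootstrap else 0.0)


terminal = td_target(1.0, next_value=10.0, done=True, timeout=False, gamma=0.5)
timed_out = td_target(1.0, next_value=10.0, done=True, timeout=True, gamma=0.5)
```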

Bug Fixes

  • Removed unused code in TQC

Others

  • SB3 docs and tests dependencies are no longer required for installing SB3 contrib

Documentation

  • updated QR-DQN docs checkmark typo (@minhlong94)

- Python
Published by araffin over 4 years ago

sb3-contrib - Stable-Baselines3 v1.0

Blog post: https://araffin.github.io/post/sb3/

Breaking Changes

  • Upgraded to Stable-Baselines3 v1.0

Bug Fixes

  • Fixed a bug with QR-DQN predict method when using deterministic=False with image space

- Python
Published by araffin almost 5 years ago

sb3-contrib -

- Python
Published by araffin almost 5 years ago

sb3-contrib - QR-DQN, SB3 upgrade and time feature wrapper

Breaking Changes:

  • Upgraded to Stable-Baselines3 >= 0.11.1

New Features:

  • Added TimeFeatureWrapper to the wrappers
  • Added QR-DQN algorithm (@ku2482)
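
A time feature gives the agent a sense of how much of a fixed-length episode is left. The sketch below appends the normalized remaining time to an observation; the helper name and the `max_steps` default are assumptions made for illustration, not the wrapper's code.

```python
def add_time_feature(observation, current_step, max_steps=1000):
    """Append the fraction of the episode that remains, in [0, 1]."""
    remaining = 1.0 - current_step / max_steps
    return list(observation) + [remaining]


# A quarter of the way through a 1000-step episode, 0.75 remains.
augmented = add_time_feature([0.1, 0.2], current_step=250, max_steps=1000)
```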

Bug Fixes:

  • Fixed bug in TQC when saving/loading the policy only with non-default number of quantiles
  • Fixed bug in QR-DQN when calculating the target quantiles (@ku2482, @guyk1971)

Others:

  • Updated TQC to match new SB3 version
  • Moved quantile_huber_loss to common/utils.py (@ku2482)
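
For readers unfamiliar with the quantile Huber loss used by quantile-based algorithms such as QR-DQN and TQC, here is a scalar sketch for a single (target, prediction) pair. The function name is hypothetical and the code is a simplified illustration of the formula, not the batched implementation in common/utils.py.

```python
def quantile_huber_loss_scalar(delta, tau, kappa=1.0):
    """Quantile Huber loss for one error `delta = target - prediction`.

    `tau` is the quantile fraction in (0, 1); `kappa` is the Huber threshold.
    """
    abs_delta = abs(delta)
    if abs_delta <= kappa:
        huber = 0.5 * delta ** 2                    # quadratic near zero
    else:
        huber = kappa * (abs_delta - 0.5 * kappa)   # linear in the tails
    # Errors on either side of zero are weighted asymmetrically by tau.
    indicator = 1.0 if delta < 0 else 0.0
    return abs(tau - indicator) * huber


small_error = quantile_huber_loss_scalar(0.5, tau=0.25)
large_negative_error = quantile_huber_loss_scalar(-2.0, tau=0.25)
```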

- Python
Published by araffin almost 5 years ago