Recent Releases of sb3-contrib
sb3-contrib - v2.7.0: Added support for n-step returns for off-policy algorithms
Breaking Changes
- Upgraded to SB3 >= 2.7.0
New features
- Add n-step returns support with the `n_steps` parameter
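The n-step return replaces the one-step TD target with a target that accumulates n discounted rewards before bootstrapping. A minimal sketch of the idea in plain Python (illustrative only, not sb3-contrib's implementation; the `n_steps` name matches the new parameter):

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99, n_steps=3):
    """Illustrative n-step TD target (not sb3-contrib's implementation):
    sum of the first n discounted rewards plus a discounted bootstrap
    value from the value/Q function."""
    target = 0.0
    for k in range(n_steps):
        target += (gamma ** k) * rewards[k]
    return target + (gamma ** n_steps) * bootstrap_value

# With gamma=1.0 this is just the sum of rewards plus the bootstrap value:
print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=5.0, gamma=1.0))  # 8.0
```

Larger `n_steps` propagates reward information faster at the cost of higher variance in the target.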
Bug fixes
- Use the `FloatSchedule` and `LinearSchedule` classes instead of lambdas in the ARS, PPO, and QRDQN implementations to improve model portability across different operating systems
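The portability issue comes from pickling: saved models serialize their hyperparameter schedules, and lambdas cannot be pickled by the standard library, while a named class can. A minimal sketch of the idea (the `LinearSchedule` below is an illustrative stand-in, not SB3's actual class):

```python
import pickle

class LinearSchedule:
    """Minimal picklable schedule (illustrative, not SB3's class):
    interpolates from ``start`` to ``end`` as the remaining training
    progress goes from 1.0 down to 0.0."""
    def __init__(self, start: float, end: float = 0.0):
        self.start = start
        self.end = end

    def __call__(self, progress_remaining: float) -> float:
        return self.end + progress_remaining * (self.start - self.end)

# A lambda with the same behavior cannot be serialized by pickle,
# which is what made lambda-based schedules fragile in saved models:
try:
    pickle.dumps(lambda progress_remaining: 3e-4 * progress_remaining)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print("lambda is not picklable:", type(exc).__name__)

# ...while the named class round-trips through pickle fine:
schedule = pickle.loads(pickle.dumps(LinearSchedule(3e-4)))
print(schedule(0.5))  # 0.00015
```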
New Contributors
- @akanto made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/294
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.6.0...v2.7.0
- Python
Published by araffin 7 months ago
sb3-contrib - v2.6.0: Fix for `MaskablePPO` with `SubprocVecEnv`, add Gymnasium v1.1 support
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 2.6.0
- Renamed `_dump_logs()` to `dump_logs()`
New Features:
- Added support for Gymnasium v1.1.0
Bug Fixes:
- Fixed issues with `SubprocVecEnv` and `MaskablePPO` by using `vec_env.has_attr()` (pickling issues, mask function not present)
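The fix works because only a boolean crosses the process boundary: instead of fetching the mask function itself (which would require pickling it), the main process just asks whether the attribute exists. A plain-Python sketch of the pattern, using hypothetical `DummyVecEnv`/`DummyWorkerEnv` classes (not sb3-contrib's code; the real `vec_env.has_attr()` queries worker processes):

```python
class DummyWorkerEnv:
    """Stands in for an env living in a subprocess."""
    def action_masks(self):
        return [True, False, True]

class DummyVecEnv:
    """Minimal vec-env facade exposing a has_attr-style query."""
    def __init__(self, envs):
        self.envs = envs

    def has_attr(self, name: str) -> bool:
        # Only a boolean is returned, so nothing unpicklable needs
        # to cross the (simulated) process boundary.
        return all(hasattr(env, name) for env in self.envs)

    def env_method(self, name: str):
        # Call the method remotely and return only its (picklable) result.
        return [getattr(env, name)() for env in self.envs]

vec_env = DummyVecEnv([DummyWorkerEnv(), DummyWorkerEnv()])
if vec_env.has_attr("action_masks"):
    masks = vec_env.env_method("action_masks")
    print(masks[0])  # [True, False, True]
```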
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.5.0...v2.6.0
Published by araffin 11 months ago
sb3-contrib - SB3-Contrib v2.5.0: NumPy v2.0 support
Breaking changes:
- Upgraded to PyTorch 2.3.0
- Dropped Python 3.8 support
- Upgraded to Stable-Baselines3 >= 2.5.0
New Contributors
- @kplers made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/266
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.4.0...v2.5.0
Published by araffin about 1 year ago
sb3-contrib - SB3-Contrib v2.4.0: New algorithm (CrossQ), Gymnasium v1.0 support
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 2.4.0
New Features:
- Added `CrossQ` algorithm, from the "Batch Normalization in Deep Reinforcement Learning" paper (@danielpalen)
- Added `BatchRenorm` PyTorch layer used in `CrossQ` (@danielpalen)
- Added support for Gymnasium v1.0
Bug Fixes:
- Updated QR-DQN optimizer input to only include quantile_net parameters (@corentinlger)
- Updated QR-DQN paper link in docs (@corentinlger)
- Fixed a warning with PyTorch 2.4 when loading a `RecurrentPPO` model (`You are using torch.load with weights_only=False`)
- Fixed a bug where loading a QRDQN model changed `target_update_interval` (@jak3122)
Others:
- Updated PyTorch version on CI to 2.3.1
- Remove unnecessary SDE noise resampling in PPO/TRPO update
- Switched to uv to download packages on GitHub CI
New Contributors
- @corentinlger made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/252
- @jak3122 made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/259
- @danielpalen made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/243
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.3.0...v2.4.0
Published by araffin over 1 year ago
sb3-contrib - SB3-Contrib v2.3.0: New default hyperparameters for QR-DQN
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 2.3.0
- The default `learning_starts` parameter of `QRDQN` has been changed to be consistent with the other off-policy algorithms
```python
from sb3_contrib import QRDQN

# SB3 < 2.3.0 default hyperparameters
# (50_000 corresponded to the Atari default hyperparameters):
model = QRDQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = QRDQN("MlpPolicy", env, learning_starts=100)
```
New Features:
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to MaskablePPO
- Log success rate `rollout/success_rate` when available for on-policy algorithms
Others:
- Fixed `train_freq` type annotation for TQC and QRDQN (@Armandpl)
- Fixed `sb3_contrib/common/maskable/*.py` type annotations
- Fixed `sb3_contrib/ppo_mask/ppo_mask.py` type annotations
- Fixed `sb3_contrib/common/vec_env/async_eval.py` type annotations
Documentation:
- Add some additional notes about `MaskablePPO` (evaluation and multi-process) (@icheered)
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.2.1...v2.3.0
Published by araffin almost 2 years ago
sb3-contrib - SB3-Contrib v2.2.1
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 2.2.1
- Switched to `ruff` for sorting imports (isort is no longer needed), black and ruff now require a minimum version
- Dropped `x is False` in favor of `not x`, which means that callbacks that wrongly returned None (instead of a boolean) will cause the training to stop (@iwishiwasaneagle)
New Features:
- Added `set_options` for `AsyncEval`
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to TRPO
Others:
- Fixed `ActorCriticPolicy.extract_features()` signature by adding an optional `features_extractor` argument
- Update dependencies (accept newer Shimmy/Sphinx version and remove `sphinx_autodoc_typehints`)
Published by araffin over 2 years ago
sb3-contrib - SB3-Contrib v2.1.0
Breaking Changes:
- Removed Python 3.7 support
- SB3 now requires PyTorch >= 1.13
- Upgraded to Stable-Baselines3 >= 2.1.0
New Features:
- Added Python 3.11 support
Bug Fixes:
- Fixed MaskablePPO ignoring the `stats_window_size` argument
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.0.0...v2.1.0
Published by araffin over 2 years ago
sb3-contrib - SB3-Contrib v2.0.0: Gymnasium Support
Warning: Stable-Baselines3 (SB3) v2.0 will be the last one supporting Python 3.7 (end of life in June 2023). We highly recommend you upgrade to Python >= 3.8.
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes
- Switched to Gymnasium as primary backend, Gym 0.21 and 0.26 are still supported via the `shimmy` package (@carlosluis, @arjun-kg, @tlpss)
- Upgraded to Stable-Baselines3 >= 2.0.0
Bug fixes
- Fixed QRDQN update interval for multi envs
Others
- Fixed `sb3_contrib/tqc/*.py` type hints
- Fixed `sb3_contrib/trpo/*.py` type hints
- Fixed `sb3_contrib/common/envs/invalid_actions_env.py` type hints
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v1.8.0...v2.0.0
Published by araffin over 2 years ago
sb3-contrib - SB3-Contrib v1.8.0
Warning: Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend. Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs). You can find a migration guide here. If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes:
- Removed shared layers in `mlp_extractor` (@AlexPasqua)
- Upgraded to Stable-Baselines3 >= 1.8.0
New Features:
- Added `stats_window_size` argument to control smoothing in rollout logging (@jonasreiher)
Others:
- Moved to pyproject.toml
- Added github issue forms
- Fixed Atari Roms download in CI
- Fixed `sb3_contrib/qrdqn/*.py` type hints
- Switched from `flake8` to `ruff`
Documentation:
- Added warning about potential crashes caused by `check_env` in the `MaskablePPO` docs (@AlexPasqua)
Published by araffin almost 3 years ago
sb3-contrib - SB3-Contrib v1.7.0 : Bug fixes for PPO LSTM and quality of life improvements
Warning: Shared layers in MLP policy (`mlp_extractor`) are now deprecated for PPO, A2C and TRPO. This feature will be removed in SB3 v1.8.0, and the behavior of `net_arch=[64, 64]` will create separate networks with the same architecture, to be consistent with the off-policy algorithms.
Note: TRPO models saved with SB3 < 1.7.0 will show a warning about missing keys in the state dict when loaded with SB3 >= 1.7.0. To suppress the warning, simply save the model again. You can find more info in issue #1233.
Breaking Changes:
- Removed deprecated `create_eval_env`, `eval_env`, `eval_log_path`, `n_eval_episodes` and `eval_freq` parameters, please use an `EvalCallback` instead
- Removed deprecated `sde_net_arch` parameter
- Upgraded to Stable-Baselines3 >= 1.7.0
New Features:
- Introduced mypy type checking
- Added support for Python 3.10
- Added `with_bias` parameter to `ARSPolicy`
- Added option to have non-shared features extractor between actor and critic in on-policy algorithms (@AlexPasqua)
- Features extractors now properly support unnormalized image-like observations (3D tensor) when passing `normalize_images=False`
Bug Fixes:
- Fixed a bug in `RecurrentPPO` where the LSTM states were incorrectly reshaped for `n_lstm_layers > 1` (thanks @kolbytn)
- Fixed `RuntimeError: rnn: hx is not contiguous` while predicting terminal values for `RecurrentPPO` when `n_lstm_layers > 1`
Deprecations:
- You should now explicitly pass a `features_extractor` parameter when calling `extract_features()`
- Deprecated shared layers in `MlpExtractor` (@AlexPasqua)
Others:
- Fixed flake8 config
- Fixed `sb3_contrib/common/utils.py` type hint
- Fixed `sb3_contrib/common/recurrent/type_aliases.py` type hint
- Fixed `sb3_contrib/ars/policies.py` type hint
- Exposed modules in `__init__.py` with the `__all__` attribute (@ZikangXiong)
- Removed ignores on Flake8 F401 (@ZikangXiong)
- Upgraded GitHub CI/setup-python to v4 and checkout to v3
- Set tensors construction directly on the device
- Standardized the use of `from gym import spaces`
Published by araffin about 3 years ago
sb3-contrib - SB3-Contrib v1.6.2: Progress bar
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 1.6.2
New Features:
- Added `progress_bar` argument in the `learn()` method, displayed using TQDM and rich packages
Deprecations:
- Deprecate parameters `eval_env`, `eval_freq` and `create_eval_env`
Others:
- Fixed the return type of `.load()` methods so that they now use `TypeVar`
Published by araffin over 3 years ago
sb3-contrib - SB3-Contrib v1.6.1: Bug fix release
Breaking Changes:
- Fixed the issue that `predict` does not always return action as `np.ndarray` (@qgallouedec)
- Upgraded to Stable-Baselines3 >= 1.6.1
Bug Fixes:
- Fixed the issue of wrongly passing policy arguments when using `CnnLstmPolicy` or `MultiInputLstmPolicy` with `RecurrentPPO` (@mlodel)
- Fixed division by zero error when computing FPS when a small amount of time has elapsed in operating systems with low-precision timers.
- Fixed calling child callbacks in `MaskableEvalCallback` (@CppMaster)
- Fixed missing verbose parameter passing in the `MaskableEvalCallback` constructor (@burakdmb)
- Fixed the issue that when updating the target network in QRDQN and TQC, the `running_mean` and `running_var` properties of batch norm layers are not updated (@honglu2875)
Others:
- Changed the default buffer device from `"cpu"` to `"auto"`
Published by araffin over 3 years ago
sb3-contrib - sb3-contrib v1.6.0: RecurrentPPO (aka PPO LSTM) and better defaults for learning from pixels with off-policy algos
Breaking changes:
- Upgraded to Stable-Baselines3 >= 1.6.0
- Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...), removing the former `register_policy` helper, `policy_base` parameter and using `policy_aliases` static attributes instead (@Gregwar)
- Renamed `rollout/exploration rate` key to `rollout/exploration_rate` for QRDQN (to be consistent with SB3 DQN)
- Upgraded to Python 3.7+ syntax using `pyupgrade`
- SB3 now requires PyTorch >= 1.11
- Changed the default network architecture when using `CnnPolicy` or `MultiInputPolicy` with TQC: `share_features_extractor` is now set to False by default and `net_arch=[256, 256]` (instead of `net_arch=[]` before)
New Features
- Added `RecurrentPPO` (aka PPO LSTM)
Bug Fixes:
- Fixed a bug in `RecurrentPPO` when calculating the masked loss functions (@rnederstigt)
- Fixed a bug in `TRPO` where the KL divergence was not implemented for the `MultiDiscrete` space
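For a `MultiDiscrete` space the policy factorizes into independent categorical distributions, one per action dimension, so the joint KL divergence is simply the sum of the per-dimension categorical KLs. A NumPy sketch of that identity (illustrative only, not TRPO's code, which works on batched torch distributions):

```python
import numpy as np

def categorical_kl(p, q):
    """KL(p || q) for a single categorical distribution."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def multidiscrete_kl(ps, qs):
    """For a MultiDiscrete action space the policy factorizes into
    independent categoricals (one per dimension), so the joint KL is
    the sum of the per-dimension KLs."""
    return sum(categorical_kl(p, q) for p, q in zip(ps, qs))

# Identical distributions in every dimension -> KL divergence of 0
print(multidiscrete_kl([[0.5, 0.5], [0.2, 0.8]],
                       [[0.5, 0.5], [0.2, 0.8]]))  # 0.0
```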
Published by araffin over 3 years ago
sb3-contrib - sb3-contrib v1.5.0: Bug fixes and newer gym version
Breaking Changes:
- Switched minimum Gym version to 0.21.0.
- Upgraded to Stable-Baselines3 >= 1.5.0
New Features:
- Allow PPO to turn off advantage normalization (see PR #61) (@vwxyzjn)
Bug Fixes:
- Removed explicit calls to the `forward()` method as per PyTorch guidelines
Published by araffin almost 4 years ago
sb3-contrib - sb3-contrib v1.4.0: Trust Region Policy Optimization (TRPO) and Augmented Random Search (ARS) algorithms
Breaking Changes:
- Dropped python 3.6 support
- Upgraded to Stable-Baselines3 >= 1.4.0
- `MaskablePPO` was updated to match the latest SB3 `PPO` version (timeout handling and new method for the policy object)
New Features:
- Added `TRPO` (@cyprienc)
- Added experimental support to train off-policy algorithms with multiple envs (note: `HerReplayBuffer` currently not supported)
- Added Augmented Random Search (ARS) (@sgillen)
Others:
- Improve test coverage for `MaskablePPO`
Published by araffin about 4 years ago
sb3-contrib - sb3-contrib v1.3.0 : PPO with invalid action masking
WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend you upgrade to Python >= 3.7.
Breaking Changes:
- Removed `sde_net_arch`
- Upgraded to Stable-Baselines3 >= 1.3.0
New Features:
- Added `MaskablePPO` algorithm (@kronion)
- Added `MaskablePPO` Dictionary Observation support (@glmcdona)
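The core idea behind invalid action masking can be sketched with plain NumPy: the logits of currently invalid actions are pushed to a very negative value before the softmax, so the policy assigns them (numerically) zero probability and never samples them. This is an illustration of the technique, not the library's code:

```python
import numpy as np

def masked_softmax(logits, mask):
    """Illustrative invalid-action masking (the idea behind MaskablePPO,
    not its actual implementation): invalid actions get a very negative
    logit, so they receive (numerically) zero probability."""
    logits = np.where(mask, logits, -1e8)
    logits = logits - logits.max()  # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

logits = np.array([1.0, 2.0, 3.0])
mask = np.array([True, False, True])  # action 1 is currently invalid
probs = masked_softmax(logits, mask)
print(probs[1])  # 0.0
```

In the real algorithm the mask comes from the environment at every step, which is why the env must expose a mask function (see the `MaskablePPO` docs).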
Published by araffin over 4 years ago
sb3-contrib - sb3-contrib v1.2.0 : Train/Eval mode support
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 1.2.0
Bug Fixes:
- QR-DQN and TQC updated so that their policies are switched between train and eval mode at the correct time (@ayeright)
Others:
- Fixed type annotation
- Added python 3.9 to CI
Published by araffin over 4 years ago
sb3-contrib - SB3 v1.1.0: dictionary observation support and timeout handling
Breaking Changes
- Added support for Dictionary observation spaces (cf. SB3 doc)
- Upgraded to Stable-Baselines3 >= 1.1.0
- Added proper handling of timeouts for off-policy algorithms (cf. SB3 doc)
- Updated usage of logger (cf. SB3 doc)
Bug Fixes
- Removed unused code in `TQC`
Others
- SB3 docs and tests dependencies are no longer required for installing SB3 contrib
Documentation
- Fixed a checkmark typo in the QR-DQN docs (@minhlong94)
Published by araffin over 4 years ago
sb3-contrib - Stable-Baselines3 v1.0
Blog post: https://araffin.github.io/post/sb3/
Breaking Changes
- Upgraded to Stable-Baselines3 v1.0
Bug Fixes
- Fixed a bug with `QR-DQN` predict method when using `deterministic=False` with image space
Published by araffin almost 5 years ago
sb3-contrib - QR-DQN, SB3 upgrade and time feature wrapper
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 0.11.1
New Features:
- Added `TimeFeatureWrapper` to the wrappers
- Added `QR-DQN` algorithm (@ku2482)
Bug Fixes:
- Fixed bug in `TQC` when saving/loading the policy only with non-default number of quantiles
- Fixed bug in `QR-DQN` when calculating the target quantiles (@ku2482, @guyk1971)
Others:
- Updated `TQC` to match new SB3 version
- Moved `quantile_huber_loss` to `common/utils.py` (@ku2482)
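The `quantile_huber_loss` helper mentioned above underpins both QR-DQN and TQC: it combines the Huber loss with a pinball-style quantile weight. A NumPy sketch of the idea, assuming element-wise pairing of TD errors and quantile levels (the real helper operates on batched tensors and pairs quantiles with targets):

```python
import numpy as np

def quantile_huber_loss(td_errors, taus, kappa=1.0):
    """Illustrative quantile Huber loss in the spirit of QR-DQN/TQC
    (not sb3-contrib's exact implementation)."""
    td_errors = np.asarray(td_errors, dtype=float)
    taus = np.asarray(taus, dtype=float)
    abs_err = np.abs(td_errors)
    # Standard Huber loss: quadratic near zero, linear beyond kappa
    huber = np.where(abs_err <= kappa,
                     0.5 * td_errors ** 2,
                     kappa * (abs_err - 0.5 * kappa))
    # Pinball-style weight: over- and under-estimation are penalized
    # asymmetrically depending on the quantile level tau
    weight = np.abs(taus - (td_errors < 0.0).astype(float))
    return float(np.mean(weight * huber))

print(quantile_huber_loss([1.0], [0.5]))  # 0.25
```

The asymmetric weight is what makes each quantile estimate converge to its own quantile of the return distribution rather than the mean.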
Published by araffin almost 5 years ago