Recent Releases of sb3-contrib
sb3-contrib - v2.7.0: Added support for n-step returns for off-policy algorithms
Breaking Changes
- Upgraded to SB3 >= 2.7.0
New features
- Add n-step returns support with the `n_steps` parameter
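The n-step return replaces the one-step TD target with a target that accumulates n discounted rewards before bootstrapping. A minimal sketch of the idea in plain Python (illustrative only, not sb3-contrib's implementation; the `n_steps` name matches the new parameter):

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99, n_steps=3):
    """Illustrative n-step TD target (not sb3-contrib's implementation):
    sum of the first n discounted rewards plus a discounted bootstrap
    value from the value/Q function."""
    target = 0.0
    for k in range(n_steps):
        target += (gamma ** k) * rewards[k]
    return target + (gamma ** n_steps) * bootstrap_value

# With gamma=1.0 this is just the sum of rewards plus the bootstrap value:
print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=5.0, gamma=1.0))  # 8.0
```

Larger `n_steps` propagates reward information faster at the cost of higher variance in the target.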
Bug fixes
- Use the `FloatSchedule` and `LinearSchedule` classes instead of lambdas in the ARS, PPO, and QRDQN implementations to improve model portability across different operating systems
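The portability issue comes from pickling: saved models serialize their hyperparameter schedules, and lambdas cannot be pickled by the standard library, while a named class can. A minimal sketch of the idea (the `LinearSchedule` below is an illustrative stand-in, not SB3's actual class):

```python
import pickle

class LinearSchedule:
    """Minimal picklable schedule (illustrative, not SB3's class):
    interpolates from ``start`` to ``end`` as the remaining training
    progress goes from 1.0 down to 0.0."""
    def __init__(self, start: float, end: float = 0.0):
        self.start = start
        self.end = end

    def __call__(self, progress_remaining: float) -> float:
        return self.end + progress_remaining * (self.start - self.end)

# A lambda with the same behavior cannot be serialized by pickle,
# which is what made lambda-based schedules fragile in saved models:
try:
    pickle.dumps(lambda progress_remaining: 3e-4 * progress_remaining)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print("lambda is not picklable:", type(exc).__name__)

# ...while the named class round-trips through pickle fine:
schedule = pickle.loads(pickle.dumps(LinearSchedule(3e-4)))
print(schedule(0.5))  # 0.00015
```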
New Contributors
- @akanto made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/294
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.6.0...v2.7.0
- Python
Published by araffin 7 months ago
sb3-contrib - v2.6.0: Fix for `MaskablePPO` with `SubprocVecEnv`, add Gymnasium v1.1 support
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 2.6.0
- Renamed `_dump_logs()` to `dump_logs()`
New Features:
- Added support for Gymnasium v1.1.0
Bug Fixes:
- Fixed issues with `SubprocVecEnv` and `MaskablePPO` by using `vec_env.has_attr()` (pickling issues, mask function not present)
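The fix works because only a boolean crosses the process boundary: instead of fetching the mask function itself (which would require pickling it), the main process just asks whether the attribute exists. A plain-Python sketch of the pattern, using hypothetical `DummyVecEnv`/`DummyWorkerEnv` classes (not sb3-contrib's code; the real `vec_env.has_attr()` queries worker processes):

```python
class DummyWorkerEnv:
    """Stands in for an env living in a subprocess."""
    def action_masks(self):
        return [True, False, True]

class DummyVecEnv:
    """Minimal vec-env facade exposing a has_attr-style query."""
    def __init__(self, envs):
        self.envs = envs

    def has_attr(self, name: str) -> bool:
        # Only a boolean is returned, so nothing unpicklable needs
        # to cross the (simulated) process boundary.
        return all(hasattr(env, name) for env in self.envs)

    def env_method(self, name: str):
        # Call the method remotely and return only its (picklable) result.
        return [getattr(env, name)() for env in self.envs]

vec_env = DummyVecEnv([DummyWorkerEnv(), DummyWorkerEnv()])
if vec_env.has_attr("action_masks"):
    masks = vec_env.env_method("action_masks")
    print(masks[0])  # [True, False, True]
```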
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.5.0...v2.6.0
Published by araffin 11 months ago
sb3-contrib - SB3-Contrib v2.5.0: NumPy v2.0 support
Breaking changes:
- Upgraded to PyTorch 2.3.0
- Dropped Python 3.8 support
- Upgraded to Stable-Baselines3 >= 2.5.0
New Contributors
- @kplers made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/266
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.4.0...v2.5.0
Published by araffin about 1 year ago
sb3-contrib - SB3-Contrib v2.4.0: New algorithm (CrossQ), Gymnasium v1.0 support
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 2.4.0
New Features:
- Added `CrossQ` algorithm, from the "Batch Normalization in Deep Reinforcement Learning" paper (@danielpalen)
- Added `BatchRenorm` PyTorch layer used in `CrossQ` (@danielpalen)
- Added support for Gymnasium v1.0
Bug Fixes:
- Updated QR-DQN optimizer input to only include quantile_net parameters (@corentinlger)
- Updated QR-DQN paper link in docs (@corentinlger)
- Fixed a warning with PyTorch 2.4 when loading a `RecurrentPPO` model (`You are using torch.load with weights_only=False`)
- Fixed a bug where loading a QRDQN model changed `target_update_interval` (@jak3122)
Others:
- Updated PyTorch version on CI to 2.3.1
- Remove unnecessary SDE noise resampling in PPO/TRPO update
- Switched to uv to download packages on GitHub CI
New Contributors
- @corentinlger made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/252
- @jak3122 made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/259
- @danielpalen made their first contribution in https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/243
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.3.0...v2.4.0
Published by araffin over 1 year ago
sb3-contrib - SB3-Contrib v2.3.0: New default hyperparameters for QR-DQN
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 2.3.0
- The default `learning_starts` parameter of `QRDQN` has been changed to be consistent with the other off-policy algorithms
```python
from sb3_contrib import QRDQN

# SB3 < 2.3.0 default hyperparameters
# (50_000 corresponded to the Atari default hyperparameters):
model = QRDQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = QRDQN("MlpPolicy", env, learning_starts=100)
```
New Features:
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to MaskablePPO
- Log success rate `rollout/success_rate` when available for on-policy algorithms
Others:
- Fixed `train_freq` type annotation for TQC and QRDQN (@Armandpl)
- Fixed `sb3_contrib/common/maskable/*.py` type annotations
- Fixed `sb3_contrib/ppo_mask/ppo_mask.py` type annotations
- Fixed `sb3_contrib/common/vec_env/async_eval.py` type annotations
Documentation:
- Add some additional notes about `MaskablePPO` (evaluation and multi-process) (@icheered)
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.2.1...v2.3.0
Published by araffin almost 2 years ago
sb3-contrib - SB3-Contrib v2.2.1
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 2.2.1
- Switched to `ruff` for sorting imports (isort is no longer needed), black and ruff now require a minimum version
- Dropped `x is False` in favor of `not x`, which means that callbacks that wrongly returned None (instead of a boolean) will cause the training to stop (@iwishiwasaneagle)
New Features:
- Added `set_options` for `AsyncEval`
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to TRPO
Others:
- Fixed `ActorCriticPolicy.extract_features()` signature by adding an optional `features_extractor` argument
- Update dependencies (accept newer Shimmy/Sphinx version and remove `sphinx_autodoc_typehints`)
Published by araffin over 2 years ago
sb3-contrib - SB3-Contrib v2.1.0
Breaking Changes:
- Removed Python 3.7 support
- SB3 now requires PyTorch >= 1.13
- Upgraded to Stable-Baselines3 >= 2.1.0
New Features:
- Added Python 3.11 support
Bug Fixes:
- Fixed MaskablePPO ignoring the `stats_window_size` argument
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v2.0.0...v2.1.0
Published by araffin over 2 years ago
sb3-contrib - SB3-Contrib v2.0.0: Gymnasium Support
Warning: Stable-Baselines3 (SB3) v2.0 will be the last one supporting Python 3.7 (end of life in June 2023). We highly recommend you upgrade to Python >= 3.8.
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes
- Switched to Gymnasium as primary backend, Gym 0.21 and 0.26 are still supported via the `shimmy` package (@carlosluis, @arjun-kg, @tlpss)
- Upgraded to Stable-Baselines3 >= 2.0.0
Bug fixes
- Fixed QRDQN update interval for multi envs
Others
- Fixed `sb3_contrib/tqc/*.py` type hints
- Fixed `sb3_contrib/trpo/*.py` type hints
- Fixed `sb3_contrib/common/envs/invalid_actions_env.py` type hints
Full Changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/v1.8.0...v2.0.0
Published by araffin over 2 years ago
sb3-contrib - SB3-Contrib v1.8.0
Warning: Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend. Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs). You can find a migration guide here. If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes:
- Removed shared layers in `mlp_extractor` (@AlexPasqua)
- Upgraded to Stable-Baselines3 >= 1.8.0
New Features:
- Added `stats_window_size` argument to control smoothing in rollout logging (@jonasreiher)
Others:
- Moved to pyproject.toml
- Added github issue forms
- Fixed Atari Roms download in CI
- Fixed `sb3_contrib/qrdqn/*.py` type hints
- Switched from `flake8` to `ruff`
Documentation:
- Added warning about potential crashes caused by `check_env` in the `MaskablePPO` docs (@AlexPasqua)
Published by araffin almost 3 years ago
sb3-contrib - SB3-Contrib v1.7.0 : Bug fixes for PPO LSTM and quality of life improvements
Warning: Shared layers in MLP policy (`mlp_extractor`) are now deprecated for PPO, A2C and TRPO. This feature will be removed in SB3 v1.8.0, and the behavior of `net_arch=[64, 64]` will create separate networks with the same architecture, to be consistent with the off-policy algorithms.
Note: TRPO models saved with SB3 < 1.7.0 will show a warning about missing keys in the state dict when loaded with SB3 >= 1.7.0. To suppress the warning, simply save the model again. You can find more info in issue #1233.
Breaking Changes:
- Removed deprecated `create_eval_env`, `eval_env`, `eval_log_path`, `n_eval_episodes` and `eval_freq` parameters, please use an `EvalCallback` instead
- Removed deprecated `sde_net_arch` parameter
- Upgraded to Stable-Baselines3 >= 1.7.0
New Features:
- Introduced mypy type checking
- Added support for Python 3.10
- Added `with_bias` parameter to `ARSPolicy`
- Added option to have non-shared features extractor between actor and critic in on-policy algorithms (@AlexPasqua)
- Features extractors now properly support unnormalized image-like observations (3D tensor) when passing `normalize_images=False`
Bug Fixes:
- Fixed a bug in `RecurrentPPO` where the LSTM states were incorrectly reshaped for `n_lstm_layers > 1` (thanks @kolbytn)
- Fixed `RuntimeError: rnn: hx is not contiguous` while predicting terminal values for `RecurrentPPO` when `n_lstm_layers > 1`
Deprecations:
- You should now explicitly pass a `features_extractor` parameter when calling `extract_features()`
- Deprecated shared layers in `MlpExtractor` (@AlexPasqua)
Others:
- Fixed flake8 config
- Fixed `sb3_contrib/common/utils.py` type hint
- Fixed `sb3_contrib/common/recurrent/type_aliases.py` type hint
- Fixed `sb3_contrib/ars/policies.py` type hint
- Exposed modules in `__init__.py` with the `__all__` attribute (@ZikangXiong)
- Removed ignores on Flake8 F401 (@ZikangXiong)
- Upgraded GitHub CI/setup-python to v4 and checkout to v3
- Set tensors construction directly on the device
- Standardized the use of `from gym import spaces`
Published by araffin about 3 years ago
sb3-contrib - SB3-Contrib v1.6.2: Progress bar
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 1.6.2
New Features:
- Added `progress_bar` argument in the `learn()` method, displayed using TQDM and rich packages
Deprecations:
- Deprecate parameters `eval_env`, `eval_freq` and `create_eval_env`
Others:
- Fixed the return type of `.load()` methods so that they now use `TypeVar`
Published by araffin over 3 years ago
sb3-contrib - SB3-Contrib v1.6.1: Bug fix release
Breaking Changes:
- Fixed the issue that `predict` does not always return action as `np.ndarray` (@qgallouedec)
- Upgraded to Stable-Baselines3 >= 1.6.1
Bug Fixes:
- Fixed the issue of wrongly passing policy arguments when using `CnnLstmPolicy` or `MultiInputLstmPolicy` with `RecurrentPPO` (@mlodel)
- Fixed division by zero error when computing FPS when a small amount of time has elapsed in operating systems with low-precision timers.
- Fixed calling child callbacks in `MaskableEvalCallback` (@CppMaster)
- Fixed missing verbose parameter passing in the `MaskableEvalCallback` constructor (@burakdmb)
- Fixed the issue that when updating the target network in QRDQN and TQC, the `running_mean` and `running_var` properties of batch norm layers are not updated (@honglu2875)
Others:
- Changed the default buffer device from `"cpu"` to `"auto"`
Published by araffin over 3 years ago
sb3-contrib - sb3-contrib v1.6.0: RecurrentPPO (aka PPO LSTM) and better defaults for learning from pixels with off-policy algos
Breaking changes:
- Upgraded to Stable-Baselines3 >= 1.6.0
- Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...), removing the former `register_policy` helper, `policy_base` parameter and using `policy_aliases` static attributes instead (@Gregwar)
- Renamed `rollout/exploration rate` key to `rollout/exploration_rate` for QRDQN (to be consistent with SB3 DQN)
- Upgraded to Python 3.7+ syntax using `pyupgrade`
- SB3 now requires PyTorch >= 1.11
- Changed the default network architecture when using `CnnPolicy` or `MultiInputPolicy` with TQC: `share_features_extractor` is now set to False by default and `net_arch=[256, 256]` (instead of `net_arch=[]` before)
New Features
- Added `RecurrentPPO` (aka PPO LSTM)
Bug Fixes:
- Fixed a bug in `RecurrentPPO` when calculating the masked loss functions (@rnederstigt)
- Fixed a bug in `TRPO` where the KL divergence was not implemented for the `MultiDiscrete` space
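For a `MultiDiscrete` space the policy factorizes into independent categorical distributions, one per action dimension, so the joint KL divergence is simply the sum of the per-dimension categorical KLs. A NumPy sketch of that identity (illustrative only, not TRPO's code, which works on batched torch distributions):

```python
import numpy as np

def categorical_kl(p, q):
    """KL(p || q) for a single categorical distribution."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def multidiscrete_kl(ps, qs):
    """For a MultiDiscrete action space the policy factorizes into
    independent categoricals (one per dimension), so the joint KL is
    the sum of the per-dimension KLs."""
    return sum(categorical_kl(p, q) for p, q in zip(ps, qs))

# Identical distributions in every dimension -> KL divergence of 0
print(multidiscrete_kl([[0.5, 0.5], [0.2, 0.8]],
                       [[0.5, 0.5], [0.2, 0.8]]))  # 0.0
```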
Published by araffin over 3 years ago
sb3-contrib - sb3-contrib v1.5.0: Bug fixes and newer gym version
Breaking Changes:
- Switched minimum Gym version to 0.21.0.
- Upgraded to Stable-Baselines3 >= 1.5.0
New Features:
- Allow PPO to turn off advantage normalization (see PR #61) (@vwxyzjn)
Bug Fixes:
- Removed explicit calls to the `forward()` method as per PyTorch guidelines
Published by araffin almost 4 years ago
sb3-contrib - sb3-contrib v1.4.0: Trust Region Policy Optimization (TRPO) and Augmented Random Search (ARS) algorithms
Breaking Changes:
- Dropped python 3.6 support
- Upgraded to Stable-Baselines3 >= 1.4.0
- `MaskablePPO` was updated to match the latest SB3 `PPO` version (timeout handling and new method for the policy object)
New Features:
- Added `TRPO` (@cyprienc)
- Added experimental support to train off-policy algorithms with multiple envs (note: `HerReplayBuffer` currently not supported)
- Added Augmented Random Search (ARS) (@sgillen)
Others:
- Improve test coverage for `MaskablePPO`
Published by araffin about 4 years ago
sb3-contrib - sb3-contrib v1.3.0 : PPO with invalid action masking
WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend you upgrade to Python >= 3.7.
Breaking Changes:
- Removed `sde_net_arch`
- Upgraded to Stable-Baselines3 >= 1.3.0
New Features:
- Added `MaskablePPO` algorithm (@kronion)
- Added `MaskablePPO` Dictionary Observation support (@glmcdona)
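The core idea behind invalid action masking can be sketched with plain NumPy: the logits of currently invalid actions are pushed to a very negative value before the softmax, so the policy assigns them (numerically) zero probability and never samples them. This is an illustration of the technique, not the library's code:

```python
import numpy as np

def masked_softmax(logits, mask):
    """Illustrative invalid-action masking (the idea behind MaskablePPO,
    not its actual implementation): invalid actions get a very negative
    logit, so they receive (numerically) zero probability."""
    logits = np.where(mask, logits, -1e8)
    logits = logits - logits.max()  # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

logits = np.array([1.0, 2.0, 3.0])
mask = np.array([True, False, True])  # action 1 is currently invalid
probs = masked_softmax(logits, mask)
print(probs[1])  # 0.0
```

In the real algorithm the mask comes from the environment at every step, which is why the env must expose a mask function (see the `MaskablePPO` docs).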
Published by araffin over 4 years ago
sb3-contrib - sb3-contrib v1.2.0 : Train/Eval mode support
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 1.2.0
Bug Fixes:
- QR-DQN and TQC updated so that their policies are switched between train and eval mode at the correct time (@ayeright)
Others:
- Fixed type annotation
- Added python 3.9 to CI
Published by araffin over 4 years ago
sb3-contrib - SB3 v1.1.0: dictionary observation support and timeout handling
Breaking Changes
- Added support for Dictionary observation spaces (cf. SB3 doc)
- Upgraded to Stable-Baselines3 >= 1.1.0
- Added proper handling of timeouts for off-policy algorithms (cf. SB3 doc)
- Updated usage of logger (cf. SB3 doc)
Bug Fixes
- Removed unused code in `TQC`
Others
- SB3 docs and tests dependencies are no longer required for installing SB3 contrib
Documentation
- Fixed a checkmark typo in the QR-DQN docs (@minhlong94)
Published by araffin over 4 years ago
sb3-contrib - Stable-Baselines3 v1.0
Blog post: https://araffin.github.io/post/sb3/
Breaking Changes
- Upgraded to Stable-Baselines3 v1.0
Bug Fixes
- Fixed a bug with `QR-DQN` predict method when using `deterministic=False` with image space
Published by araffin almost 5 years ago
sb3-contrib - QR-DQN, SB3 upgrade and time feature wrapper
Breaking Changes:
- Upgraded to Stable-Baselines3 >= 0.11.1
New Features:
- Added `TimeFeatureWrapper` to the wrappers
- Added `QR-DQN` algorithm (@ku2482)
Bug Fixes:
- Fixed bug in `TQC` when saving/loading the policy only with non-default number of quantiles
- Fixed bug in `QR-DQN` when calculating the target quantiles (@ku2482, @guyk1971)
Others:
- Updated `TQC` to match new SB3 version
- Moved `quantile_huber_loss` to `common/utils.py` (@ku2482)
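The `quantile_huber_loss` helper mentioned above underpins both QR-DQN and TQC: it combines the Huber loss with a pinball-style quantile weight. A NumPy sketch of the idea, assuming element-wise pairing of TD errors and quantile levels (the real helper operates on batched tensors and pairs quantiles with targets):

```python
import numpy as np

def quantile_huber_loss(td_errors, taus, kappa=1.0):
    """Illustrative quantile Huber loss in the spirit of QR-DQN/TQC
    (not sb3-contrib's exact implementation)."""
    td_errors = np.asarray(td_errors, dtype=float)
    taus = np.asarray(taus, dtype=float)
    abs_err = np.abs(td_errors)
    # Standard Huber loss: quadratic near zero, linear beyond kappa
    huber = np.where(abs_err <= kappa,
                     0.5 * td_errors ** 2,
                     kappa * (abs_err - 0.5 * kappa))
    # Pinball-style weight: over- and under-estimation are penalized
    # asymmetrically depending on the quantile level tau
    weight = np.abs(taus - (td_errors < 0.0).astype(float))
    return float(np.mean(weight * huber))

print(quantile_huber_loss([1.0], [0.5]))  # 0.25
```

The asymmetric weight is what makes each quantile estimate converge to its own quantile of the return distribution rather than the mean.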
Published by araffin almost 5 years ago