Recent Releases of stable-baselines3
stable-baselines3 - v2.7.0: n-step returns for all off-policy algorithms via the `n_steps` argument
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
New Features:
- Added support for n-step returns for off-policy algorithms via the `n_steps` parameter

```python
from stable_baselines3 import SAC

# SAC with n-step returns
model = SAC("MlpPolicy", "Pendulum-v1", n_steps=3, verbose=1)
model.learn(10000)
```

- Added `NStepReplayBuffer`, which computes n-step returns without additional memory requirements (and without for loops)
- Added Gymnasium v1.2 support
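To make the n-step return concrete, here is a minimal pure-Python sketch of the quantity being computed. It is an illustration only, not the `NStepReplayBuffer` implementation (which, per the note above, avoids for loops); the function name `n_step_target` and its arguments are hypothetical.

```python
def n_step_target(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted n-step target for one transition (illustrative only).

    rewards: the n rewards collected from step t onward
    dones: termination flag for each of those n steps
    bootstrap_value: value estimate for the state reached after n steps
    """
    target = 0.0
    discount = 1.0
    for reward, done in zip(rewards, dones):
        target += discount * reward
        discount *= gamma
        if done:
            # The episode ended inside the window: do not bootstrap past it.
            return target
    # No termination: add the discounted value estimate of the final state.
    return target + discount * bootstrap_value


three_step = n_step_target([1.0, 1.0, 1.0], [False, False, False], 10.0, gamma=0.5)
truncated = n_step_target([1.0, 1.0, 1.0], [False, True, False], 10.0, gamma=0.5)
```

With `n_steps=1` this reduces to the usual one-step target; larger values propagate rewards faster at the cost of relying more on the data-collecting policy.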
Bug Fixes:
- Fixed docker GPU image (PyTorch GPU was not installed)
- Fixed segmentation faults caused by non-portable schedules during model loading (@akanto)
SB3-Contrib
- Added support for n-step returns for off-policy algorithms via the `n_steps` parameter
- Use the `FloatSchedule` and `LinearSchedule` classes instead of lambdas in the ARS, PPO, and QRDQN implementations to improve model portability across different operating systems
RL Zoo
- `linear_schedule` now returns a `SimpleLinearSchedule` object for better portability
- Renamed `LunarLander-v2` to `LunarLander-v3` in hyperparameters
- Renamed `CarRacing-v2` to `CarRacing-v3` in hyperparameters
- Docker GPU images are now working again
- Use `ConstantSchedule` and `SimpleLinearSchedule` instead of `constant_fn` and `linear_schedule`
- Fixed `CarRacing-v3` hyperparameters for newer Gymnasium version
SBX (SB3 + Jax)
- Added support for n-step returns for off-policy algorithms via the `n_steps` parameter
- Added KL Adaptive LR for PPO and LR schedule for SAC/TQC
Deprecations:
- `get_schedule_fn()`, `get_linear_fn()`, `constant_fn()` are deprecated, please use `FloatSchedule()`, `LinearSchedule()`, `ConstantSchedule()` instead
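The move from function-based schedules to schedule classes can be illustrated with a small standard-library sketch. The `LinearDecay` class here is hypothetical and is not SB3's `LinearSchedule`; it only assumes the convention that a schedule is called with the remaining training progress, going from 1.0 to 0.0. A class instance stores just its parameters, so it round-trips through the standard `pickle` module, whereas a lambda cannot be pickled by it at all.

```python
import pickle


class LinearDecay:
    """Hypothetical schedule: interpolates linearly from `start` to `end`.

    Assumes `progress_remaining` goes from 1.0 (start) to 0.0 (end of training).
    """

    def __init__(self, start: float, end: float) -> None:
        self.start = start
        self.end = end

    def __call__(self, progress_remaining: float) -> float:
        return self.end + progress_remaining * (self.start - self.end)


schedule = LinearDecay(start=1e-3, end=1e-4)
halfway = schedule(0.5)

if __name__ == "__main__":
    # Only the two floats are serialized, so the object restores cleanly.
    restored = pickle.loads(pickle.dumps(schedule))
    print(restored(0.5) == halfway)
    try:
        pickle.dumps(lambda progress_remaining: 1e-3 * progress_remaining)
    except (pickle.PicklingError, AttributeError) as error:
        print(f"lambda could not be pickled: {type(error).__name__}")
```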
Documentation:
- Clarify `evaluate_policy` documentation
- Added doc about training exceeding the `total_timesteps` parameter
- Updated `LunarLander` and `LunarLanderContinuous` environment versions to v3 (@j0m0k0)
- Added sb3-extra-buffers to the project page (@Trenza1ore)
New Contributors
- @akanto made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2125
- @omahs made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2140
- @j0m0k0 made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2143
- @leopardracer made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2147
- @Trenza1ore made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2157
Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v2.6.0...v2.7.0
Published by araffin 10 months ago
stable-baselines3 - v2.6.0: New `LogEveryNTimesteps` callback and `has_attr` method, refactored hyperparameter optimization
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
New Features:
- Added `has_attr` method for `VecEnv` to check if an attribute exists
- Added `LogEveryNTimesteps` callback to dump logs every N timesteps (note: you need to pass `log_interval=None` to avoid any interference)
- Added Gymnasium v1.1 support
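The idea of triggering an action every N timesteps can be sketched in plain Python. This is an illustration, not the `LogEveryNTimesteps` source: the class name `EveryNTimestepsTrigger` is hypothetical. It compares against the last trigger point instead of using a modulo check, because with several environments the timestep counter advances in jumps and may never land exactly on a multiple of N.

```python
class EveryNTimestepsTrigger:
    """Hypothetical helper: fires once at least `n_steps` timesteps have elapsed."""

    def __init__(self, n_steps: int) -> None:
        self.n_steps = n_steps
        self.last_trigger = 0

    def __call__(self, num_timesteps: int) -> bool:
        if num_timesteps - self.last_trigger >= self.n_steps:
            self.last_trigger = num_timesteps
            return True
        return False


trigger = EveryNTimestepsTrigger(n_steps=10)
# Four parallel envs: the global counter grows by 4 at every step.
fired_at = [t for t in range(4, 28, 4) if trigger(t)]
```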
Bug fixes:
- `SubprocVecEnv` will now exit gracefully (without big traceback) when using `KeyboardInterrupt`
SB3-Contrib
- Renamed `_dump_logs()` to `dump_logs()`
- Fixed issues with `SubprocVecEnv` and `MaskablePPO` by using `vec_env.has_attr()` (pickling issues, mask function not present)
RL Zoo
- Refactored hyperparameter optimization. The Optuna Journal storage backend is now supported (recommended default) and you can easily load tuned hyperparameters via the new `--trial-id` argument of `train.py`.
- Save the exact command line used to launch a training
- Added support for special vectorized envs (e.g. Brax, IsaacSim) by allowing to override the `VecEnv` class used to instantiate the env in the `ExperimentManager`
- Allow to disable auto-logging by passing `--log-interval -2` (useful when logging things manually)
- Added Gymnasium v1.1 support
- Fixed use of old HF API in `get_hf_trained_models()`
SBX (SB3 + Jax)
- Updated PPO to support `net_arch`, and additional fixes
- Fixed entropy coeff wrongly logged for SAC and derivatives.
- Fixed PPO `predict()` for envs that were not normalized (action spaces with limits != [-1, 1])
- PPO now logs the standard deviation
Deprecations:
- `algo._dump_logs()` is deprecated in favor of `algo.dump_logs()` and will be removed in SB3 v2.7.0
Others:
- Updated black from v24 to v25
- Improved error messages when checking Box space equality (loading `VecNormalize`)
- Updated test to reflect how `set_wrapper_attr` should be used now
Documentation:
- Clarify the use of Gym wrappers with `make_vec_env` in the section on Vectorized Environments (@pstahlhofen)
- Updated callback doc for `EveryNTimesteps`
- Added doc on how to set env attributes via `VecEnv` calls
- Added ONNX export example for `MultiInputPolicy` (@darkopetrovic)
New Contributors
- @pstahlhofen made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2079
- @darkopetrovic made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2098
Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v2.5.0...v2.6.0
Published by araffin about 1 year ago
stable-baselines3 - v2.5.0: New algorithm (SimBa in SBX) and NumPy 2.0 support
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
Breaking Changes:
- Increased minimum required version of PyTorch to 2.3.0
- Removed support for Python 3.8
New Features:
- Added support for NumPy v2.0: `VecNormalize` now casts normalized rewards to float32, updated bit flipping env to avoid overflow issues too
- Added official support for Python 3.12
SBX (SB3 + Jax)
- Added SimBa Policy: Simplicity Bias for Scaling Up Parameters in DRL
- Added support for parameter resets
Others:
- Updated Dockerfile
Documentation:
- Added Decisions and Dragons to resources. (@jmacglashan)
- Updated PyBullet example, now compatible with Gymnasium
- Added link to policies for `policy_kwargs` parameter (@kplers)
- Add FootstepNet Envs to the project page (@cgaspard3333)
- Added FRASA to the project page (@MarcDcls)
- Fixed atari example (@chrisgao99)
- Add a note about `Discrete` action spaces with `start!=0`
- Update doc for massively parallel simulators (Isaac Lab, Brax, ...)
- Add dm_control example
New Contributors
- @jmacglashan made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2044
- @kplers made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2050
- @MarcDcls made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2059
- @cgaspard3333 made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2058
- @sanowl made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2064
- @chrisgao99 made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2071
Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v2.4.0...v2.5.0
Published by araffin over 1 year ago
stable-baselines3 - Stable-Baselines3 v2.4.1: Fix for `VecVideoRecorder`
Bug Fixes
- Fixed a bug introduced in v2.4.0 where the `VecVideoRecorder` would override videos
Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v2.4.0...v2.4.1
Published by araffin over 1 year ago
stable-baselines3 - Stable-Baselines3 v2.4.0: New algorithm (CrossQ in SB3-Contrib) and Gymnasium v1.0 support
[!WARNING] Stable-Baselines3 (SB3) v2.4.0 will be the last one supporting Python 3.8 (end of life in October 2024) and PyTorch < 2.3. We highly recommend that you upgrade to Python >= 3.9 and PyTorch >= 2.3 (compatible with NumPy v2).
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
[!NOTE] DQN (and QR-DQN) models saved with SB3 < 2.4.0 will show a warning about truncation of optimizer state when loaded with SB3 >= 2.4.0. To suppress the warning, simply save the model again. You can find more info in PR #1963
Breaking Changes:
- Increased minimum required version of Gymnasium to 0.29.1
New Features:
- Added support for `pre_linear_modules` and `post_linear_modules` in `create_mlp` (useful for adding normalization layers, like in DroQ or CrossQ)
- Enabled np.ndarray logging for TensorBoardOutputFormat as histogram (see GH#1634) (@iwishwasaneagle)
- Updated env checker to warn users when using multi-dim array to define `MultiDiscrete` spaces
- Added support for Gymnasium v1.0
Bug Fixes:
- Fixed memory leak when loading learner from storage, `set_parameters()` does not try to load the object data anymore and only loads the PyTorch parameters (@peteole)
- Cast type in compute gae method to avoid error when using torch compile (@amjames)
- `CallbackList` now sets the `.parent` attribute of child callbacks to its own `.parent`. (@will-maclean)
- Fixed error when loading a model that has `net_arch` manually set to `None` (@jak3122)
- Set requirement numpy<2.0 until PyTorch is compatible (https://github.com/pytorch/pytorch/issues/107302)
- Updated DQN optimizer input to only include q_network parameters, removing the target_q_network ones (@corentinlger)
- Fixed `test_buffers.py::test_device` which was not actually checking the device of tensors (@rhaps0dy)
SB3-Contrib
- Added `CrossQ` algorithm, from "Batch Normalization in Deep Reinforcement Learning" paper (@danielpalen)
- Added `BatchRenorm` PyTorch layer used in `CrossQ` (@danielpalen)
- Updated QR-DQN optimizer input to only include quantile_net parameters (@corentinlger)
- Fixed loading QRDQN changes `target_update_interval` (@jak3122)
RL Zoo
- Updated defaults hyperparameters for TQC/SAC for Swimmer-v4 (decrease gamma for more consistent results)
SBX (SB3 + Jax)
- Added CNN support for DQN
- Bug fix for SAC and related algorithms, optimize log of ent coeff to be consistent with SB3
Others:
- Fixed various typos (@cschindlbeck)
- Remove unnecessary SDE noise resampling in PPO update (@brn-dev)
- Updated PyTorch version on CI to 2.3.1
- Added a warning to recommend using CPU with on policy algorithms (A2C/PPO) and `MlpPolicy`
- Switched to uv to download packages faster on GitHub CI
- Updated dependencies for read the doc
- Removed unnecessary `copy_obs_dict` method for `SubprocVecEnv`, remove the use of ordered dict and rename `flatten_obs` to `stack_obs`
Documentation:
- Updated PPO doc to recommend using CPU with `MlpPolicy`
- Clarified documentation about planned features and citing software
- Added a note about the fact we are optimizing log of ent coeff for SAC
New Contributors
- @amjames made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1922
- @cschindlbeck made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1926
- @peteole made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1908
- @jak3122 made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1937
- @will-maclean made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1939
- @brn-dev made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1933
- @chsahit made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1962
- @Dev1nW made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/2017
Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v2.3.2...v2.4.0
Published by araffin over 1 year ago
stable-baselines3 - Stable-Baselines3 v2.3.2: Hotfix for PyTorch 1.13
Bug fixes
- Reverted `torch.load()` to be called with `weights_only=False` as it caused loading issues with old versions of PyTorch. https://github.com/DLR-RM/stable-baselines3/pull/1913
- Cast learning_rate to float lambda for pickle safety when doing model.load by @markscsmith in https://github.com/DLR-RM/stable-baselines3/pull/1901
Documentation
- Fix typo in changelog by @araffin in https://github.com/DLR-RM/stable-baselines3/pull/1882
- Fixed broken link in ppo.rst by @chaitanyabisht in https://github.com/DLR-RM/stable-baselines3/pull/1884
- Adding ER-MRL to community project by @corentinlger in https://github.com/DLR-RM/stable-baselines3/pull/1904
- Fix tensorboard video slow numpy->torch conversion by @NickLucche in https://github.com/DLR-RM/stable-baselines3/pull/1910
New Contributors
- @chaitanyabisht made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1884
- @markscsmith made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1901
- @NickLucche made their first contribution in https://github.com/DLR-RM/stable-baselines3/pull/1910
Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v2.3.0...v2.3.2
Published by araffin about 2 years ago
stable-baselines3 - Stable-Baselines3 v2.3.0: New defaults hyperparameters for DDPG, TD3 and DQN
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes:
- The default hyperparameters of `TD3` and `DDPG` have been changed to be more consistent with `SAC`

```python
from stable_baselines3 import TD3

# `env` is your environment instance (or an environment ID string)
# SB3 < 2.3.0 default hyperparameters
# model = TD3("MlpPolicy", env, train_freq=(1, "episode"), gradient_steps=-1, batch_size=100)
# SB3 >= 2.3.0:
model = TD3("MlpPolicy", env, train_freq=1, gradient_steps=1, batch_size=256)
```
[!NOTE] Two inconsistencies remain: the default network architecture for `TD3`/`DDPG` is `[400, 300]` instead of `[256, 256]` for SAC (for backward compatibility reasons, see the report on the influence of the network size) and the default learning rate is 1e-3 instead of 3e-4 for SAC (for performance reasons, see the W&B report on the influence of the lr)
- The default `learning_starts` parameter of `DQN` has been changed to be consistent with the other off-policy algorithms

```python
from stable_baselines3 import DQN

# `env` is your environment instance (or an environment ID string)
# SB3 < 2.3.0 default hyperparameters, 50000 corresponded to Atari defaults hyperparameters
# model = DQN("MlpPolicy", env, learning_starts=50000)
# SB3 >= 2.3.0:
model = DQN("MlpPolicy", env, learning_starts=100)
```
- For safety, `torch.load()` is now called with `weights_only=True` when loading torch tensors, policy `load()` still uses `weights_only=False` as gymnasium imports are required for it to work
- When using `huggingface_sb3`, you will now need to set `TRUST_REMOTE_CODE=True` when downloading models from the hub, as `pickle.load` is not safe.
New Features:
- Log success rate `rollout/success_rate` when available for on policy algorithms (@corentinlger)
Bug Fixes:
- Fixed `monitor_wrapper` argument that was not passed to the parent class, and dones argument that wasn't passed to `_update_into_buffer` (@corentinlger)
SB3-Contrib
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to MaskablePPO
- Fixed `train_freq` type annotation for tqc and qrdqn (@Armandpl)
- Fixed `sb3_contrib/common/maskable/*.py` type annotations
- Fixed `sb3_contrib/ppo_mask/ppo_mask.py` type annotations
- Fixed `sb3_contrib/common/vec_env/async_eval.py` type annotations
- Add some additional notes about `MaskablePPO` (evaluation and multi-process) (@icheered)
RL Zoo
- Updated defaults hyperparameters for TD3/DDPG to be more consistent with SAC
- Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
- Added test dependencies to `setup.py` (@power-edge)
- Simplify dependencies of `requirements.txt` (remove duplicates from `setup.py`)
SBX (SB3 + Jax)
- Added support for `MultiDiscrete` and `MultiBinary` action spaces to PPO
- Added support for large values for gradient_steps to SAC, TD3, and TQC
- Fix `train()` signature and update type hints
- Fix replay buffer device at load time
- Added flatten layer
- Added `CrossQ`
Others:
- Updated black from v23 to v24
- Updated ruff to >= v0.3.1
- Updated env checker for (multi)discrete spaces with non-zero start.
Documentation:
- Added a paragraph on modifying vectorized environment parameters via setters (@fracapuano)
- Updated callback code example
- Updated export to ONNX documentation, it is now much simpler to export SB3 models with newer ONNX Opset!
- Added video link to "Practical Tips for Reliable Reinforcement Learning" video
- Added `render_mode="human"` in the README example (@marekm4)
- Fixed docstring signature for sum_independent_dims (@stagoverflow)
- Updated docstring description for `log_interval` in the base class (@rushitnshah).
Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v2.2.1...v2.3.0
Published by araffin about 2 years ago
stable-baselines3 - Stable-Baselines3 v2.2.1: Support for options at reset, bug fixes and better error messages
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Note Stable-Baselines3 (SB3) v2.2.0 was yanked after a breaking change was found in GH#1751. Please use SB3 v2.2.1 and not v2.2.0.
Breaking Changes:
- Switched to `ruff` for sorting imports (isort is no longer needed), black and ruff now require a minimum version
- Dropped `x is False` in favor of `not x`, which means that callbacks that wrongly returned None (instead of a boolean) will cause the training to stop (@iwishiwasaneagle)
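The effect of replacing `x is False` with `not x` can be seen in a few lines of plain Python. The two helper functions are hypothetical stand-ins for the check applied to a callback's return value; they are not SB3 code.

```python
def stops_training_old(callback_result) -> bool:
    # Old check: only an explicit False stopped training.
    return callback_result is False


def stops_training_new(callback_result) -> bool:
    # New check: any falsy value, including None, stops training.
    return not callback_result


def forgetful_callback():
    # A callback that forgets its `return True` implicitly returns None.
    pass


result = forgetful_callback()
old_behavior = stops_training_old(result)
new_behavior = stops_training_new(result)
```

A callback that should let training continue therefore has to return `True` explicitly.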
New Features:
- Improved error message of the `env_checker` for env wrongly detected as GoalEnv (`compute_reward()` is defined)
- Improved error message when mixing Gym API with VecEnv API (see GH#1694)
- Add support for setting `options` at reset with VecEnv via the `set_options()` method. Same as seeds logic, options are reset at the end of an episode (@ReHoss)
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to on-policy algorithms (A2C and PPO)
Bug Fixes:
- Prevents using `squash_output` and not `use_sde` in ActorCriticPolicy (@PatrickHelm)
- Performs unscaling of actions in collect_rollout in OnPolicyAlgorithm (@PatrickHelm)
- Moves VectorizedActionNoise into `_setup_learn()` in OffPolicyAlgorithm (@PatrickHelm)
- Prevents out of bound error on Windows if no seed is passed (@PatrickHelm)
- Calls `callback.update_locals()` before `callback.on_rollout_end()` in OnPolicyAlgorithm (@PatrickHelm)
- Fixed replay buffer device after loading in OffPolicyAlgorithm (@PatrickHelm)
- Fixed `render_mode` which was not properly loaded when using `VecNormalize.load()`
- Fixed success reward dtype in `SimpleMultiObsEnv` (@NixGD)
- Fixed check_env for Sequence observation space (@corentinlger)
- Prevents instantiating BitFlippingEnv with conflicting observation spaces (@kylesayrs)
- Fixed ResourceWarning when loading and saving models (files were not closed). Note that only files opened from a path are closed automatically; the behavior stays the same for tempfiles (they need to be closed manually). The behavior is now consistent when loading/saving replay buffers
SB3-Contrib
- Added `set_options` for `AsyncEval`
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to TRPO
RL Zoo
- Removed `gym` dependency, the package is still required for some pretrained agents.
- Added `--eval-env-kwargs` to `train.py` (@Quentin18)
- Added `ppo_lstm` to hyperparams_opt.py (@technocrat13)
- Upgraded to `pybullet_envs_gymnasium>=0.4.0`
- Removed old hacks (for instance limiting off-policy algorithms to one env at test time)
- Updated docker image, removed support for X server
- Replaced deprecated `optuna.suggest_uniform(...)` by `optuna.suggest_float(..., low=..., high=...)`
SBX (SB3 + Jax)
- Added `DDPG` and `TD3` algorithms
Others:
- Fixed `stable_baselines3/common/callbacks.py` type hints
- Fixed `stable_baselines3/common/utils.py` type hints
- Fixed `stable_baselines3/common/vec_envs/vec_transpose.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_video_recorder.py` type hints
- Fixed `stable_baselines3/common/save_util.py` type hints
- Updated docker images to Ubuntu Jammy using micromamba 1.5
- Fixed `stable_baselines3/common/buffers.py` type hints
- Fixed `stable_baselines3/her/her_replay_buffer.py` type hints
- Buffers no longer call an additional `.copy()` when storing new transitions
- Fixed `ActorCriticPolicy.extract_features()` signature by adding an optional `features_extractor` argument
- Update dependencies (accept newer Shimmy/Sphinx version and remove `sphinx_autodoc_typehints`)
- Fixed `stable_baselines3/common/off_policy_algorithm.py` type hints
- Fixed `stable_baselines3/common/distributions.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_normalize.py` type hints
- Fixed `stable_baselines3/common/vec_env/__init__.py` type hints
- Switched to PyTorch 2.1.0 in the CI (fixes type annotations)
- Fixed `stable_baselines3/common/policies.py` type hints
- Switched to `mypy` only for checking types
- Added tests to check consistency when saving/loading files
Documentation:
- Updated RL Tips and Tricks (include recommendation for evaluation, added links to DroQ, ARS and SBX).
- Fixed various typos and grammar mistakes
Published by araffin over 2 years ago
stable-baselines3 - Stable-Baselines3 v2.1.0: Float64 actions, Gymnasium 0.29 support and bug fixes
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes:
- Removed Python 3.7 support
- SB3 now requires PyTorch >= 1.13
New Features:
- Added Python 3.11 support
- Added Gymnasium 0.29 support (@pseudo-rnd-thoughts)
SB3-Contrib
- Fixed MaskablePPO ignoring `stats_window_size` argument
- Added Python 3.11 support
RL Zoo
- Upgraded to Huggingface-SB3 >= 2.3
- Added Python 3.11 support
Bug Fixes:
- Relaxed check in logger, that was causing issue on Windows with colorama
- Fixed off-policy algorithms with continuous float64 actions (see #1145) (@tobirohrer)
- Fixed `env_checker.py` warning messages for out of bounds in complex observation spaces (@Gabo-Tor)
Others:
- Updated GitHub issue templates
- Fix typo in gym patch error message (@lukashass)
- Refactor `test_spaces.py` tests
Documentation:
- Fixed callback example (@BertrandDecoster)
- Fixed policy network example (@kyle-he)
- Added mobile-env as new community project (@stefanbschneider)
- Added DeepNetSlice to community projects (@AlexPasqua)
Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v2.0.0...v2.1.0
Published by araffin almost 3 years ago
stable-baselines3 - Stable-Baselines3 v2.0.0: Gymnasium Support
Warning Stable-Baselines3 (SB3) v2.0 will be the last one supporting Python 3.7 (end of life in June 2023). We highly recommend that you upgrade to Python >= 3.8.
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes:
- Switched to Gymnasium as primary backend, Gym 0.21 and 0.26 are still supported via the `shimmy` package (@carlosluis, @arjun-kg, @tlpss)
- The deprecated `online_sampling` argument of `HerReplayBuffer` was removed
- Removed deprecated `stack_observation_space` method of `StackedObservations`
- Renamed environment output observations in `evaluate_policy` to prevent shadowing the input observations during callbacks (@npit)
- Upgraded wrappers and custom environment to Gymnasium
- Refined the `HumanOutputFormat` file check: now it verifies if the object is an instance of `io.TextIOBase` instead of only checking for the presence of a `write` method.
- Because of new Gym API (0.26+), the random seed passed to `vec_env.seed(seed=seed)` will only be effective after the `env.reset()` call.
New Features:
- Added Gymnasium support (Gym 0.21 and 0.26 are supported via the `shimmy` package)
SB3-Contrib
- Fixed QRDQN update interval for multi envs
RL Zoo
- Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper
- Renamed `CarRacing-v1` to `CarRacing-v2` in hyperparameters
- Huggingface push to hub now accepts a `--n-timesteps` argument to adjust the length of the video
- Fixed `record_video` steps (before it was stepping in a closed env)
- Dropped Gym 0.21 support
Bug Fixes:
- Fixed `VecExtractDictObs` not handling terminal observations (@WeberSamuel)
- Set NumPy version to `>=1.20` due to use of `numpy.typing` (@troiganto)
- Fixed loading DQN changes `target_update_interval` (@tobirohrer)
- Fixed env checker to properly reset the env before calling `step()` when checking for `Inf` and `NaN` (@lutogniew)
- Fixed HER `truncate_last_trajectory()` (@lbergmann1)
- Fixed HER desired and achieved goal order in reward computation (@JonathanKuelz)
Others:
- Fixed `stable_baselines3/a2c/*.py` type hints
- Fixed `stable_baselines3/ppo/*.py` type hints
- Fixed `stable_baselines3/sac/*.py` type hints
- Fixed `stable_baselines3/td3/*.py` type hints
- Fixed `stable_baselines3/common/base_class.py` type hints
- Fixed `stable_baselines3/common/logger.py` type hints
- Fixed `stable_baselines3/common/envs/*.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_monitor|vec_extract_dict_obs|util.py` type hints
- Fixed `stable_baselines3/common/vec_env/base_vec_env.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_frame_stack.py` type hints
- Fixed `stable_baselines3/common/vec_env/dummy_vec_env.py` type hints
- Fixed `stable_baselines3/common/vec_env/subproc_vec_env.py` type hints
- Upgraded docker images to use mamba/micromamba and CUDA 11.7
- Updated env checker to reflect what subset of Gymnasium is supported and improve GoalEnv checks
- Improve type annotation of wrappers
- Tests envs are now checked too
- Added render test for `VecEnv` and `VecEnvWrapper`
- Update issue templates and env info saved with the model
- Changed `seed()` method return type from `List` to `Sequence`
- Updated env checker doc and requirements for tuple spaces/goal envs
Documentation:
- Added Deep RL Course link to the Deep RL Resources page
- Added documentation about `VecEnv` API vs Gym API
- Upgraded tutorials to Gymnasium API
- Make it more explicit when using `VecEnv` vs Gym env
- Added UAVNavigationDRL_AirSim to the project page (@heleidsn)
- Added `EvalCallback` example (@sidney-tio)
- Update custom env documentation
- Added `pink-noise-rl` to projects page
- Fix custom policy example, `ortho_init` was ignored
- Added SBX page
Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v1.8.0...v2.0.0
Published by araffin almost 3 years ago
stable-baselines3 - Stable-Baselines3 v1.8.0: Multi-env HerReplayBuffer, Open RL Benchmark, Improved env checker
Warning Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend. Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs). You can find a migration guide here. If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes:
- Removed shared layers in `mlp_extractor` (@AlexPasqua)
- Refactored `StackedObservations` (it now handles dict obs, `StackedDictObservations` was removed)
- You must now explicitly pass a `features_extractor` parameter when calling `extract_features()`
- Dropped offline sampling for `HerReplayBuffer`
- As `HerReplayBuffer` was refactored to support multiprocessing, previous replay buffers are incompatible with this new version
- `HerReplayBuffer` doesn't require a `max_episode_length` anymore
New Features:
- Added `repeat_action_probability` argument in `AtariWrapper`.
- Only use `NoopResetEnv` and `MaxAndSkipEnv` when needed in `AtariWrapper`
- Added support for dict/tuple observations spaces for `VecCheckNan`, the check is now active in the `env_checker()` (@DavyMorgan)
- Added multiprocessing support for `HerReplayBuffer`
- `HerReplayBuffer` now supports all datatypes supported by `ReplayBuffer`
- Provide more helpful failure messages when validating the `observation_space` of custom gym environments using `check_env` (@FieteO)
- Added `stats_window_size` argument to control smoothing in rollout logging (@jonasreiher)
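The smoothing controlled by a window-size argument such as `stats_window_size` can be pictured as a rolling mean over the most recent episodes. The sketch below is an assumption about the general technique, not SB3's logging code; the function name `rolling_mean_rewards` is hypothetical.

```python
from collections import deque


def rolling_mean_rewards(episode_rewards, stats_window_size=100):
    """Mean episode reward over the last `stats_window_size` episodes."""
    # A bounded deque silently drops the oldest entry once it is full.
    window = deque(maxlen=stats_window_size)
    means = []
    for reward in episode_rewards:
        window.append(reward)
        means.append(sum(window) / len(window))
    return means


smoothed = rolling_mean_rewards([1.0, 2.0, 3.0, 4.0], stats_window_size=2)
```

A smaller window makes the logged value react faster to recent episodes; a larger one gives a smoother but slower-moving curve.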
SB3-Contrib
- Added warning about potential crashes caused by `check_env` in the `MaskablePPO` docs (@AlexPasqua)
- Fixed `sb3_contrib/qrdqn/*.py` type hints
- Removed shared layers in `mlp_extractor` (@AlexPasqua)
RL Zoo
- Open RL Benchmark
- Upgraded to new HerReplayBuffer implementation that supports multiple envs
- Removed TimeFeatureWrapper for Panda and Fetch envs, as the new replay buffer should handle timeout.
- Tuned hyperparameters for RecurrentPPO on Swimmer
- Documentation is now built using Sphinx and hosted on read the doc
- Removed use_auth_token for push to hub util
- Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see https://github.com/openai/gym/pull/1304)
- Fixed gym-minigrid policy (from MlpPolicy to MultiInputPolicy)
- Replaced deprecated optuna.suggest_loguniform(...) by optuna.suggest_float(..., log=True)
- Switched to ruff and pyproject.toml
- Removed online_sampling and max_episode_length arguments when using HerReplayBuffer
Bug Fixes:
- Fixed Atari wrapper that missed the reset condition (@luizapozzobon)
- Added the argument `dtype` (default to `float32`) to the noise for consistency with gym action (@sidney-tio)
- Fixed PPO train/n_updates metric not accounting for early stopping (@adamfrly)
- Fixed loading of normalized image-based environments
- Fixed `DictRolloutBuffer.add` with multidimensional action space (@younik)
Deprecations:
Others:
- Fixed `tests/test_tensorboard.py` type hint
- Fixed `tests/test_vec_normalize.py` type hint
- Fixed `stable_baselines3/common/monitor.py` type hint
- Added tests for StackedObservations
- Removed Gitlab CI file
- Moved from `setup.cfg` to `pyproject.toml` configuration file
- Switched from `flake8` to `ruff`
- Upgraded AutoROM to latest version
- Fixed `stable_baselines3/dqn/*.py` type hints
- Added `extra_no_roms` option for package installation without Atari Roms
Documentation:
- Renamed `load_parameters` to `set_parameters` (@DavyMorgan)
- Clarified documentation about subproc multiprocessing for A2C (@Bonifatius94)
- Fixed typo in `A2C` docstring (@AlexPasqua)
- Renamed timesteps to episodes for `log_interval` description (@theSquaredError)
- Removed note about gif creation for Atari games (@harveybellini)
- Added information about default network architecture
- Update information about Gymnasium support
Published by araffin about 3 years ago
stable-baselines3 - Stable-Baselines3 v1.7.0 : non-shared features extractor, bug fixes and quality of life improvements
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Warning: Shared layers in MLP policy (`mlp_extractor`) are now deprecated for PPO, A2C and TRPO. This feature will be removed in SB3 v1.8.0 and the behavior of `net_arch=[64, 64]` will create separate networks with the same architecture, to be consistent with the off-policy algorithms.
Note: A2C and PPO models saved with SB3 < 1.7.0 will show a warning about missing keys in the state dict when loaded with SB3 >= 1.7.0. To suppress the warning, simply save the model again. You can find more info in issue #1233
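To make the announced behavior concrete: once shared layers are removed, a plain list such as `net_arch=[64, 64]` is expected to mean the same as the explicit dictionary form with separate `pi` (actor) and `vf` (critic) layers. The sketch below only illustrates that equivalence; `to_separate_net_arch` is a hypothetical helper and not an SB3 function.

```python
from typing import Dict, List, Union

NetArch = Union[List[int], Dict[str, List[int]]]


def to_separate_net_arch(net_arch: NetArch) -> Dict[str, List[int]]:
    """Expand a plain list into explicit, non-shared actor (pi) and critic (vf) layers."""
    if isinstance(net_arch, dict):
        # Already explicit: keep the actor/critic entries, defaulting to no layers.
        return {"pi": list(net_arch.get("pi", [])), "vf": list(net_arch.get("vf", []))}
    # A plain list describes both networks, each with its own copy of the layer sizes.
    return {"pi": list(net_arch), "vf": list(net_arch)}


policy_kwargs = dict(net_arch=to_separate_net_arch([64, 64]))
print(policy_kwargs)
```

Writing the dictionary form explicitly in `policy_kwargs` is one way to keep a configuration unambiguous across the deprecation.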
Breaking Changes:
- Removed deprecated `create_eval_env`, `eval_env`, `eval_log_path`, `n_eval_episodes` and `eval_freq` parameters, please use an `EvalCallback` instead
- Removed deprecated `sde_net_arch` parameter
- Removed `ret` attributes in `VecNormalize`, please use `returns` instead
- `VecNormalize` now updates the observation space when normalizing images
New Features:
- Introduced mypy type checking
- Added option to have non-shared features extractor between actor and critic in on-policy algorithms (@AlexPasqua)
- Added `with_bias` argument to `create_mlp`
- Added support for multidimensional `spaces.MultiBinary` observations
- Features extractors now properly support unnormalized image-like observations (3D tensor) when passing `normalize_images=False`
- Added `normalized_image` parameter to `NatureCNN` and `CombinedExtractor`
- Added support for Python 3.10
SB3-Contrib
- Fixed a bug in `RecurrentPPO` where the LSTM states were incorrectly reshaped for `n_lstm_layers > 1` (thanks @kolbytn)
- Fixed `RuntimeError: rnn: hx is not contiguous` while predicting terminal values for `RecurrentPPO` when `n_lstm_layers > 1`
RL Zoo
- Added support for python file for configuration
- Added `monitor_kwargs` parameter
Bug Fixes:
- Fixed `ProgressBarCallback` under-reporting (@dominicgkerr)
- Fixed return type of `evaluate_actions` in `ActorCriticPolicy` to reflect that entropy is an optional tensor (@Rocamonde)
- Fixed type annotation of `policy` in `BaseAlgorithm` and `OffPolicyAlgorithm`
- Allowed model trained with Python 3.7 to be loaded with Python 3.8+ without the `custom_objects` workaround
- Raise an error when the same gym environment instance is passed as separate environments when creating a vectorized environment with more than one environment. (@Rocamonde)
- Fix type annotation of `model` in `evaluate_policy`
- Fixed `Self` return type using `TypeVar`
- Fixed the env checker, the key was not passed when checking images from Dict observation space
- Fixed `normalize_images` which was not passed to parent class in some cases
- Fixed `load_from_vector` that was broken with newer PyTorch version when passing PyTorch tensor
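The duplicate-instance check mentioned in the list above guards against a common mistake: building the list of environment factories so that every factory returns the same object. A minimal sketch of such a check, assuming identity is compared with `id()`; `check_distinct_envs` and `ToyEnv` are hypothetical names used for illustration only.

```python
from typing import Callable, List, Sequence


def check_distinct_envs(envs: Sequence[object]) -> None:
    """Raise if the same environment object appears more than once."""
    if len({id(env) for env in envs}) != len(envs):
        raise ValueError(
            "The same environment instance was passed several times; "
            "each factory must create a new instance."
        )


class ToyEnv:
    """Stand-in for a gym environment (illustration only)."""


shared = ToyEnv()
bad_factories: List[Callable[[], ToyEnv]] = [lambda: shared] * 2  # both return `shared`
good_factories: List[Callable[[], ToyEnv]] = [ToyEnv, ToyEnv]  # each call builds a new env

check_distinct_envs([make() for make in good_factories])  # passes silently
try:
    check_distinct_envs([make() for make in bad_factories])
except ValueError as err:
    print(f"rejected: {err}")
```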
Deprecations:
- You should now explicitly pass a `features_extractor` parameter when calling `extract_features()`
- Deprecated shared layers in `MlpExtractor` (@AlexPasqua)
Others:
- Used issue forms instead of issue templates
- Updated the PR template to associate each PR with its peer in RL-Zoo3 and SB3-Contrib
- Fixed flake8 config to be compatible with flake8 6+
- Goal-conditioned environments are now characterized by the availability of the `compute_reward` method, rather than by their inheritance to `gym.GoalEnv`
- Replaced `CartPole-v0` by `CartPole-v1` in tests
- Fixed `tests/test_distributions.py` type hints
- Fixed `stable_baselines3/common/type_aliases.py` type hints
- Fixed `stable_baselines3/common/torch_layers.py` type hints
- Fixed `stable_baselines3/common/env_util.py` type hints
- Fixed `stable_baselines3/common/preprocessing.py` type hints
- Fixed `stable_baselines3/common/atari_wrappers.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_check_nan.py` type hints
- Exposed modules in `__init__.py` with the `__all__` attribute (@ZikangXiong)
- Upgraded GitHub CI/setup-python to v4 and checkout to v3
- Set tensors construction directly on the device (~8% speed boost on GPU)
- Monkey-patched `np.bool = bool` so gym 0.21 is compatible with NumPy 1.24+
- Standardized the use of `from gym import spaces`
- Modified `get_system_info` to avoid issue linked to copy-pasting on GitHub issue
Documentation:
- Updated Hugging Face Integration page (@simoninithomas)
- Changed `env` to `vec_env` when environment is vectorized
- Updated custom policy docs to better explain the `mlp_extractor`'s dimensions (@AlexPasqua)
- Updated custom policy documentation (@athatheo)
- Improved tensorboard callback doc
- Clarify doc when using image-like input
- Added RLeXplore to the project page (@yuanmingqi)
Published by araffin over 3 years ago
stable-baselines3 - SB3 v1.6.2: Progress bar and RL Zoo3 package
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib RL Zoo3: https://github.com/DLR-RM/rl-baselines3-zoo
New Features:
- Added `progress_bar` argument in the `learn()` method, displayed using TQDM and rich packages
- Added progress bar callback
RL Zoo3
- The RL Zoo can now be installed as a package (`pip install rl_zoo3`)
Bug Fixes:
- `self.num_timesteps` was initialized properly only after the first call to `on_step()` for callbacks
- Set importlib-metadata version to `~=4.13` to be compatible with `gym=0.21`
Deprecations:
- Added deprecation warning if parameters `eval_env`, `eval_freq` or `create_eval_env` are used (see #925) (@tobirohrer)
Others:
- Fixed type hint of the `env_id` parameter in `make_vec_env` and `make_atari_env` (@AlexPasqua)
Documentation:
- Extended docstring of the `wrapper_class` parameter in `make_vec_env` (@AlexPasqua)
Published by araffin over 3 years ago
stable-baselines3 - SB3 v1.6.1: Bug fix release
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Breaking Changes:
- Switched minimum tensorboard version to 2.9.1
New Features:
- Support logging hyperparameters to tensorboard (@timothe-chaumont)
- Added checkpoints for replay buffer and `VecNormalize` statistics (@anand-bala)
- Added option for `Monitor` to append to existing file instead of overriding (@sidney-tio)
- The env checker now raises an error when using dict observation spaces and observation keys don't match observation space keys
SB3-Contrib
- Fixed the issue of wrongly passing policy arguments when using `CnnLstmPolicy` or `MultiInputLstmPolicy` with `RecurrentPPO` (@mlodel)
Bug Fixes:
- Fixed issue where `PPO` gives NaN if rollout buffer provides a batch of size 1 (@hughperkins)
- Fixed the issue that `predict` does not always return action as `np.ndarray` (@qgallouedec)
- Fixed division by zero error when computing FPS when only a small amount of time has elapsed on operating systems with low-precision timers.
- Added multidimensional action space support (@qgallouedec)
- Fixed missing verbose parameter passing in the `EvalCallback` constructor (@burakdmb)
- Fixed the issue that when updating the target network in DQN, SAC, TD3, the `running_mean` and `running_var` properties of batch norm layers are not updated (@honglu2875)
- Fixed incorrect type annotation of the `replay_buffer_class` argument in `common.OffPolicyAlgorithm` initializer, where an instance instead of a class was required (@Rocamonde)
- Fixed loading saved model with different number of environments
- Removed `forward()` abstract method declaration from `common.policies.BaseModel` (already defined in `torch.nn.Module`) to fix type errors in subclasses (@Rocamonde)
- Fixed the return type of `.load()` and `.learn()` methods in `BaseAlgorithm` so that they now use `TypeVar` (@Rocamonde)
- Fixed an issue where keys with different tags but the same key raised an error in `common.logger.HumanOutputFormat` (@Rocamonde and @AdamGleave)
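The FPS division-by-zero fix in the list above concerns timers so coarse that two consecutive reads return the same value. One way to guard against this, shown here as a sketch under the assumption that the elapsed time is simply clamped to a tiny positive number (`compute_fps` is a hypothetical helper):

```python
import sys
import time


def compute_fps(num_steps: int, start_time_ns: int, now_ns: int) -> int:
    """Steps per second, with the elapsed time clamped away from zero."""
    elapsed = max((now_ns - start_time_ns) / 1e9, sys.float_info.epsilon)
    return int(num_steps / elapsed)


start = time.time_ns()
# Same timestamp twice: a very large number is returned instead of a ZeroDivisionError.
print(compute_fps(100, start, start))
# Two seconds elapsed for 100 steps.
print(compute_fps(100, start, start + 2_000_000_000))
```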
Others:
- Fixed `DictReplayBuffer.next_observations` typing (@qgallouedec)
- Added support for `device="auto"` in buffers and made it default (@qgallouedec)
- Updated `ResultsWriter` (used internally by `Monitor` wrapper) to automatically create missing directories when `filename` is a path (@dominicgkerr)
Documentation:
- Added an example of callback that logs hyperparameters to tensorboard. (@timothe-chaumont)
- Fixed typo in docstring "nature" -> "Nature" (@Melanol)
- Added info on split tensorboard logs into (@Melanol)
- Fixed typo in ppo doc (@francescoluciano)
- Fixed typo in install doc (@jlp-ue)
- Clarified and standardized verbosity documentation
- Added link to a GitHub issue in the custom policy documentation (@AlexPasqua)
- Fixed typos (@Akhilez)
Published by araffin over 3 years ago
stable-baselines3 - SB3 v1.6.0: Recurrent PPO (PPO LSTM), better defaults for learning from pixels with SAC/TD3
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Breaking Changes:
- Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...), removing the former `register_policy` helper, `policy_base` parameter and using `policy_aliases` static attributes instead (@Gregwar)
- SB3 now requires PyTorch >= 1.11
- Changed the default network architecture when using `CnnPolicy` or `MultiInputPolicy` with SAC or DDPG/TD3, `share_features_extractor` is now set to False by default and the `net_arch=[256, 256]` (instead of `net_arch=[]` that was before)
SB3-Contrib
- Added Recurrent PPO (PPO LSTM). See https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/53
Bug Fixes:
- Fixed saving and loading large policies greater than 2GB (@jkterry1, @ycheng517)
- Fixed final goal selection strategy that did not sample the final achieved goal (@qgallouedec)
- Fixed a bug with special characters in the tensorboard log name (@quantitative-technologies)
- Fixed a bug in `DummyVecEnv`'s and `SubprocVecEnv`'s seeding function. None value was unchecked (@ScheiklP)
- Fixed a bug where `EvalCallback` would crash when trying to synchronize `VecNormalize` stats when observation normalization was disabled
- Added a check for unbounded actions
- Fixed issues due to newer version of protobuf (tensorboard) and sphinx
- Fix exception causes all over the codebase (@cool-RR)
- Prohibit simultaneous use of `optimize_memory_usage` and `handle_timeout_termination` due to a bug (@MWeltevrede)
- Fixed a bug in `kl_divergence` check that would fail when using numpy arrays with MultiCategorical distribution
Others:
- Upgraded to Python 3.7+ syntax using `pyupgrade`
- Removed redundant double-check for nested observations from `BaseAlgorithm._wrap_env` (@TibiGG)
Documentation:
- Added link to gym doc and gym env checker
- Fix typo in PPO doc (@bcollazo)
- Added link to PPO ICLR blog post
- Added remark about breaking Markov assumption and timeout handling
- Added doc about MLFlow integration via custom logger (@git-thor)
- Updated Huggingface integration doc
- Added copy button for code snippets
- Added doc about EnvPool and Isaac Gym support
Published by araffin almost 4 years ago
stable-baselines3 - SB3 v1.5.0: Bug fixes, early stopping callback
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Breaking Changes:
- Switched minimum Gym version to 0.21.0.
New Features:
- Added `StopTrainingOnNoModelImprovement` to callback collection (@caburu)
- Makes the length of keys and values in `HumanOutputFormat` configurable, depending on desired maximum width of output.
- Allow PPO to turn off advantage normalization (see PR #763) @vwxyzjn
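For readers unfamiliar with the last item: advantage normalization rescales the advantages of a mini-batch to zero mean and unit variance before the policy loss is computed. The sketch below shows the general idea with an on/off switch; `maybe_normalize` and its `eps` default are assumptions made for illustration, not SB3 code.

```python
import statistics
from typing import List


def maybe_normalize(advantages: List[float], normalize: bool = True, eps: float = 1e-8) -> List[float]:
    """Standardize advantages to zero mean and unit variance when requested."""
    # A single sample has no meaningful standard deviation, so leave it untouched.
    if not normalize or len(advantages) < 2:
        return list(advantages)
    mean = statistics.mean(advantages)
    std = statistics.stdev(advantages)  # sample standard deviation
    return [(adv - mean) / (std + eps) for adv in advantages]


print(maybe_normalize([1.0, 2.0, 3.0]))
print(maybe_normalize([1.0, 2.0, 3.0], normalize=False))
```

Turning normalization off can matter when mini-batches are very small, since the standard deviation estimate then becomes noisy.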
SB3-Contrib
- coming soon: Cross Entropy Method, see https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/62
Bug Fixes:
- Fixed a bug in `VecMonitor`. The monitor did not consider the `info_keywords` during stepping (@ScheiklP)
- Fixed a bug in `HumanOutputFormat`. Distinct keys truncated to the same prefix would overwrite each other's value, resulting in only one being output. This now raises an error (this should only affect a small fraction of use cases with very long keys.)
- Routing all the `nn.Module` calls through implicit rather than explicit forward as per pytorch guidelines (@manuel-delverme)
- Fixed a bug in `VecNormalize` where error occurs when `norm_obs` is set to False for environment with dictionary observation (@buoyancy99)
- Set default `env` argument to `None` in `HerReplayBuffer.sample` (@qgallouedec)
- Fix `batch_size` typing in `DQN` (@qgallouedec)
- Fixed sample normalization in `DictReplayBuffer` (@qgallouedec)
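The `HumanOutputFormat` fix in the list above is easiest to see with a small example: when long keys are shortened for display, two different keys can end up with the same shortened form. A minimal sketch of detecting that collision, assuming truncation keeps a prefix and appends `...`; `truncate_keys` and its `max_length` parameter are hypothetical.

```python
from typing import Dict


def truncate_keys(values: Dict[str, float], max_length: int = 36) -> Dict[str, float]:
    """Shorten long keys for display and refuse to silently merge two of them."""
    shortened: Dict[str, float] = {}
    for key, value in values.items():
        short = key if len(key) <= max_length else key[: max_length - 3] + "..."
        if short in shortened:
            raise ValueError(f"Key '{key}' truncated to '{short}' conflicts with another key")
        shortened[short] = value
    return shortened


print(truncate_keys({"rollout/ep_rew_mean": 1.0, "time/fps": 250.0}))
try:
    truncate_keys({"rollout/very_long_metric_a": 1.0, "rollout/very_long_metric_b": 2.0}, max_length=12)
except ValueError as err:
    print(f"error: {err}")
```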
Others:
- Fixed pytest warnings
- Removed parameter `remove_time_limit_termination` in off policy algorithms since it was dead code (@Gregwar)
Documentation:
- Added doc on Hugging Face integration (@simoninithomas)
- Added furuta pendulum project to project list (@armandpl)
- Fix indentation 2 spaces to 4 spaces in custom env documentation example (@Gautam-J)
- Update MlpExtractor docstring (@gianlucadecola)
- Added explanation of the logger output
- Update `Directly Accessing The Summary Writer` in tensorboard integration (@xy9485)
Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v1.4.0...v1.5.0
Published by araffin about 4 years ago
stable-baselines3 - SB3 v1.4.0: TRPO, ARS and multi env training for off-policy algorithms
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Breaking Changes:
- Dropped python 3.6 support (as announced in previous release)
- Renamed `mask` argument of the `predict()` method to `episode_start` (used with RNN policies only)
- local variables `action`, `done` and `reward` were renamed to their plural form for offpolicy algorithms (`actions`, `dones`, `rewards`), this may affect custom callbacks.
- Removed `episode_reward` field from `RolloutReturn()` type
Warning:
An update to the HER algorithm is planned to support multi-env training and remove the max episode length constraint.
(see PR #704)
This will be a backward incompatible change (model trained with previous version of HER won't work with the new version).
New Features:
- Added `norm_obs_keys` param for `VecNormalize` wrapper to configure which observation keys to normalize (@kachayev)
- Added experimental support to train off-policy algorithms with multiple envs (note: `HerReplayBuffer` currently not supported)
- Handle timeout termination properly for on-policy algorithms (when using `TimeLimit`)
- Added `skip` option to `VecTransposeImage` to skip transforming the channel order when the heuristic is wrong
- Added `copy()` and `combine()` methods to `RunningMeanStd`
SB3-Contrib
- Added Trust Region Policy Optimization (TRPO) (@cyprienc)
- Added Augmented Random Search (ARS) (@sgillen)
- Coming soon: PPO LSTM, see https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/53
Bug Fixes:
- Fixed a bug where `set_env()` with `VecNormalize` would result in an error with off-policy algorithms (thanks @cleversonahum)
- FPS calculation is now performed based on number of steps performed during last `learn` call, even when `reset_num_timesteps` is set to `False` (@kachayev)
- Fixed evaluation script for recurrent policies (experimental feature in SB3 contrib)
- Fixed a bug where the observation would be incorrectly detected as non-vectorized instead of throwing an error
- The env checker now properly checks and warns about potential issues for continuous action spaces when the boundaries are too small or when the dtype is not float32
- Fixed a bug in `VecFrameStack` with channel first image envs, where the terminal observation would be wrongly created.
Others:
- Added a warning in the env checker when not using `np.float32` for continuous actions
- Improved test coverage and error message when checking shape of observation
- Added `newline="\n"` when opening CSV monitor files so that each line ends with `\r\n` instead of `\r\r\n` on Windows while Linux environments are not affected (@hsuehch)
- Fixed `device` argument inconsistency (@qgallouedec)
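The `newline="\n"` item in the list above can be reproduced without Windows. Python's `csv` writer ends each row with `\r\n` by default, and a text stream that translates `\n` to `\r\n` (the default for files on Windows) then turns that into `\r\r\n`. The sketch below simulates both cases with `io.StringIO`; the column names are arbitrary and `write_row` is a hypothetical helper.

```python
import csv
import io


def write_row(stream: io.StringIO) -> str:
    """Write one CSV row and return the raw characters stored in the stream."""
    csv.writer(stream).writerow(["r", "l", "t"])
    return stream.getvalue()


# newline="\r\n" mimics the default text-mode translation on Windows:
# every "\n" written is expanded to "\r\n".
windows_default = write_row(io.StringIO(newline="\r\n"))
# newline="\n" disables the translation, so the writer's own "\r\n" is kept as-is.
explicit_newline = write_row(io.StringIO(newline="\n"))

print(repr(windows_default))
print(repr(explicit_newline))
```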
Documentation:
- Add drivergym to projects page (@theDebugger811)
- Add highway-env to projects page (@eleurent)
- Add tactile-gym to projects page (@ac-93)
- Fix indentation in the RL tips page (@cove9988)
- Update GAE computation docstring
- Add documentation on exporting to TFLite/Coral
- Added JMLR paper and updated citation
- Added link to RL Tips and Tricks video
- Updated `BaseAlgorithm.load` docstring (@Demetrio92)
- Added a note on `load` behavior in the examples (@Demetrio92)
- Updated SB3 Contrib doc
- Fixed A2C and migration guide guidance on how to set epsilon with RMSpropTFLike (@thomasgubler)
- Fixed custom policy documentation (@IperGiove)
- Added doc on Weights & Biases integration
Published by araffin over 4 years ago
stable-baselines3 - SB3 v1.3.0 : Bug fixes and improvements for the user
WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend you to upgrade to Python >= 3.7.
SB3-Contrib changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/releases/tag/v1.3.0
Breaking Changes:
- `sde_net_arch` argument in policies is deprecated and will be removed in a future version.
- `_get_latent` (`ActorCriticPolicy`) was removed
- All logging keys now use underscores instead of spaces (@timokau). Concretely this changes: `time/total timesteps` to `time/total_timesteps` for off-policy algorithms and the eval callback (on-policy algorithms such as PPO and A2C already used the underscored version), `rollout/exploration rate` to `rollout/exploration_rate` and `rollout/success rate` to `rollout/success_rate`.
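Scripts or dashboards that look up metrics by the old names need the same renaming. A minimal sketch of translating old keys, assuming that replacing spaces with underscores covers every affected key (`migrate_log_key` is a hypothetical helper):

```python
def migrate_log_key(key: str) -> str:
    """Map a pre-1.3.0 logging key to its underscored form."""
    return key.replace(" ", "_")


old_keys = ["time/total timesteps", "rollout/exploration rate", "rollout/success rate"]
for old_key in old_keys:
    print(f"{old_key!r} -> {migrate_log_key(old_key)!r}")
```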
New Features:
- Added methods `get_distribution` and `predict_values` for `ActorCriticPolicy` for A2C/PPO/TRPO (@cyprienc)
- Added methods `forward_actor` and `forward_critic` for `MlpExtractor`
- Added `sb3.get_system_info()` helper function to gather version information relevant to SB3 (e.g., Python and PyTorch version)
- Saved models now store system information where agent was trained, and load functions have `print_system_info` parameter to help debugging load issues.
Bug Fixes:
- Fixed `dtype` of observations for `SimpleMultiObsEnv`
- Allow `VecNormalize` to wrap discrete-observation environments to normalize reward when observation normalization is disabled.
- Fixed a bug where `DQN` would throw an error when using `Discrete` observation and stochastic actions
- Fixed a bug where sub-classed observation spaces could not be used
- Added `force_reset` argument to `load()` and `set_env()` in order to be able to call `learn(reset_num_timesteps=False)` with a new environment
Others:
- Cap gym max version to 0.19 to avoid issues with atari-py and other breaking changes
- Improved error message when using dict observation with the wrong policy
- Improved error message when using `EvalCallback` with two envs not wrapped the same way.
- Added additional infos about supported python version for PyPi in `setup.py`
Documentation:
- Add Rocket League Gym to list of supported projects (@AechPro)
- Added gym-electric-motor to project page (@wkirgsn)
- Added policy-distillation-baselines to project page (@CUN-bjy)
- Added ONNX export instructions (@batu)
- Update read the doc env (fixed `docutils` issue)
- Fix PPO environment name (@IljaAvadiev)
- Fix custom env doc and add env registration example
- Update algorithms from SB3 Contrib
- Use underscores for numeric literals in examples to improve clarity
Published by araffin over 4 years ago
stable-baselines3 - SB3 v1.2.0: Hotfix for VecNormalize, training/eval mode support
Breaking Changes:
- SB3 now requires PyTorch >= 1.8.1
- `VecNormalize` `ret` attribute was renamed to `returns`
Bug Fixes:
- Hotfix for `VecNormalize` where the observation filter was not updated at reset (thanks @vwxyzjn)
- Fixed model predictions when using batch normalization and dropout layers by calling `train()` and `eval()` (@davidblom603)
- Fixed model training for DQN, TD3 and SAC so that their target nets always remain in evaluation mode (@ayeright)
- Passing `gradient_steps=0` to an off-policy algorithm will result in no gradient steps being taken (vs as many gradient steps as steps done in the environment during the rollout in previous versions)
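The `gradient_steps` change can be summarized as a small rule. The sketch below assumes that a negative value (such as `-1`) is what requests "as many gradient steps as environment steps in the rollout", while `0` now literally means no updates; `resolve_gradient_steps` is a hypothetical helper written for illustration.

```python
def resolve_gradient_steps(gradient_steps: int, rollout_steps: int) -> int:
    """Number of gradient updates to run after a rollout of `rollout_steps` env steps."""
    # Negative values mean "match the number of environment steps collected".
    return gradient_steps if gradient_steps >= 0 else rollout_steps


for requested in (0, 4, -1):
    print(requested, "->", resolve_gradient_steps(requested, rollout_steps=8))
```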
Others:
- Enabled Python 3.9 in GitHub CI
- Fixed type annotations
- Refactored `predict()` by moving the preprocessing to `obs_to_tensor()` method
Documentation:
- Updated multiprocessing example
- Added example of `VecEnvWrapper`
- Added a note about logging to tensorboard more often
- Added warning about simplicity of examples and link to RL zoo (@MihaiAnca13)
Published by araffin over 4 years ago
stable-baselines3 - SB3 v1.1.0: Dictionary observation support, timeout handling and refactored HER buffer
Breaking Changes
- All custom environments (e.g. the `BitFlippingEnv` or `IdentityEnv`) were moved to `stable_baselines3.common.envs` folder
- Refactored `HER` which is now the `HerReplayBuffer` class that can be passed to any off-policy algorithm
- Handle timeout termination properly for off-policy algorithms (when using `TimeLimit`)
- Renamed `_last_dones` and `dones` to `_last_episode_starts` and `episode_starts` in `RolloutBuffer`.
- Removed `ObsDictWrapper` as `Dict` observation spaces are now supported
```python
from stable_baselines3 import SAC, HerReplayBuffer

her_kwargs = dict(n_sampled_goal=2, goal_selection_strategy="future", online_sampling=True)
# SB3 < 1.1.0
# model = HER("MlpPolicy", env, model_class=SAC, **her_kwargs)
# SB3 >= 1.1.0 (`env` is a goal-conditioned environment created beforehand):
model = SAC("MultiInputPolicy", env, replay_buffer_class=HerReplayBuffer, replay_buffer_kwargs=her_kwargs)
```
- Updated the KL Divergence estimator in the PPO algorithm to be positive definite and have lower variance (@09tangriro)
- Updated the KL Divergence check in the PPO algorithm to be before the gradient update step rather than after end of epoch (@09tangriro)
- Removed parameter `channels_last` from `is_image_space` as it can be inferred.
- The logger object is now an attribute `model.logger` that can be set by the user using `model.set_logger()`
- Changed the signature of `logger.configure` and `utils.configure_logger`, they now return a `Logger` object
- Removed `Logger.CURRENT` and `Logger.DEFAULT`
- Moved `warn(), debug(), log(), info(), dump()` methods to the `Logger` class
- `.learn()` now throws an import error when the user tries to log to tensorboard but the package is not installed
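The release notes do not spell out the new KL estimator. A commonly used estimator with the stated properties (never negative, lower variance than the plain mean log-ratio) averages `(r - 1) - log(r)` over samples, where `r` is the probability ratio between the new and old policy. The sketch below assumes that form and is not taken from the SB3 source; `approx_kl` is a hypothetical name.

```python
import math
from typing import Sequence


def approx_kl(old_log_probs: Sequence[float], new_log_probs: Sequence[float]) -> float:
    """Estimate KL(old || new) from actions sampled under the old policy."""
    total = 0.0
    for old_lp, new_lp in zip(old_log_probs, new_log_probs):
        log_ratio = new_lp - old_lp
        # (r - 1) - log(r) >= 0 for every r > 0, so each term is non-negative.
        total += (math.exp(log_ratio) - 1.0) - log_ratio
    return total / len(old_log_probs)


print(approx_kl([-1.0, -2.0], [-1.0, -2.0]))  # identical policies
print(approx_kl([-1.0, -2.0], [-1.5, -1.0]))
```

Evaluating such an estimate before each gradient step, as the neighbouring item describes, lets training stop as soon as the policy has drifted too far.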
New Features
- Added support for single-level `Dict` observation space (@JadenTravnik)
- Added `DictRolloutBuffer` and `DictReplayBuffer` to support dictionary observations (@JadenTravnik)
- Added `StackedObservations` and `StackedDictObservations` that are used within `VecFrameStack`
- Added simple 4x4 room Dict test environments
- `HerReplayBuffer` now supports `VecNormalize` when `online_sampling=False`
- Added VecMonitor and VecExtractDictObs wrappers to handle gym3-style vectorized environments (@vwxyzjn)
- Ignored the terminal observation if it is not provided by the environment such as the gym3-style vectorized environments. (@vwxyzjn)
- Added policy_base as input to the OnPolicyAlgorithm for more flexibility (@09tangriro)
- Added support for image observation when using `HER`
- Added `replay_buffer_class` and `replay_buffer_kwargs` arguments to off-policy algorithms
- Added `kl_divergence` helper for `Distribution` classes (@09tangriro)
- Added support for vector environments with `num_envs > 1` (@benblack769)
- Added `wrapper_kwargs` argument to `make_vec_env` (@amy12xx)
Bug Fixes
- Fixed potential issue when calling off-policy algorithms with default arguments multiple times (the size of the replay buffer would be the same)
- Fixed loading of `ent_coef` for `SAC` and `TQC`, it was not optimized anymore (thanks @Atlis)
- Fixed saving of `A2C` and `PPO` policy when using gSDE (thanks @liusida)
- Fixed a bug where no output would be shown even if `verbose>=1` after passing `verbose=0` once
- Fixed observation buffers dtype in DictReplayBuffer (@c-rizz)
- Fixed EvalCallback tensorboard logs being logged with the incorrect timestep. They are now written with the timestep at which they were recorded. (@skandermoalla)
Others
- Added `flake8-bugbear` to tests dependencies to find likely bugs
- Updated `env_checker` to reflect support of dict observation spaces
- Added Code of Conduct
- Added tests for GAE and lambda return computation
- Updated distribution entropy test (thanks @09tangriro)
- Added sanity check `batch_size > 1` in PPO to avoid NaN in advantage normalization
Documentation:
- Added gym pybullet drones project (@JacopoPan)
- Added link to SuperSuit in projects (@justinkterry)
- Fixed DQN example (thanks @ltbd78)
- Clarified channel-first/channel-last recommendation
- Update sphinx environment installation instructions (@tom-doerr)
- Clarified pip installation in Zsh (@tom-doerr)
- Clarified return computation for on-policy algorithms (TD(lambda) estimate was used)
- Added example for using `ProcgenEnv`
- Added note about advanced custom policy example for off-policy algorithms
- Fixed DQN unicode checkmarks
- Updated migration guide (@juancroldan)
- Pinned `docutils==0.16` to avoid issue with rtd theme
- Clarified callback `save_freq` definition
- Added doc on how to pass a custom logger
- Remove recurrent policies from `A2C` docs (@bstee615)
Published by araffin almost 5 years ago
stable-baselines3 - Stable-Baselines3 v1.0
First Major Version
Blog post: https://araffin.github.io/post/sb3/
100+ pre-trained models in the zoo: https://github.com/DLR-RM/rl-baselines3-zoo
Breaking Changes:
- Removed `stable_baselines3.common.cmd_util` (already deprecated), please use `env_util` instead
A refactoring of the HER algorithm is planned together with support for dictionary observations (see PR #243 and
#351)
This will be a backward incompatible change (model trained with previous version of HER won't work with the new version).
New Features:
- Added support for `custom_objects` when loading models
Bug Fixes:
- Fixed a bug with `DQN` predict method when using `deterministic=False` with image space
Documentation:
- Fixed examples
- Added new project using SB3: rl_reach (@PierreExeter)
- Added note about slow-down when switching to PyTorch
- Add a note on continual learning and resetting environment
- Updated RL-Zoo to reflect the fact that it is more than a collection of trained agents
- Added images to illustrate the training loop and custom policies (created with https://excalidraw.com/)
- Updated the custom policy section
Published by araffin about 5 years ago
stable-baselines3 - v1.0rc1
Second release candidate
Published by araffin about 5 years ago
stable-baselines3 - Bug fixes, better image support and last release before v1.0
Breaking Changes:
- `evaluate_policy` now returns rewards/episode lengths from a `Monitor` wrapper if one is present, this allows to return the unnormalized reward in the case of Atari games for instance.
- Renamed `common.vec_env.is_wrapped` to `common.vec_env.is_vecenv_wrapped` to avoid confusion with the new `is_wrapped()` helper
- Renamed `_get_data()` to `_get_constructor_parameters()` for policies (this affects independent saving/loading of policies)
- Removed `n_episodes_rollout` and merged it with `train_freq`, which now accepts a tuple `(frequency, unit)`:
- `replay_buffer` in `collect_rollout` is no longer optional
```python
# SB3 < 0.11.0
# model = SAC("MlpPolicy", env, n_episodes_rollout=1, train_freq=-1)
# SB3 >= 0.11.0:
model = SAC("MlpPolicy", env, train_freq=(1, "episode"))
```
New Features:
- Add support for `VecFrameStack` to stack on first or last observation dimension, along with automatic check for image spaces.
- `VecFrameStack` now has a `channels_order` argument to tell if observations should be stacked on the first or last observation dimension (originally always stacked on last).
- Added `common.env_util.is_wrapped` and `common.env_util.unwrap_wrapper` functions for checking/unwrapping an environment for specific wrapper.
- Added `env_is_wrapped()` method for `VecEnv` to check if its environments are wrapped with given Gym wrappers.
- Added `monitor_kwargs` parameter to `make_vec_env` and `make_atari_env`
- Wrap the environments automatically with a `Monitor` wrapper when possible.
- `EvalCallback` now logs the success rate when available (`is_success` must be present in the info dict)
- Added new wrappers to log images and matplotlib figures to tensorboard. (@zampanteymedio)
- Add support for text records to `Logger`. (@lorenz-h)
Bug Fixes:
- Fixed bug where code added VecTranspose on channel-first image environments (thanks @qxcv)
- Fixed `DQN` predict method when using single `gym.Env` with `deterministic=False`
- Fixed bug that the arguments order of `explained_variance()` in `ppo.py` and `a2c.py` is not correct (@thisray)
- Fixed bug where full `HerReplayBuffer` leads to an index error. (@megan-klaiber)
- Fixed bug where replay buffer could not be saved if it was too big (> 4 Gb) for python<3.8 (thanks @hn2)
- Added informative `PPO` construction error in edge-case scenario where `n_steps * n_envs = 1` (size of rollout buffer), which otherwise causes downstream breaking errors in training (@decodyng)
- Fixed discrete observation space support when using multiple envs with A2C/PPO (thanks @ardabbour)
- Fixed a bug for TD3 delayed update (the update was off-by-one and not delayed when `train_freq=1`)
- Fixed numpy warning (replaced `np.bool` with `bool`)
- Fixed a bug where `VecNormalize` was not normalizing the terminal observation
- Fixed a bug where `VecTranspose` was not transposing the terminal observation
- Fixed a bug where the terminal observation stored in the replay buffer was not the right one for off-policy algorithms
- Fixed a bug where `action_noise` was not used when using `HER` (thanks @ShangqunYu)
- Fixed a bug where `train_freq` was not properly converted when loading a saved model
Others:
- Add more issue templates
- Add signatures to callable type annotations (@ernestum)
- Improve error message in `NatureCNN`
- Added checks for supported action spaces to improve clarity of error messages for the user
- Renamed variables in the `train()` method of `SAC`, `TD3` and `DQN` to match SB3-Contrib.
- Updated docker base image to Ubuntu 18.04
- Set tensorboard min version to 2.2.0 (earlier version are apparently not working with PyTorch)
- Added warning for `PPO` when `n_steps * n_envs` is not a multiple of `batch_size` (last mini-batch truncated) (@decodyng)
- Removed some warnings in the tests
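The two PPO checks in this release (the construction error for a rollout buffer of size 1 and the warning about a truncated last mini-batch) both follow from simple arithmetic on `n_steps`, `n_envs` and `batch_size`. A sketch of that arithmetic, with `split_rollout` as a hypothetical helper:

```python
from typing import Tuple


def split_rollout(n_steps: int, n_envs: int, batch_size: int) -> Tuple[int, int]:
    """Return (number of full mini-batches, size of the truncated last mini-batch)."""
    buffer_size = n_steps * n_envs
    if buffer_size <= 1:
        raise ValueError("n_steps * n_envs must be greater than 1")
    return divmod(buffer_size, batch_size)


print(split_rollout(n_steps=2048, n_envs=1, batch_size=64))  # no remainder
print(split_rollout(n_steps=100, n_envs=3, batch_size=64))  # last mini-batch is smaller
```

A non-zero second value is the situation the warning describes: the final mini-batch of each epoch holds fewer samples than `batch_size`.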
Documentation:
- Updated algorithm table
- Minor docstring improvements regarding rollout (@stheid)
- Fix migration doc for `A2C` (epsilon parameter)
- Fix `clip_range` docstring
- Fix duplicated parameter in `EvalCallback` docstring (thanks @tfederico)
- Added example of learning rate schedule
- Added SUMO-RL as example project (@LucasAlegre)
- Fix docstring of classes in atari_wrappers.py which were inside the constructor (@LucasAlegre)
- Added SB3-Contrib page
- Fix bug in the example code of DQN (@AptX395)
- Add example on how to access the tensorboard summary writer directly. (@lorenz-h)
- Updated migration guide
- Updated custom policy doc (separate policy architecture recommended)
- Added a note about OpenCV headless version
- Corrected typo on documentation (@mschweizer)
- Provide the environment when loading the model in the examples (@lorepieri8)
Published by araffin over 5 years ago
stable-baselines3 - HER with online and offline sampling, bug fixes for features extraction
Breaking Changes
- Warning: Renamed `common.cmd_util` to `common.env_util` for clarity (affects `make_vec_env` and `make_atari_env` functions)
New Features
- Allow custom actor/critic network architectures using `net_arch=dict(qf=[400, 300], pi=[64, 64])` for off-policy algorithms (SAC, TD3, DDPG)
- Added Hindsight Experience Replay `HER`. (@megan-klaiber)
- `VecNormalize` now supports `gym.spaces.Dict` observation spaces
- Support logging videos to Tensorboard (@SwamyDev)
- Added `share_features_extractor` argument to `SAC` and `TD3` policies
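As an illustration of the custom actor/critic architecture mentioned in the list above, the dictionary would typically be passed through `policy_kwargs`. The model construction is left as a comment because it needs `stable_baselines3` and an environment; `env` is a placeholder, not something defined in these notes.

```python
# Separate layer sizes for the actor (pi) and the Q-function critic (qf).
policy_kwargs = dict(net_arch=dict(pi=[64, 64], qf=[400, 300]))

# Assumed usage (requires stable_baselines3 and an existing environment `env`):
# model = SAC("MlpPolicy", env, policy_kwargs=policy_kwargs)
print(policy_kwargs["net_arch"])
```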
Bug Fixes
- Fix GAE computation for on-policy algorithms (off-by one for the last value) (thanks @Wovchena)
- Fixed potential issue when loading a different environment
- Fix ignoring the exclude parameter when recording logs using json, csv or log as logging format (@SwamyDev)
- Make `make_vec_env` support the `env_kwargs` argument when using an env ID str (@ManifoldFR)
- Fix model creation initializing CUDA even when `device="cpu"` is provided
- Fix `check_env` not checking if the env has a Dict action space before calling `_check_nan` (@wmmc88)
- Update the check for spaces unsupported by Stable Baselines 3 to include checks on the action space (@wmmc88)
- Fixed feature extractor bug for target network where the same net was shared instead of being separate. This bug affects `SAC`, `DDPG` and `TD3` when using `CnnPolicy` (or custom feature extractor)
- Fixed a bug when passing an environment when loading a saved model with a `CnnPolicy`, the passed env was not wrapped properly (the bug was introduced when implementing `HER` so it should not be present in previous versions)
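The GAE fix at the top of this list concerns how the value of the state reached after the final rollout step enters the recursion. The following is a generic, stdlib-only sketch of Generalized Advantage Estimation that makes that bootstrap explicit. It is not the SB3 implementation, and the argument names (`dones`, `last_value`) are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one rollout.

    values[t] is V(s_t), dones[t] is True if the episode ended after step t,
    and last_value is the value of the state reached after the final step.
    """
    n_steps = len(rewards)
    advantages = [0.0] * n_steps
    last_gae = 0.0
    for t in reversed(range(n_steps)):
        # The final step bootstraps from last_value, not from values[-1].
        next_value = last_value if t == n_steps - 1 else values[t + 1]
        non_terminal = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * non_terminal - values[t]
        last_gae = delta + gamma * gae_lambda * non_terminal * last_gae
        advantages[t] = last_gae
    returns = [adv + val for adv, val in zip(advantages, values)]
    return advantages, returns


adv, ret = compute_gae(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    dones=[False, False, True],
    last_value=5.0,
    gamma=1.0,
    gae_lambda=1.0,
)
print(adv, ret)
```

With `gamma=1` and `gae_lambda=1` the advantages reduce to reward-to-go minus the value estimate, and the terminal flag on the last step prevents `last_value` from leaking into a finished episode.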
Others
- Improved typing coverage
- Improved error messages for unsupported spaces
- Added `.vscode` to the gitignore
Documentation
- Added first draft of migration guide
- Added intro to imitation library (@shwang)
- Enabled doc for `CnnPolicies`
- Added advanced saving and loading example
- Added base doc for exporting models
- Added example for getting and setting model parameters
Published by araffin over 5 years ago
stable-baselines3 - Bug fixes, get/set parameters and improved docs
Breaking Changes:
- Removed `device` keyword argument of policies; use `policy.to(device)` instead. (@qxcv)
- Rename `BaseClass.get_torch_variables` -> `BaseClass._get_torch_save_params` and `BaseClass.excluded_save_params` -> `BaseClass._excluded_save_params`
- Renamed saved items `tensors` to `pytorch_variables` for clarity
- `make_atari_env`, `make_vec_env` and `set_random_seed` must be imported with (and not directly from `stable_baselines3.common`):

```python
from stable_baselines3.common.cmd_util import make_atari_env, make_vec_env
from stable_baselines3.common.utils import set_random_seed
```
New Features:
- Added `unwrap_vec_wrapper()` to `common.vec_env` to extract `VecEnvWrapper` if needed
- Added `StopTrainingOnMaxEpisodes` to callback collection (@xicocaio)
- Added `device` keyword argument to `BaseAlgorithm.load()` (@liorcohen5)
- Callbacks have access to rollout collection locals as in SB2. (@PartiallyTyped)
- Added `get_parameters` and `set_parameters` for accessing/setting parameters of the agent
- Added actor/critic loss logging for TD3. (@mloo3)
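One possible use of `get_parameters` and `set_parameters` is copying weights between two agents. The sketch below assumes both models are SB3 algorithms of the same class and architecture, and that `set_parameters` accepts an `exact_match` keyword as in current SB3 versions. The helper name `copy_parameters` is hypothetical.

```python
def copy_parameters(source_model, target_model):
    """Copy all parameters from source_model into target_model.

    Returns the sorted names of the parameter groups that were copied.
    """
    # get_parameters() returns a dict of state dicts (policy, optimizers, ...).
    params = source_model.get_parameters()
    # exact_match=True asks for an error if a group is missing or unexpected.
    target_model.set_parameters(params, exact_match=True)
    return sorted(params.keys())
```

With two models created as, for example, `PPO("MlpPolicy", "CartPole-v1")`, calling `copy_parameters(model_a, model_b)` should leave `model_b` with the same weights as `model_a`.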
Bug Fixes:
- Fixed a bug where the environment was reset twice when using `evaluate_policy`
- Fix logging of `clip_fraction` in PPO (@diditforlulz273)
- Fixed a bug where cuda support was wrongly checked when passing the GPU index, e.g., `device="cuda:0"` (@liorcohen5)
- Fixed a bug when the random seed was not properly set on cuda when passing the GPU index
Others:
- Improve typing coverage of the `VecEnv`
- Fix type annotation of `make_vec_env` (@ManifoldFR)
- Removed `AlreadySteppingError` and `NotSteppingError` that were not used
- Fixed typos in SAC and TD3
- Reorganized functions for clarity in `BaseClass` (save/load functions close to each other, private functions at top)
- Clarified docstrings on what is saved and loaded to/from files
- Simplified `save_to_zip_file` function by removing duplicate code
- Store library version along with the saved models
- DQN loss is now logged
Documentation:
- Added `StopTrainingOnMaxEpisodes` details and example (@xicocaio)
- Updated custom policy section (added custom feature extractor example)
- Re-enable `sphinx_autodoc_typehints`
- Updated doc style for type hints and remove duplicated type hints
Published by araffin over 5 years ago
stable-baselines3 - Added DQN and DDPG, bug fixes and performance matching for Atari games
Breaking Changes:
- `AtariWrapper` and other Atari wrappers were updated to match SB2 ones
- `save_replay_buffer` now receives as argument the file path instead of the folder path (@tirafesi)
- Refactored `Critic` class for `TD3` and `SAC`, it is now called `ContinuousCritic` and has an additional parameter `n_critics`
- `SAC` and `TD3` now accept an arbitrary number of critics (e.g. `policy_kwargs=dict(n_critics=3)`) instead of only 2 previously
New Features:
- Added `DQN` Algorithm (@Artemis-Skade)
- Buffer dtype is now set according to action and observation spaces for `ReplayBuffer`
- Added warning when allocation of a buffer may exceed the available memory of the system when `psutil` is available
- Saving models now automatically creates the necessary folders and raises appropriate warnings (@PartiallyTyped)
- Refactored opening paths for saving and loading to use strings, pathlib or io.BufferedIOBase (@PartiallyTyped)
- Added `DDPG` algorithm as a special case of `TD3`.
- Introduced `BaseModel` abstract parent for `BasePolicy`, which critics inherit from.
Bug Fixes:
- Fixed a bug in the `close()` method of `SubprocVecEnv`, causing wrappers further down in the wrapper stack to not be closed. (@NeoExtended)
- Fix target for updating q values in SAC: the entropy term was not conditioned by terminal states
- Use `cloudpickle.load` instead of `pickle.load` in `CloudpickleWrapper`. (@shwang)
- Fixed a bug with orthogonal initialization when `bias=False` in custom policy (@rk37)
- Fixed approximate entropy calculation in PPO and A2C. (@andyshih12)
- Fixed DQN target network sharing feature extractor with the main network.
- Fixed storing correct `dones` in on-policy algorithm rollout collection. (@andyshih12)
- Fixed number of filters in final convolutional layer in NatureCNN to match original implementation.
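To make the SAC target fix concrete, here is a scalar sketch of the soft Bellman target in which both the next Q-value and the entropy term are masked at terminal states. It is an illustration of the standard formula, not SB3's implementation. The function and argument names are assumptions.

```python
def sac_td_target(reward, done, next_q_min, next_log_prob, gamma=0.99, ent_coef=0.2):
    """Soft Bellman target for a single transition.

    next_q_min is the minimum over the target critics at the next state,
    next_log_prob the log-probability of the sampled next action.
    """
    # The entropy bonus is part of the next-state soft value ...
    soft_next_value = next_q_min - ent_coef * next_log_prob
    # ... so it is zeroed together with the Q-value when the episode ended.
    return reward + (1.0 - float(done)) * gamma * soft_next_value
```

For a terminal transition the target reduces to the reward alone, whatever the entropy coefficient is.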
Others:
- Refactored off-policy algorithm to share the same `.learn()` method
- Split the `collect_rollout()` method for off-policy algorithms
- Added `_on_step()` for off-policy base class
- Optimized replay buffer size by removing the need of `next_observations` numpy array
- Optimized polyak updates (1.5-1.95 speedup) through inplace operations (@PartiallyTyped)
- Switch to `black` codestyle and added `make format`, `make check-codestyle` and `commit-checks`
- Ignored errors from newer pytype version
- Added a check when using `gSDE`
- Removed codacy dependency from Dockerfile
- Added `common.sb2_compat.RMSpropTFLike` optimizer, which corresponds closer to the implementation of RMSprop from Tensorflow.
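For readers unfamiliar with polyak updates, the sketch below shows the update rule on plain Python lists, overwriting the target values in place instead of building a new list. It is only an illustration of the rule. The actual optimization mentioned above operates on PyTorch tensors, and the function name here is an assumption.

```python
def polyak_update(params, target_params, tau):
    """Soft-update target_params towards params, in place.

    Each target value becomes (1 - tau) * target + tau * param.
    """
    for i, (param, target) in enumerate(zip(params, target_params)):
        target_params[i] = (1.0 - tau) * target + tau * param


online = [1.0, 2.0]
target = [0.0, 0.0]
polyak_update(online, target, tau=0.5)
```

With `tau=1.0` the target becomes a hard copy of the online parameters; small values of `tau` make it track them slowly.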
Documentation:
- Updated notebook links
- Fixed a typo in the section of Enjoy a Trained Agent, in RL Baselines3 Zoo README. (@blurLake)
- Added Unity reacher to the projects page (@koulakis)
- Added PyBullet colab notebook
- Fixed typo in PPO example code (@joeljosephjin)
- Fixed typo in custom policy doc (@RaphaelWag)
Published by araffin almost 6 years ago
stable-baselines3 - Hotfix for PPO/A2C + gSDE, internal refactoring and bug fixes
Breaking Changes:
- `render()` method of `VecEnvs` now only accepts one argument: `mode`
- Created new file common/torch_layers.py, similar to SB refactoring
  - Contains all PyTorch network layer definitions and feature extractors: `MlpExtractor`, `create_mlp`, `NatureCNN`
- Renamed `BaseRLModel` to `BaseAlgorithm` (along with offpolicy and onpolicy variants)
- Moved on-policy and off-policy base algorithms to `common/on_policy_algorithm.py` and `common/off_policy_algorithm.py`, respectively.
- Moved `PPOPolicy` to `ActorCriticPolicy` in common/policies.py
- Moved `PPO` (algorithm class) into `OnPolicyAlgorithm` (`common/on_policy_algorithm.py`), to be shared with A2C
- Moved following functions from `BaseAlgorithm`:
  - `_load_from_file` to `load_from_zip_file` (save_util.py)
  - `_save_to_file_zip` to `save_to_zip_file` (save_util.py)
  - `safe_mean` to `safe_mean` (utils.py)
  - `check_env` to `check_for_correct_spaces` (utils.py. Renamed to avoid confusion with environment checker tools)
- Moved static function `_is_vectorized_observation` from common/policies.py to common/utils.py under name `is_vectorized_observation`.
- Removed `{save,load}_running_average` functions of `VecNormalize` in favor of `load/save`.
- Removed `use_gae` parameter from `RolloutBuffer.compute_returns_and_advantage`.
Bug Fixes:
- Fixed `render()` method for `VecEnvs`
- Fixed `seed()` method for `SubprocVecEnv`
- Fixed loading on GPU for testing when using gSDE and `deterministic=False`
- Fixed `register_policy` to allow re-registering same policy for same sub-class (i.e. assign same value to same key).
- Fixed a bug where the gradient was passed when using `gSDE` with `PPO`/`A2C`, this does not affect `SAC`
Others:
- Re-enable unsafe `fork` start method in the tests (was causing a deadlock with tensorflow)
- Added a test for seeding `SubprocVecEnv` and rendering
- Fixed reference in NatureCNN (pointed to older version with different network architecture)
- Fixed comments saying "CxWxH" instead of "CxHxW" (same style as in torch docs / commonly used)
- Added bit further comments on register/getting policies ("MlpPolicy", "CnnPolicy").
- Renamed `progress` (value from 1 in start of training to 0 in end) to `progress_remaining`.
- Added `policies.py` files for A2C/PPO, which define MlpPolicy/CnnPolicy (renamed ActorCriticPolicies).
- Added some missing tests for `VecNormalize`, `VecCheckNan` and `PPO`.
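Since `progress_remaining` runs from 1 at the start of training to 0 at the end, a schedule is simply a function of that value. A minimal sketch of a linear learning-rate schedule follows. The factory name `linear_schedule` and the initial value are assumptions made for this example.

```python
def linear_schedule(initial_value):
    """Return a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining):
        # progress_remaining is 1.0 at the start of training and 0.0 at the end.
        return progress_remaining * initial_value

    return schedule


lr_schedule = linear_schedule(0.001)
```

Such a callable can typically be passed wherever SB3 accepts a schedule, for example as the `learning_rate` argument of an algorithm.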
Documentation:
- Added a paragraph on "MlpPolicy"/"CnnPolicy" and policy naming scheme under "Developer Guide"
- Fixed second-level listing in changelog
Published by araffin almost 6 years ago
stable-baselines3 - Tensorboard support, refactored logger
Breaking Changes:
- Remove State-Dependent Exploration (SDE) support for `TD3`
- Methods were renamed in the logger: `logkv` -> `record`, `writekvs` -> `write`, `writeseq` -> `write_sequence`, `logkvs` -> `record_dict`, `dumpkvs` -> `dump`, `getkvs` -> `get_log_dict`, `logkv_mean` -> `record_mean`
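The logger renames above can be applied mechanically when migrating older code. The sketch below is a hypothetical helper, not part of SB3, that rewrites `logger.<old_name>` references in a source string using the mapping from the list. It assumes the logger is accessed through a name literally called `logger`.

```python
import re

# Old logger method name -> new name, as listed in the release notes.
LOGGER_RENAMES = {
    "logkv": "record",
    "writekvs": "write",
    "writeseq": "write_sequence",
    "logkvs": "record_dict",
    "dumpkvs": "dump",
    "getkvs": "get_log_dict",
    "logkv_mean": "record_mean",
}

# Longest names first; word boundaries keep "logkv" from matching "logkvs".
_PATTERN = re.compile(
    r"\blogger\.(" + "|".join(sorted(LOGGER_RENAMES, key=len, reverse=True)) + r")\b"
)


def migrate_logger_calls(source):
    """Return source with old logger method names replaced by the new ones."""
    return _PATTERN.sub(lambda m: "logger." + LOGGER_RENAMES[m.group(1)], source)
```

The result of a textual rewrite like this should still be reviewed by hand, since it does not parse the code.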
New Features:
- Added env checker (Sync with Stable Baselines)
- Added `VecCheckNan` and `VecVideoRecorder` (Sync with Stable Baselines)
- Added determinism tests
- Added `cmd_util` and `atari_wrappers`
- Added support for `MultiDiscrete` and `MultiBinary` observation spaces (@rolandgvc)
- Added `MultiCategorical` and `Bernoulli` distributions for PPO/A2C (@rolandgvc)
- Added support for logging to tensorboard (@rolandgvc)
- Added `VectorizedActionNoise` for continuous vectorized environments (@PartiallyTyped)
- Log evaluation in the `EvalCallback` using the logger
Bug Fixes:
- Fixed a bug that prevented model trained on cpu to be loaded on gpu
- Fixed version number that had a new line included
- Fixed weird seg fault in docker image due to FakeImageEnv by reducing screen size
- Fixed `sde_sample_freq` that was not taken into account for SAC
- Pass logger module to `BaseCallback` otherwise they cannot write in the one used by the algorithms
Others:
- Renamed to Stable-Baselines3
- Added Dockerfile
- Sync `VecEnvs` with Stable-Baselines
- Update requirement: `gym>=0.17`
- Added `.readthedoc.yml` file
- Added `flake8` and `make lint` command
- Added Github workflow
- Added warning when passing both `train_freq` and `n_episodes_rollout` to Off-Policy Algorithms
Documentation:
- Added most documentation (adapted from Stable-Baselines)
- Added link to CONTRIBUTING.md in the README (@kinalmehta)
- Added gSDE project and update docstrings accordingly
- Fix `TD3` example code block
Published by araffin almost 6 years ago