Recent Releases of sheeprl
sheeprl - v0.5.6
v0.5.6 Release Notes
- Fixed the buffer checkpoint and added the possibility to specify the pre-fill steps upon resuming. Updated the how-tos accordingly in #280
- Updated how-tos in #281
- Fix division by zero when computing sps-train in #283
- Better code naming in #284
- Fix Minedojo actions stacking (and more generally multi-discrete actions) and missing keys in #286
- Fix computation of prefill steps as policy steps in #287
- Fix the Dreamer-V3 imagination notebook in #290
- Added the `ActionsAsObservationWrapper` to let the user add the played actions as observations in #291 (a sketch of the idea follows this list)
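The wrapper name and PR come from the release notes above; the snippet below is only a minimal, hypothetical sketch of the idea (a dict-observation wrapper that exposes the previously played action), not the actual sheeprl implementation.

```python
import gymnasium as gym
import numpy as np


class LastActionToObs(gym.Wrapper):
    """Hypothetical sketch of an actions-as-observations wrapper."""

    def __init__(self, env: gym.Env, key: str = "last_action"):
        super().__init__(env)
        assert isinstance(env.observation_space, gym.spaces.Dict)
        self._key = key
        self.observation_space = gym.spaces.Dict(
            {**env.observation_space.spaces, key: env.action_space}
        )

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # No previous action at reset: use a zeroed action as placeholder
        obs[self._key] = np.zeros_like(self.env.action_space.sample())
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        obs[self._key] = action
        return obs, reward, terminated, truncated, info
```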
Published by belerico over 1 year ago
sheeprl - v0.5.5
v0.5.5 Release Notes
- Added parallel stochastic in dv3: #225
- Update dependencies and python version: #230, #262, #263
- Added dv3 notebook for imagination and obs reconstruction: #232
- Created citation.cff: #233
- Added replay ratio for off-policy algorithms: #247
- Single strategy for the player (now it is instantiated in the `build_agent()` function): #244, #250, #258
- Proper `terminated` and `truncated` signals management: #251, #252, #253
- Added the possibility to choose whether or not to learn the initial recurrent state: #256
- Added A2C benchmarks: #266
- Added a `prepare_obs()` function to all the algorithms: #267
- Improved code readability: #248, #265
- Bug fixes: #220, #222, #224, #231, #243, #255, #257
Published by michele-milesi almost 2 years ago
sheeprl - v0.5.4
v0.5.4 Release Notes
- Added Dreamer V3 different sizes configs (#208).
- Update torch version: 2.2.1, or any release in 2.0.*/2.1.*.
- Fix observation normalization in dreamer v3 and p2e_dv3 (#214).
- Update README (#215).
- Fix installation and agent evaluation: new commands are made available for agent evaluation, model registration, and for the available agents (#216).
Published by michele-milesi almost 2 years ago
sheeprl - v0.5.3
v0.5.3 Release Notes
- Added benchmarks (#185)
- Added possibility to use a user-defined evaluation file (#199)
- Let the user choose `num_threads` and matmul precision (#203)
- Added Super Mario Bros Environment (#204)
- Fix bugs (#183, #186, #193, #195, #200, #201, #202, #205)
Published by michele-milesi about 2 years ago
sheeprl - v0.5.2
v0.5.2 Release Notes
- Added A2C algorithm (#33).
- Added a new how-to on how to add an external algorithm (no need to clone sheeprl locally) (#175).
- Added optimizations (#177):
  - Metrics are instantiated only when needed.
  - Removed the `torch.cat()` operation between empty and dense tensors in the `MultiEncoder` class.
  - Added the possibility not to test the agent after training.
- Fixed GitHub actions workflow (#180).
- Fixed bugs (#181, #183).
- Added benchmarks with respect to StableBaselines3 (#185).
- Added the `BernoulliSafeMode` distribution, a Bernoulli distribution where the mode is computed safely, i.e. it returns `self.probs > 0.5` without setting any NaN (#186); a minimal sketch follows.
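The release only describes the behaviour; assuming the distribution subclasses `torch.distributions.Bernoulli`, a minimal sketch could look like this (not the actual sheeprl code):

```python
import torch
from torch.distributions import Bernoulli


class BernoulliSafeMode(Bernoulli):
    """Sketch: a Bernoulli whose mode is always well defined."""

    @property
    def mode(self) -> torch.Tensor:
        # Plain thresholding: no NaN when probs == 0.5, unlike the stock Bernoulli.mode
        return (self.probs > 0.5).to(self.probs)
```

The stock `Bernoulli.mode` in PyTorch places NaN wherever `probs == 0.5`, which is exactly what the safe variant avoids.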
Published by michele-milesi about 2 years ago
sheeprl - v0.5.0
v0.5.0 Release Notes
- Added Numpy buffers (#169):
  - The user can now decide whether to use the `torch.as_tensor` function or the `torch.from_numpy` one to convert the Numpy buffer into tensors when sampling (#172); a sketch of the two options follows this list.
- Added optimizations to reduce training time (#168).
- Added the possibility to keep only the last `n` checkpoints in an experiment to avoid filling up the disk (#171).
- Fixed bugs (#167).
- Update documentation.
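Purely illustrative comparison of the two conversion options mentioned above, using plain PyTorch calls (nothing here is sheeprl code):

```python
import numpy as np
import torch

batch = np.random.rand(32, 4).astype(np.float32)  # a sampled Numpy batch

# torch.from_numpy always shares memory with the Numpy array (zero-copy view)
t1 = torch.from_numpy(batch)

# torch.as_tensor also avoids a copy when possible, and accepts dtype/device targets
t2 = torch.as_tensor(batch, dtype=torch.float32)
```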
Published by michele-milesi about 2 years ago
sheeprl - v0.4.9
v0.4.9 Release Notes
- Added `torch>=2.0` as a dependency in #161
- Let `mlflow` be an optional package, i.e. the user can directly install it with `pip install sheeprl[mlflow]`, in #164
- Fixed the `resume_from_checkpoint` in #163. In particular:
  - Added a `save_configs` function to save the configs of the experiment in the `<log_dir>/config.yaml` file.
  - Fixed the resume from checkpoint of all the algorithms (restart from the correct policy step + fix decoupled).
  - Given more flexibility to the p2e finetuning scripts regarding the fabric configs.
  - MineDojo wrapper: avoid modifying the kwargs (to always save consistent configs in the `<log_dir>/config.yaml` file).
  - TensorBoard logger creation: update the logger configs to always save consistent configs in the `<log_dir>/config.yaml` file.
  - Added an `as_dict()` method to the `dotdict` class to get a primitive Python dictionary from a `dotdict` object (a sketch follows this list).
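For illustration only, a minimal `dotdict` with an `as_dict()` method might look like the sketch below; the actual sheeprl class may differ.

```python
class dotdict(dict):
    """Sketch: a dict whose keys are also reachable as attributes."""

    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

    def as_dict(self) -> dict:
        # Recursively convert nested dotdicts back into plain Python dicts
        return {
            k: v.as_dict() if isinstance(v, dotdict) else v
            for k, v in self.items()
        }


cfg = dotdict({"algo": dotdict({"lr": 1e-4})})
assert cfg.algo.lr == 1e-4
assert isinstance(cfg.as_dict()["algo"], dict)
```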
Published by belerico about 2 years ago
sheeprl - v0.4.8
v0.4.8 Release Notes
- The following config keys have been moved to the specific `algo` config in #158: `cnn_keys`, `mlp_keys`, `per_rank_batch_size`, `per_rank_sequence_length`, `per_rank_num_batches` and `total_steps`
- We have added the integration of the MLflowLogger in #159. This comes with new documentation and notebooks under the `example` folder on how to use it.
Published by belerico about 2 years ago
sheeprl - v0.4.7
v0.4.7 Release Notes
- SheepRL is now on PyPI: every time a release is published, the new version of SheepRL is also published on PyPI (#155)
- Torchmetrics is no longer installed from the GitHub main branch (#155).
- Moviepy is no longer installed from the GitHub main branch (#155).
- box2d-py is not a mandatory dependency anymore; it is possible to install `gymnasium[box2d]` with the `pip install sheeprl[box2d]` command (#156)
- The `moviepy.decorators.use_clip_fps_by_default` function is replaced (in the `./sheeprl/__init__.py` file) with the method in the moviepy main branch (#156).
Published by michele-milesi about 2 years ago
sheeprl - v0.4.6
v0.4.6 Release Notes
- The exploration amount of the Dreamer's player has been moved to the Actor in #150
- All the P2E scripts have been split into `exploration` and `finetuning` in #151
- The hydra version has been pinned to `1.3` in #152
- SheepRL is now published on PyPI in #155
Published by belerico about 2 years ago
sheeprl - v0.4.5post0
v0.4.5post0 Release Notes
- Fixes MineDojo and Dreamer's player in #148
Published by belerico over 2 years ago
sheeprl - v0.4.5
v0.4.5 Release Notes
- Added new how-to explaining how to add a new custom environment in #128
- Added the possibility to completely disable logging metrics and decide what and how to log metrics in every algorithm in #129
- Fixed the model creation of the Dreamer-V3 agent: the bias has been removed from every linear layer followed by a LayerNorm and an activation function
- Added the possibility for the users to specify their own custom configs, possibly inheriting from the already defined sheeprl configs in #132
- Added the support to Lightning 2.1 in #136
- Added the possibility to evaluate every agent given a checkpoint in #139 #141
- Various minor fixes in #125 #133 #134 #135 #137 #140 #143 #144 #145 #146
Published by belerico over 2 years ago
sheeprl - v0.4.4
v0.4.4 Release Notes
- Fixes the activation in the recurrent model in DV1 in #110
- Updated the Diambra wrapper to support the new Diambra package in #111
- Added `dotdict` to speed up accessing the loaded config in #112
- Better naming when hydra creates the output dirs in #114
- Added the `validate_args` flag to decide whether `torch.distributions` must check the arguments passed to the `__init__` function; disable it to have a huge speedup in #116 (see the sketch after this list)
- Updated the Diambra wrapper to support `AsyncVectorEnv` in #119
- Minor fixes in #120
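A hedged illustration of the speedup knob mentioned above, using plain PyTorch rather than sheeprl code: argument validation can be disabled per distribution or globally.

```python
import torch
from torch.distributions import Distribution, Normal

# Per-instance: skip the (costly) argument checks for this distribution only
dist = Normal(torch.zeros(8), torch.ones(8), validate_args=False)

# Global: disable argument validation for every distribution created afterwards
Distribution.set_default_validate_args(False)
```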
Published by belerico over 2 years ago
sheeprl - v0.4.2
v0.4.2 Release Notes
In this release we have:
- refactored the recurrent PPO implementation. In particular:
  - A single LSTM model is used, taking as input the current observation, the previously played action and the previous recurrent state, i.e. `LSTM([o_t, a_t-1], h_t-1)`. The LSTM has an optional pre-MLP and post-MLP: those can be controlled in the relative `algo/ppo_recurrent.yaml` config
  - A feature extractor is used to extract features from the observations, be they vectors or images
- Every PPO algorithm now computes the bootstrapped value, summing it to the current reward, whenever an environment has been truncated (a sketch of both points follows)
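Purely illustrative sketch of the two ideas above (the input concatenation fed to the LSTM, and the bootstrap on truncation); names, shapes and the use of a discount factor here are assumptions, not sheeprl's actual code.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, hidden = 16, 4, 64
lstm = nn.LSTM(input_size=obs_dim + act_dim, hidden_size=hidden)

# LSTM([o_t, a_t-1], h_t-1): concatenate current obs and previously played action
o_t = torch.randn(1, 1, obs_dim)    # (seq_len, batch, obs_dim)
a_tm1 = torch.zeros(1, 1, act_dim)  # previous action (zeros at episode start)
h_tm1 = (torch.zeros(1, 1, hidden), torch.zeros(1, 1, hidden))
out, h_t = lstm(torch.cat([o_t, a_tm1], dim=-1), h_tm1)

# Bootstrap on truncation: add the (assumed discounted) value of the final obs
gamma, reward, truncated = 0.99, 1.0, True
value_of_final_obs = 0.5  # would come from the critic
if truncated:
    reward = reward + gamma * value_of_final_obs
```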
Published by belerico over 2 years ago
sheeprl - v0.4.0
v0.4.0 Release Notes
In this release we have:
- made the whole framework single-entry-point, i.e. now one can run an experiment just with `python sheeprl.py exp=... env=...`, removing the need to prepend `lightning run model ... sheeprl.py` every time. The Fabric-related configs can be found and changed under the `sheeprl/configs/fabric/` folder. (#97)
- unified the `make_env` vs `make_dict_env` methods, so there is no more distinction between the two. We now assume that the environment has an observation space that is a `gymnasium.spaces.Dict`; if it is not, an exception is raised. (#96)
- implemented the `resume_from_checkpoint` for every algorithm. (#95)
- added the Crafter environment. (#103)
- Fixed some environments, in particular Diambra and DMC:
  - Diambra: renamed the wrapper implementation file; the done flag now checks whether the `info["env_done"]` flag is True. (#98)
  - DMC: removed an `env.frame_skip=0` for mujoco envs and removed the action repeat from the DMC wrapper. (#99)
Published by belerico over 2 years ago
sheeprl - v0.3.2
v0.3.2 Release Notes
In this release we have fixed the logging time of every algorithm. In particular:
- The `Time/sps_env_interaction` metric measures the steps-per-second of the environment interaction of the agent, namely the forward pass to obtain the new action given the observation and the execution of the `step` method of the environment. This value is local to rank-0 and takes into consideration the `action_repeat` that one sets through hydra/CLI
- The `Time/sps_train` metric measures the steps-per-second of the train function that runs in a distributed manner, considering all the ranks calling the train function (a back-of-the-envelope sketch follows)
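A hedged, back-of-the-envelope sketch of how such steps-per-second values could be computed; the exact formula is an assumption, not taken from sheeprl's code.

```python
import time

# Hypothetical counters accumulated during the loops on this rank
policy_steps = 10_000   # actions sampled by the policy (rank-0)
action_repeat = 4       # set through hydra/CLI
train_steps = 2_000     # calls to the train function on this rank
world_size = 2          # number of ranks taking part in training

start = time.perf_counter()
# ... interaction / training loops would run here ...
elapsed = time.perf_counter() - start + 1e-8  # guard against division by zero

# Env-interaction SPS (local to rank-0): environment frames per second
sps_env_interaction = policy_steps * action_repeat / elapsed

# Train SPS (distributed): train steps per second across all ranks
sps_train = train_steps * world_size / elapsed
```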
Published by belerico over 2 years ago
sheeprl - v0.3.1
v0.3.1 Release Notes
In this release we have refactored some names inside every algorithm, in particular:
- we have introduced the concept of `policy_step`, which is the number of (distributed) policy steps per environment step, where the environment step does not take into consideration the action repeat, i.e. it is the number of times the policy is called to collect an action given an observation. If one has `n` ranks and `m` environments per rank, then the number of policy steps per environment step is `policy_steps = n * m`
We have also refactored the hydra configs, in particular:
- we have introduced the `metric`, `checkpoint` and `buffer` configs, containing the shared hyperparameters for those objects in every algorithm
- the `metric` config has the `metric.log_every` parameter, which controls the logging frequency. Since it's hard for the `policy_step` variable to be exactly divisible by the `metric.log_every` value, the logging happens as soon as `policy_step - last_log >= cfg.metric.log_every`, where `last_log = policy_step` is updated every time something is logged (see the sketch after this list)
- the `checkpoint` config has the `every` and `resume_from` parameters. The `every` parameter works as the `metric.log_every` one, while `resume_from` specifies the experiment folder, which must contain the `.hydra` folder, to resume the training from. This is currently only supported by the Dreamer algorithms
- `num_envs` and `clip_reward` have been moved to the `env` config
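A minimal sketch of the logging-frequency rule described above; the variable names mirror the release text and are not necessarily the actual code.

```python
log_every = 5_000  # cfg.metric.log_every
last_log = 0

# e.g. the policy step grows by n * m at every iteration (here 16)
for policy_step in range(0, 100_000, 16):
    if policy_step - last_log >= log_every:
        # ... log metrics here ...
        last_log = policy_step
```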
Published by belerico over 2 years ago
sheeprl - v0.3.0
v0.3.0 Release Notes
This new release introduces hydra as the default configuration manager. In particular it fixes #74 and, automatically, #75, since now the `cnn_keys` and `mlp_keys` can be specified separately for both the encoder and the decoder.
The changes are mainly the following:
- Dreamer-V3 initialization follows directly Hafner's implementation (adapted from https://github.com/NM512/dreamerv3-torch/blob/main/tools.py)
- all the `args.py` files and the `HFArgumentParser` have been removed. Configs are now specified under the `sheeprl/configs` folder and hydra is the default configuration manager
- Every environment wrapper is directly instantiated through `hydra.utils.instantiate` inside the `make_env` or `make_dict_env` method: in this way one can easily modify the wrapper, passing whatever parameters are needed to customize the env. Every wrapper must take as input the `id` parameter, which must be specified in the relative config
- Every optimizer is directly instantiated through `hydra.utils.instantiate` and can be modified through the CLI on the experiment run (see the sketch after this list)
- `howto/configs.md` has been added, which explains how the configs are organized inside the repo
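For readers unfamiliar with the pattern, here is a hedged, generic example of building an optimizer from a config with `hydra.utils.instantiate`; the config keys are illustrative, not sheeprl's actual schema.

```python
import torch.nn as nn
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Illustrative config: the `_target_` key tells hydra which class to build
cfg = OmegaConf.create(
    {"optimizer": {"_target_": "torch.optim.Adam", "lr": 3e-4, "eps": 1e-5}}
)

model = nn.Linear(4, 2)
# Extra keyword arguments are merged with the config at call time
optimizer = instantiate(cfg.optimizer, params=model.parameters())
```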
Published by belerico over 2 years ago
sheeprl - v0.2.1
v0.2.1 Release Notes
- Added Dreamer-V3 algorithm from https://arxiv.org/abs/2301.04104
- Added the `RestartOnException` wrapper, which recreates and restarts the environment whenever something bad has happened during the `step` or `reset`. This has been added only to the Dreamer-V3 algorithm (a sketch of the idea follows this list)
- Renamed classes and functions (in particular the `Player` classes for both Dreamer-V1/V2)
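A minimal, hypothetical sketch of a restart-on-exception wrapper; the real sheeprl implementation may differ in details such as logging, retry limits, or how the crashed transition is reported.

```python
import gymnasium as gym


class RestartOnException(gym.Wrapper):
    """Sketch: rebuild the underlying env if step() or reset() raises."""

    def __init__(self, env_fn):
        super().__init__(env_fn())
        self._env_fn = env_fn  # factory used to recreate the env

    def _recreate(self):
        try:
            self.env.close()
        except Exception:
            pass
        self.env = self._env_fn()

    def reset(self, **kwargs):
        try:
            return self.env.reset(**kwargs)
        except Exception:
            self._recreate()
            return self.env.reset(**kwargs)

    def step(self, action):
        try:
            return self.env.step(action)
        except Exception:
            self._recreate()
            # After a crash there is no valid transition: reset and signal truncation
            obs, info = self.env.reset()
            return obs, 0.0, False, True, info
```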
Published by belerico over 2 years ago
sheeprl - v0.2
v0.2 Release notes
- Added DiambraWrapper
- Added Multi-encoder/decoder to all the algorithms except DroQ, SAC and PPO Recurrent
- Added Multi-discrete support to PPO, DreamerV1 and P2E-DV1
- Modified the make_env function to be able to train the agents on environments that return both pixel-like and vector-like observations
- Modified the ReplayBuffer class to handle multiple observations
- Updated howtos
- Fixed #66
- Logger creation is moved to `sheeprl.utils.logger`
- Env creation is moved to `sheeprl.utils.env`
- PPO algo is now a single-folder algorithm (removed the `ppo_pixel` and `ppo_continuous` folders)
- `sac_pixel` has been renamed to `sac_ae`
- Added support for `gymnasium==0.29.0`, `mujoco>=2.3.3` and `dm_control>=1.0.12`
Published by belerico over 2 years ago
sheeprl - v0.1
v0.1 Release notes
Algorithms implemented:
- Dreamer-V1 (https://arxiv.org/abs/1912.01603)
- Dreamer-V2 (https://arxiv.org/abs/2010.02193)
- Plan2Explore Dreamer-V1-based (https://arxiv.org/abs/2005.05960)
- Plan2Explore Dreamer-V2-based (https://arxiv.org/abs/2005.05960)
- DroQ (https://arxiv.org/abs/2110.02034)
- PPO (https://arxiv.org/abs/1707.06347)
- PPO Recurrent (https://arxiv.org/abs/2205.11104)
- SAC (https://arxiv.org/abs/1812.05905)
- SAC-AE (https://arxiv.org/abs/1910.01741)
Published by belerico over 2 years ago