Recent Releases of sheeprl

sheeprl - v0.5.7

v0.5.7 Release Notes

  • Fix policy steps computation for on-policy algorithms in #293

- Python
Published by belerico over 1 year ago

sheeprl - v0.5.6

v0.5.6 Release Notes

  • Fixed the buffer checkpoint and added the possibility to specify the pre-fill steps upon resuming; updated the how-tos accordingly in #280
  • Updated how-tos in #281
  • Fix division by zero when computing sps-train in #283
  • Better code naming in #284
  • Fix MineDojo actions stacking (and, more generally, multi-discrete actions) and missing keys in #286
  • Fix computation of prefill steps as policy steps in #287
  • Fix the Dreamer-V3 imagination notebook in #290
  • Add the ActionsAsObservationWrapper to let the user expose the played actions as observations in #291 (a minimal sketch follows below)
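
A minimal sketch of the idea behind such a wrapper is shown below; the class name, observation keys, and constructor are illustrative assumptions, not SheepRL's actual ActionsAsObservationWrapper API.

```python
# Hedged sketch: expose the last played action as part of the observation.
# Class name, keys, and signature are illustrative, not SheepRL's actual API.
import gymnasium as gym
import numpy as np


class LastActionToObs(gym.Wrapper):
    def __init__(self, env: gym.Env):
        super().__init__(env)
        self._last_action = np.zeros(env.action_space.shape or (1,), dtype=np.float32)
        self.observation_space = gym.spaces.Dict(
            {
                "obs": env.observation_space,
                "last_action": gym.spaces.Box(-np.inf, np.inf, self._last_action.shape, np.float32),
            }
        )

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_action[:] = 0.0
        return {"obs": obs, "last_action": self._last_action.copy()}, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._last_action = np.asarray(action, dtype=np.float32).reshape(self._last_action.shape)
        return {"obs": obs, "last_action": self._last_action.copy()}, reward, terminated, truncated, info
```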

- Python
Published by belerico over 1 year ago

sheeprl - v0.5.5

v0.5.5 Release Notes

  • Added parallel stochastic in dv3: #225
  • Update dependencies and python version: #230, #262, #263
  • Added dv3 notebook for imagination and obs reconstruction: #232
  • Created citation.cff: #233
  • Added replay ratio for off-policy algorithms: #247
  • Single strategy for the player (now it is instantiated in the build_agent() function): #244, #250, #258
  • Proper terminated and truncated signals management: #251, #252, #253
  • Added the possibility to choose whether or not to learn initial recurrent state: #256
  • Added A2C benchmarks: #266
  • Added prepare_obs() function to all the algorithms: #267
  • Improved code readability: #248, #265
  • Bug fixes: #220, #222, #224, #231, #243, #255, #257

- Python
Published by michele-milesi almost 2 years ago

sheeprl - v0.5.4

v0.5.4 Release Notes

  • Added Dreamer V3 different sizes configs (#208).
  • Update torch version: 2.2.1, or any 2.0.*/2.1.* release.
  • Fix observation normalization in dreamer v3 and p2e_dv3 (#214).
  • Update README (#215).
  • Fix installation and agent evaluation: new commands are made available for agent evaluation, model registration, and for listing the available agents (#216).

- Python
Published by michele-milesi almost 2 years ago

sheeprl - v0.5.3

v0.5.3 Release Notes

  • Added benchmarks (#185)
  • Added possibility to use a user-defined evaluation file (#199)
  • Let the user choose for num_threads and matmul precision (#203)
  • Added Super Mario Bros Environment (#204)
  • Fix bugs (#183, #186, #193, #195, #200, #201, #202, #205)

- Python
Published by michele-milesi about 2 years ago

sheeprl - v0.5.2

v0.5.2 Release Notes

  • Added A2C algorithm (#33).
  • Added a new how-to on how to add an external algorithm (no need to clone sheeprl locally) in #175.
  • Added optimizations (#177):
    • Metrics are instantiated only when needed.
    • Removed the torch.cat() operation between empty and dense tensors in the MultiEncoder class.
    • Added possibility not to test the agent after training.
  • Fixed GitHub actions workflow (#180).
  • Fixed bugs (#181, #183).
  • Added benchmarks with respect to StableBaselines3 (#185).
  • Added the BernoulliSafeMode distribution, a Bernoulli distribution where the mode is computed safely, i.e. it returns self.probs > 0.5 without setting any NaN (#186). A minimal sketch follows below.
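
A minimal sketch of what such a distribution can look like (an illustrative re-implementation of the idea described above, not necessarily SheepRL's exact code):

```python
# Hedged sketch: torch's Bernoulli.mode yields NaN where probs == 0.5,
# so a "safe" variant can simply threshold the probabilities instead.
import torch
from torch.distributions import Bernoulli


class BernoulliSafeMode(Bernoulli):
    @property
    def mode(self) -> torch.Tensor:
        # No NaN for probs == 0.5: ties resolve to 0.
        return (self.probs > 0.5).to(self.probs.dtype)


d = BernoulliSafeMode(probs=torch.tensor([0.2, 0.5, 0.9]))
print(d.mode)  # tensor([0., 0., 1.])
```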

- Python
Published by michele-milesi about 2 years ago

sheeprl - v0.5.1

v0.5.1 Release Notes

  • Fix bugs (#174).

- Python
Published by michele-milesi about 2 years ago

sheeprl - v0.5.0

v0.5.0 Release Notes

  • Added Numpy buffers (#169):
    • The user can now decide whether to use the torch.as_tensor function or the torch.from_numpy one to convert the Numpy buffer into tensors when sampling (#172) (see the sketch after this list).
  • Added optimizations to reduce training time (#168).
  • Added the possibility to keep only the last n checkpoints in an experiment to avoid filling up the disk (#171).
  • Fix bugs (#167).
  • Update documentation.
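
For reference, a small sketch contrasting the two conversion paths mentioned above (illustrative only; it does not reproduce SheepRL's buffer code):

```python
# Hedged sketch: both calls avoid a copy for a compatible NumPy array, but
# torch.as_tensor also accepts dtype/device and copies only when it has to.
import numpy as np
import torch

batch = np.random.rand(32, 4).astype(np.float32)

t1 = torch.from_numpy(batch)                       # zero-copy view over the NumPy memory
t2 = torch.as_tensor(batch)                        # zero-copy here as well, same dtype/device
t3 = torch.as_tensor(batch, dtype=torch.float64)   # copies, since the dtype changes

assert t1.data_ptr() == t2.data_ptr() == batch.ctypes.data
assert t3.data_ptr() != batch.ctypes.data
```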

- Python
Published by michele-milesi about 2 years ago

sheeprl - v0.4.9

v0.4.9 Release Notes

  • Added torch>=2.0 as dependency in #161
  • Made mlflow an optional package, i.e. the user can install it directly with pip install sheeprl[mlflow], in #164
  • Fix the resume_from_checkpoint in #163. In particular:
    • Added save_configs function to save the configs of the experiment in the <log_dir>/config.yaml file.
    • Fix the resume from checkpoint of all the algorithms (restart from the correct policy step and a fix for the decoupled versions).
    • Given more flexibility to p2e finetuning scripts regarding the fabric configs.
    • MineDojo Wrapper: avoid modifying the kwargs (to always save consistent configs in the <log_dir>/config.yaml file).
    • TensorBoard Logger creation: update the logger configs to always save consistent configs in the <log_dir>/config.yaml file.
    • Added the as_dict() method (to the dotdict class) to get a primitive Python dictionary from a dotdict object (a minimal sketch follows below).
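
A minimal sketch of the idea (SheepRL's actual dotdict may differ; this only illustrates attribute access plus as_dict()):

```python
# Hedged sketch: a dict whose keys are also reachable as attributes, with an
# as_dict() method that converts nested dotdicts back into plain dicts.
class dotdict(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__

    def as_dict(self) -> dict:
        return {k: v.as_dict() if isinstance(v, dotdict) else v for k, v in self.items()}


cfg = dotdict({"algo": dotdict({"lr": 1e-3})})
print(cfg.algo.lr)                   # 0.001
print(type(cfg.as_dict()["algo"]))   # <class 'dict'>
```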

- Python
Published by belerico about 2 years ago

sheeprl - v0.4.8

v0.4.8 Release Notes

  • The following config keys have been moved in #158:
    • cnn_keys, mlp_keys, per_rank_batch_size, per_rank_sequence_length, per_rank_num_batches and total_steps have been moved to the specific algo config
  • We have added the integration of the MLflowLogger in #159. This comes with new documentation and notebooks on how to use it, under the example folder.

- Python
Published by belerico about 2 years ago

sheeprl - v0.4.7

v0.4.7 Release Notes

  • SheepRL is now on PyPI: every time a release is published, the new version of SheepRL is also published on PyPI (#155)
  • Torchmetrics is no longer installed from the GitHub main branch (#155).
  • Moviepy is no longer installed from the GitHub main branch (#155).
  • box2d-py is no longer a mandatory dependency; gymnasium[box2d] can be installed with the pip install sheeprl[box2d] command (#156)
  • The moviepy.decorators.use_clip_fps_by_default function is replaced (in the ./sheeprl/__init__.py file) with the method in the moviepy main branch (#156).

- Python
Published by michele-milesi about 2 years ago

sheeprl - v0.4.6

v0.4.6 Release Notes

  • The exploration amount of the Dreamer's player has been moved to the Actor in #150
  • All the P2E scripts have been split into exploration and finetuning in #151
  • The hydra version has been fixed to 1.3 in #152
  • SheepRL is now published on PyPI in #155

- Python
Published by belerico about 2 years ago

sheeprl - v0.4.5post0

v0.4.5post0 Release Notes

  • Fixes MineDojo and Dreamer's player in #148

- Python
Published by belerico over 2 years ago

sheeprl - v0.4.5

v0.4.5 Release Notes

  • Added new how-to explaining how to add a new custom environment in #128
  • Added the possibility to completely disable logging metrics and decide what and how to log metrics in every algorithm in #129
  • Fixed the model creation of the Dreamer-V3 agent, where we have removed the bias from every linear layer followed by a LayerNorm and an activation function
  • Added the possibility for the users to specify their own custom configs, possibly inheriting from the already defined sheeprl configs in #132
  • Added the support to Lightning 2.1 in #136
  • Added the possibility to evaluate every agent given a checkpoint in #139 #141
  • Various minor fixes in #125 #133 #134 #135 #137 #140 #143 #144 #145 #146

- Python
Published by belerico over 2 years ago

sheeprl - v0.4.4

v0.4.4 Release Notes

  • Fixes the activation in the recurrent model in DV1 in #110
  • Updated the Diambra wrapper to support the new Diambra package in #111
  • Added dotdict to speed up access to the loaded config in #112
  • Better naming when hydra creates the output dirs in #114
  • Added the validate_args argument to decide whether torch.distributions must check the arguments passed to the __init__ function; disable it to get a huge speedup, in #116 (see the sketch after this list)
  • Updated the Diambra wrapper to support AsyncVectorEnv in #119
  • Minor fixes in #120
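
A small sketch of the knob being described (how SheepRL toggles it, globally or per distribution, is not shown here):

```python
# Hedged sketch: torch.distributions validates constructor arguments by default;
# validation can be skipped per instance or disabled globally.
import torch
from torch.distributions import Distribution, Normal

# Per-instance: skip argument checking for this distribution only.
dist = Normal(torch.zeros(8), torch.ones(8), validate_args=False)

# Global: skip argument checking for every distribution created afterwards.
Distribution.set_default_validate_args(False)
```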

- Python
Published by belerico over 2 years ago

sheeprl - v0.4.3

v0.4.3 Release Notes

In this release we have:

  • Fixed the action reset given the done flag in the recurrent PPO implementation
  • Updated the documentation

- Python
Published by belerico over 2 years ago

sheeprl - v0.4.2

v0.4.2 Release Notes

In this release we have:

  • refactored the recurrent PPO implementation. In particular:
    • A single LSTM model is used, taking as input the current observation, the previously played action and the previous recurrent state, i.e. LSTM([o_t, a_{t-1}], h_{t-1}). The LSTM has optional pre- and post-MLPs, which can be controlled in the relative algo/ppo_recurrent.yaml config (see the sketch after this list)
    • A feature extractor is used to extract features from the observations, whether they are vectors or images
  • Every PPO algorithm now computes the bootstrapped value, adding it to the current reward, whenever an environment has been truncated
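
A minimal sketch of the structure described above; layer sizes, names, and the use of Tanh are illustrative assumptions, not the exact SheepRL model:

```python
# Hedged sketch: one LSTM over [o_t, a_{t-1}] and h_{t-1}, with optional pre/post MLPs.
import torch
import torch.nn as nn


class RecurrentCore(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.pre_mlp = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.Tanh())
        self.lstm = nn.LSTM(hidden, hidden)  # sequence-first: [T, B, features]
        self.post_mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh())

    def forward(self, obs, prev_action, hx):
        x = self.pre_mlp(torch.cat([obs, prev_action], dim=-1))
        x, hx = self.lstm(x, hx)
        return self.post_mlp(x), hx


core = RecurrentCore(obs_dim=4, act_dim=2)
h0 = (torch.zeros(1, 8, 128), torch.zeros(1, 8, 128))
features, h1 = core(torch.randn(5, 8, 4), torch.randn(5, 8, 2), h0)  # T=5, B=8
```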

- Python
Published by belerico over 2 years ago

sheeprl - v0.4.0

v0.4.0 Release Notes

In this release we have:

  • made the whole framework single-entry-point, i.e. now one can run an experiment just with python sheeprl.py exp=... env=..., removing the need to prepend lightning run model ... sheeprl.py every time. The Fabric-related configs can be found and changed under the sheeprl/configs/fabric/ folder. (#97)
  • unified the make_env and make_dict_env methods, so there is no longer any distinction between the two. We now assume that the environment's observation space is a gymnasium.spaces.Dict; if it is not, an exception is raised (see the sketch after this list). (#96)
  • implemented the resume_from_checkpoint for every algorithm. (#95)
  • added the Crafter environment. (#103)
  • Fixed some environments, in particular Diambra and DMC:
    • Diambra: renamed the wrapper implementation file; the done flag now checks if the info["env_done"] flag is True. (#98)
    • DMC: removed an env.frame_skip=0 setting for MuJoCo envs and removed the action repeat from the DMC wrapper. (#99)
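
The observation-space assumption mentioned above can be sketched as a simple check (illustrative, not SheepRL's exact make_env code):

```python
# Hedged sketch: the framework expects a gymnasium.spaces.Dict observation space
# and raises otherwise.
import gymnasium as gym


def check_dict_obs_space(env: gym.Env) -> gym.Env:
    if not isinstance(env.observation_space, gym.spaces.Dict):
        raise ValueError(
            f"Expected a gymnasium.spaces.Dict observation space, "
            f"got {type(env.observation_space).__name__}"
        )
    return env
```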

- Python
Published by belerico over 2 years ago

sheeprl - v0.3.2

v0.3.2 Release Notes

In this release we have fixed the logging time of every algorithm. In particular:

  • The Time/sps_env_interaction metric measures the steps-per-second of the agent's environment interaction, namely the forward pass to obtain the new action given the observation and the execution of the environment's step method. This value is local to rank-0 and takes into account the action_repeat set through hydra/CLI
  • The Time/sps_train metric measures the steps-per-second of the train function, which runs in a distributed manner, considering all the ranks calling the train function (see the sketch after this list)
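
A small illustration of what such a steps-per-second measure can reduce to (variable names and the loop are illustrative, not SheepRL's metric code):

```python
# Hedged sketch: count environment steps, taking the action repeat into account,
# and divide by the elapsed wall-clock time.
import time

action_repeat = 4
num_envs = 8
start = time.perf_counter()

env_steps = 0
for _ in range(100):                    # stand-in for 100 policy steps on rank-0
    env_steps += num_envs * action_repeat
    time.sleep(0.001)                   # stand-in for the policy forward + env.step()

sps_env_interaction = env_steps / (time.perf_counter() - start)
print(f"{sps_env_interaction:.1f} env steps/s")
```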

- Python
Published by belerico over 2 years ago

sheeprl - v0.3.1

v0.3.1 Release Notes

In this release we have refactored some names inside every algorithm, in particular:

  • we have introduced the concept of policy_step, which is the number of (distributed) policy steps per environment step, where the environment step does not take into consideration the action repeat, i.e. it is the number of times the policy is called to collect an action given an observation. If one has n ranks and m environments per rank, then the number of policy steps per environment step is policy_steps = n * m

We have also refactored the hydra configs, in particular:

  • we have introduced both the metric, checkpoint and buffer config, containing the shared hyperparameters for those objects in every algorithm
  • the metric config has the metric.log_every parameter, which controls the logging frequency. Since it's hard for the policy_step variable to be exactly divisible by the metric.log_every value, logging happens as soon as policy_step - last_log >= cfg.metric.log_every, where last_log is updated to the current policy_step every time something is logged (see the sketch after this list)
  • the checkpoint config has the every and resume_from parameters. The every parameter works like metric.log_every, while resume_from specifies the experiment folder (which must contain the .hydra folder) to resume the training from. This is currently only supported by the Dreamer algorithms
  • num_envs and clip_reward have been moved to the env config
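
The logging rule can be sketched as follows (variable names mirror the description above; this is not SheepRL's exact loop):

```python
# Hedged sketch: log as soon as at least log_every policy steps have elapsed
# since the last log, then remember where we logged.
log_every = 5000
last_log = 0

for policy_step in range(0, 100_000, 512):   # policy_step grows by ranks * envs per iteration
    if policy_step - last_log >= log_every:
        print(f"logging at policy step {policy_step}")
        last_log = policy_step
```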

- Python
Published by belerico over 2 years ago

sheeprl - v0.3.0

v0.3.0 Release Notes

This new release introduces hydra as the default configuration manager. In particular, it fixes #74 and, as a consequence, #75, since now the cnn_keys and mlp_keys can be specified separately for both the encoder and decoder.
The changes are mainly the following:

  • Dreamer-V3 initialization directly follows Hafner's implementation (adapted from https://github.com/NM512/dreamerv3-torch/blob/main/tools.py)
  • all args.py and the HFArgumentParser have been removed. Configs are now specified under the sheeprl/configs folder and hydra is the default configuration manager
  • Every environment wrapper is directly instantiated through hydra.utils.instantiate inside the make_env or make_dict_env method: in this way one can easily modify the wrapper, passing whatever parameters are needed to customize the env. Every wrapper must take as input the id parameter, which must be specified in the relative config
  • Every optimizer is directly instantiated through hydra.utils.instantiate and can be modified through the CLI when running an experiment (see the sketch after this list)
  • The howto/configs.md document has been added, which explains how the configs are organized inside the repo
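
A small sketch of the hydra.utils.instantiate pattern, using an inline config instead of SheepRL's actual YAML files (the _target_ values and parameters below are illustrative):

```python
# Hedged sketch: objects are described by a _target_ plus keyword arguments and
# built with hydra.utils.instantiate; extra kwargs can be passed at call time.
import hydra
import torch
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    {
        "env": {"wrapper": {"_target_": "gymnasium.make", "id": "CartPole-v1"}},
        "optimizer": {"_target_": "torch.optim.Adam", "lr": 1e-3},
    }
)

env = hydra.utils.instantiate(cfg.env.wrapper)  # -> gymnasium.make(id="CartPole-v1")

model = torch.nn.Linear(4, 2)
optimizer = hydra.utils.instantiate(cfg.optimizer, params=model.parameters())
```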

- Python
Published by belerico over 2 years ago

sheeprl - v0.2.2

v0.2.2 Release Notes

  • Fixed the Dreamer-V3 test function: it now uses its own instead of the Dreamer-V2 one
  • Added ruff to pre-commit and added pre-commit.ci

- Python
Published by belerico over 2 years ago

sheeprl - v0.2.1

v0.2.1 Release Notes

  • Added Dreamer-V3 algorithm from https://arxiv.org/abs/2301.04104
  • Added the RestartOnException wrapper, which recreates and restarts the environments whenever something bad has happened during step or reset. This has been added only to the Dreamer-V3 algorithm
  • Renamed classes and functions (in particular the Player classes for both Dreamer-V1/V2)

- Python
Published by belerico over 2 years ago

sheeprl - v0.2

v0.2 Release notes

  • Added DiambraWrapper
  • Added the Multi-encoder/decoder to all the algorithms except DroQ, SAC and PPO Recurrent
  • Added Multi-discrete support to PPO, Dreamer-V1 and P2E-DV1
  • Modified the make_env function to be able to train the agents on environments that return both pixel-like and vector-like observations
  • Modified the ReplayBuffer class to handle multiple observations
  • Updated howtos
  • Fixed #66
  • Logger creation is moved to sheeprl.utils.logger
  • Env creation is moved to sheeprl.utils.env
  • PPO algo is now a single-folder algorithm (removed ppo_pixel and ppo_continuous folder)
  • sac_pixel has been renamed to sac_ae
  • Added support to gymnasium==0.29.0, mujoco>=2.3.3 and dm_control>=1.0.12

- Python
Published by belerico over 2 years ago

sheeprl - v0.1

v0.1 Release notes

Algorithms implemented:

  • Dreamer-V1 (https://arxiv.org/abs/1912.01603)
  • Dreamer-V2 (https://arxiv.org/abs/2010.02193)
  • Plan2Explore Dreamer-V1-based (https://arxiv.org/abs/2005.05960)
  • Plan2Explore Dreamer-V2-based (https://arxiv.org/abs/2005.05960)
  • DroQ (https://arxiv.org/abs/2110.02034)
  • PPO (https://arxiv.org/abs/1707.06347)
  • PPO Recurrent (https://arxiv.org/abs/2205.11104)
  • SAC (https://arxiv.org/abs/1812.05905)
  • SAC-AE (https://arxiv.org/abs/1910.01741)

- Python
Published by belerico over 2 years ago