Recent Releases of agilerl
agilerl - v2.3.3: Bug Fixes & Refactoring (PPO, GRPO, RolloutBuffer, EvolvableCNN)
Bug Fixes
- Cast observations and actions to `torch.float32` in `RolloutBuffer` to ensure proper handling of all observation and action space combinations.
- Use `evaluate_actions()` in the new `learn()` methods that make use of `RolloutBuffer` in `PPO` to ensure observation preprocessing during evaluation.
- Add recurrent hidden states handling in `evaluate_actions()` (contributed by @brieyla1).
- Ignore `type` instances in the evolvable attribute check to ensure `net_config` isn't identified as such when passing a custom encoder class (contributed by @brieyla1).
- Cast the passed kernel size to `int` in the `change_kernel()` mutation in `EvolvableCNN`.
- In `agilerl.training.train_llm.finetune_llm`, removed the '+1' from `agent.set_reference_policy(env.num_dataset_passes + 1)` to prevent an unnecessary reference policy reset at the start of training.
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/AgileRL/AgileRL/pull/410
- Nightly bugfixes and coverage improvements by @nicku-a in https://github.com/AgileRL/AgileRL/pull/418
- Bug Fixes & Refactoring (PPO, GRPO, RolloutBuffer, EvolvableCNN) by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/423
Full Changelog: https://github.com/AgileRL/AgileRL/compare/v2.3.1...v2.3.3
Published by jaimesabalbermudez 5 months ago
agilerl - v2.3.1: On-Policy & AgentWrapper Bug Fixes
Bug Fixes
- Implement `__getstate__` and `__setstate__` in `AgentWrapper` to correctly set wrapped methods when serializing.
- Bug fix preventing architecture mutations in agents wrapped with `AgentWrapper`.
- Add the `random_seed` argument to evolvable modules that didn't include it.
- Generalize on-policy training loops to support any name for the policy (previously they assumed `actor` and `actors` for single- and multi-agent algorithms, respectively).
- Move `reinit_optimizers()` into `EvolvableAlgorithm` instead of it being a method of `Mutations`.
- Bug fixes for the integration of `PPO` with `use_rollout_buffer=True` and `train_on_policy()`.
- Add `save_checkpoint()` and `load_checkpoint()` methods to the `GRPO` algorithm for saving and loading checkpoints (a sketch follows this list).
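A minimal sketch of the new GRPO checkpoint round-trip; only the `save_checkpoint()`/`load_checkpoint()` method names come from these notes, while the import path, the helper function, and the assumption that both methods take a file path are illustrative:

```python
# Hedged sketch: persisting and restoring a GRPO agent with the checkpoint
# methods added in v2.3.1. Import path and path argument are assumptions.
from agilerl.algorithms import GRPO


def checkpoint_round_trip(agent: GRPO, path: str = "grpo_checkpoint.pt") -> None:
    """Save a GRPO agent mid-training and reload its state from disk."""
    agent.save_checkpoint(path)  # write network weights, optimizer state, and hyperparameters
    agent.load_checkpoint(path)  # restore the saved state in place
```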
What's Changed
- Fix multi-agent tutorials and README by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/404
- Docs fixes and README by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/405
- Not run pytest if agilerl was not changed by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/406
- Grpo checkpoint by @mikepratt1 in https://github.com/AgileRL/AgileRL/pull/413
- Docs and AgentWrapper & PPO Bug Fixes by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/412
Full Changelog: https://github.com/AgileRL/AgileRL/compare/v2.3.0...v2.3.1
Published by jaimesabalbermudez 5 months ago
agilerl - v2.3.0: Recurrent PPO, Generalised MARL, and More!
Features
- Enhance `PPO` with recurrent policy support to solve POMDPs (https://github.com/AgileRL/AgileRL/pull/373). The new implementation uses a `RolloutBuffer` to collect rollouts (and optionally recurrent hidden states) throughout training, and includes a new implementation of `EvolvableDistribution` used by `StochasticActor` with reduced computational overhead. Thank you to @brieyla1 and @ali-shihab from Warburg AI for this contribution! See the sketch after this list.
- Generalised MARL algorithms (https://github.com/AgileRL/AgileRL/pull/386). Support training on any combination of observation spaces for different agents in a MARL problem by using `EvolvableMultiInput` for centralized critics (e.g. in `MADDPG` and `MATD3`). Network configurations can be specified directly for groups of agents that share the same observation space, or for individual sub-agents.
- `GRPO` memory optimizations (https://github.com/AgileRL/AgileRL/pull/397).
- Added `AsyncAgentsWrapper` to handle non-simultaneously stepping agents in MARL. Only supported for `IPPO` for now.
- Added support for complex spaces in `IPPO`.
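As referenced above, a minimal sketch of constructing PPO with the new rollout-buffer path. `use_rollout_buffer=True` and the `observation_space`/`action_space` arguments appear elsewhere in these notes; the import path and environment choice are assumptions, and the recurrent-policy options are omitted:

```python
# Hedged sketch: PPO with the new RolloutBuffer-based collection path.
# use_rollout_buffer=True is named in these notes; the rest is illustrative.
import gymnasium as gym
from agilerl.algorithms import PPO

env = gym.make("CartPole-v1")
agent = PPO(
    observation_space=env.observation_space,
    action_space=env.action_space,
    use_rollout_buffer=True,  # rollouts (and optional hidden states) go through RolloutBuffer
)
```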
Bug Fixes
- Bug fix with the `EvolvableNetwork` protocol (https://github.com/AgileRL/AgileRL/issues/371).
- Bug fix in `train_llm()` (https://github.com/AgileRL/AgileRL/pull/399).
- Bug fix in `train_multi_agent_off_policy()` when using `sum_scores=False` (https://github.com/AgileRL/AgileRL/issues/348).
Tests
- Refactored tests by adding session fixtures to `conftest.py`.
- Removed redundant tests that added significant overhead, reducing the number of tests from around 3200 to around 2600 and test times from ~2 hours to ~1 hour.
Documentation
- Added a detailed explanation of how evolutionary hyperparameter optimisation is performed in AgileRL.
- Better documentation for MARL support in AgileRL and how network configurations can be specified in an algorithm.
- Added a tutorial solving `Pendulum-v1` with masked angular velocities, which shows how to use AgileRL to solve POMDPs with a recurrent neural network (currently only supported in `PPO`).
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/AgileRL/AgileRL/pull/369
- Fix NeuralUCB tutorial: add missing replay buffer usage and correct plot label by @OnlyTsukii in https://github.com/AgileRL/AgileRL/pull/379
- Version updates by @mikepratt1 in https://github.com/AgileRL/AgileRL/pull/385
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/AgileRL/AgileRL/pull/383
- Docs fix by @mikepratt1 in https://github.com/AgileRL/AgileRL/pull/387
- Bug fix load on no-cuda device by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/391
- Generalised Multi-Agent Algorithms by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/386
- Tests Refactoring & Optimizations by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/393
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/AgileRL/AgileRL/pull/392
- Enhance PPO with Recurrent Policy Support, Rollout Buffer, and BPTT, Faster Distributions, Performance tools, & goodies by @brieyla1 in https://github.com/AgileRL/AgileRL/pull/373
- Add kwargs arguments to init_wandb() & train_X_policy() util funcs by @JonDum in https://github.com/AgileRL/AgileRL/pull/355
- Bump transformers from 4.48.1 to 4.50.0 by @dependabot in https://github.com/AgileRL/AgileRL/pull/382
- Advanced CodeQL by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/395
- Network Bug Fixes by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/370
- More grpo memory optimizations by @mikepratt1 in https://github.com/AgileRL/AgileRL/pull/397
- Train llm bug fix by @mikepratt1 in https://github.com/AgileRL/AgileRL/pull/399
- Recurrent PPO Documentation & Tutorial by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/398
New Contributors
- @OnlyTsukii made their first contribution in https://github.com/AgileRL/AgileRL/pull/379
- @brieyla1 and @ali-shihab made their first contribution in https://github.com/AgileRL/AgileRL/pull/373
Full Changelog: https://github.com/AgileRL/AgileRL/compare/v2.2.8...v2.3.0
Published by jaimesabalbermudez 6 months ago
agilerl - v2.2.8 GRPO Optimizations
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/AgileRL/AgileRL/pull/332
- Bug fix modules() for EvolvableDistribution by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/368
- GRPO memory optimization by @mikepratt1 in https://github.com/AgileRL/AgileRL/pull/372
Full Changelog: https://github.com/AgileRL/AgileRL/compare/v2.2.5...v2.2.8
Published by nicku-a 8 months ago
agilerl - v2.2.5: AsyncPettingZooVecEnv Refactor
What's Changed
- Refactor PzAsyncVecEnv by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/367
Full Changelog: https://github.com/AgileRL/AgileRL/compare/v2.2.4...v2.2.5
Published by jaimesabalbermudez 8 months ago
agilerl - v2.2.4: GRPO Evo-HPO Updates
What's Changed
- GRPO Evo-HPO fixes by @mikepratt1 in https://github.com/AgileRL/AgileRL/pull/364
- Refactored algorithm tests for more extensive and simpler coverage, and fixed OOM issues
Full Changelog: https://github.com/AgileRL/AgileRL/compare/v2.2.3...v2.2.4
Published by jaimesabalbermudez 8 months ago
agilerl - v2.2.3: Support for Asynchronous Agents in IPPO
What's Changed
- Handle arrays in apply_image_normalization by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/362
- IPPO Asynchronous Agents by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/363
Full Changelog: https://github.com/AgileRL/AgileRL/compare/v2.2.2...v2.2.3
Published by jaimesabalbermudez 8 months ago
agilerl - v2.2.2: IPPO Complex Spaces & Bug Fixes
Bug Fixes
- Issue with training DQN on `spaces.Tuple` observations.
- Issue with training on `spaces.MultiBinary` observations generally.
- TD3 and DDPG `get_action()` was returning `torch.Tensor`s instead of `np.ndarray`s.
- Add support for complex spaces in `IPPO`.
- Clip actions in single- and multi-agent on-policy training loops.
- Test `get_action()` for all observation spaces.
- Bug fix for `StochasticActor` log std not being saved in the state dict.
What's Changed
- IPPO Complex Spaces & Bug Fixes by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/361
Full Changelog: https://github.com/AgileRL/AgileRL/compare/v2.2.1...v2.2.2
Published by jaimesabalbermudez 9 months ago
agilerl - v2.2.1: Multi-Agent Bug Fixes
What's Changed
- Update docs tutorials by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/351
- Fix "hyperparamer" typo in Off Policy example by @JonDum in https://github.com/AgileRL/AgileRL/pull/352
- Bug fixes multi-agent off-policy & support for MultiBinary observations by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/357
New Contributors
- @JonDum made their first contribution in https://github.com/AgileRL/AgileRL/pull/352
Full Changelog: https://github.com/AgileRL/AgileRL/compare/v2.2.0...v2.2.1
Published by jaimesabalbermudez 9 months ago
agilerl - v2.2.0: Hyperparameter Optimization on GRPO, IPPO, EvolvableLSTM, MultiDiscrete Actions & More!
Features:
- Evolutionary HPO on `GRPO`: support performing automatic hyperparameter tuning on a population of GRPO agents. Limited to mutating RL hyperparameters only for now.
- Independent Proximal Policy Optimization (IPPO): on-policy multi-agent algorithm that allows optimizing homogeneous agents with a single network. We identify homogeneous agents through a common prefix in their agent IDs.
- `MultiDiscrete` & `MultiBinary` action spaces: AgileRL now supports these spaces in `StochasticActor`, used in on-policy algorithms such as `PPO` (https://github.com/AgileRL/AgileRL/issues/341).
- New buffers: implemented `ReplayBuffer`, `PrioritizedReplayBuffer`, and `MultiStepReplayBuffer` using `TensorDict`s as storage. This scales much better than the deque and will allow us to further abstract different aspects of the training pipeline in the future (https://github.com/AgileRL/AgileRL/issues/315).
- `EvolvableLSTM`: module that can be used with 2D `Box` spaces. Well integrated with `EvolvableNetwork` objects and `EvolvableMultiInput` (https://github.com/AgileRL/AgileRL/issues/320).
- Improved `EvolvableMultiInput`: integrated the new `EvolvableLSTM` and added the option to flatten 2D `Box` space observations and treat them as vectors (https://github.com/AgileRL/AgileRL/issues/321).
- Sharing encoders: use `share_encoders=True` in `PPO`, `DDPG`, and `TD3` to automatically share the encoder between actor and critic(s). This greatly reduces computational overhead, especially in complex environments that require high-capacity networks (https://github.com/AgileRL/AgileRL/issues/314). See the sketch after this list.
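As mentioned in the encoder-sharing item above, here is a minimal sketch of turning the feature on; `share_encoders=True` is taken from these notes, while the environment choice and import path are assumptions:

```python
# Hedged sketch: sharing the observation encoder between actor and critic in DDPG.
import gymnasium as gym
from agilerl.algorithms import DDPG

env = gym.make("Pendulum-v1")
agent = DDPG(
    observation_space=env.observation_space,
    action_space=env.action_space,
    share_encoders=True,  # actor and critic(s) reuse a single observation encoder
)
```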
Tests:
- More coverage in `algo_utils.py`.
- Better tests for `OptimizerWrapper` and `EvolvableAlgorithm`.
Breaking Changes:
- Refactored `EvolvableMultiInput` to have a simpler API. We now pass in a `cnn_config`, `mlp_config`, and `lstm_config` separately rather than "flattening" their arguments into its constructor.
- Single-agent off-policy replay buffers have a simpler API: there's no need to provide the field names to a `ReplayBuffer`, since these are automatically inspected when the first transition is added (see the sketch after this list).
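To illustrate the simpler single-agent buffer API described above, a hedged sketch follows; the import path, constructor arguments, and the `add()` call with keyword fields are assumptions about the new TensorDict-backed `ReplayBuffer`, not taken verbatim from these notes:

```python
# Hedged sketch of the simplified ReplayBuffer API: no field names are declared
# up front; the buffer inspects the first transition to set up its storage.
# Import path, constructor arguments, and add() signature are assumptions.
import numpy as np
from agilerl.components.replay_buffer import ReplayBuffer

buffer = ReplayBuffer(max_size=10_000, device="cpu")

obs = np.zeros(4, dtype=np.float32)
next_obs = np.ones(4, dtype=np.float32)
# fields ("obs", "action", "reward", ...) are inferred from this first transition
buffer.add(obs=obs, action=1, reward=0.5, next_obs=next_obs, done=False)
```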
Bug Fixes:
- Issue with dictionary and tuple spaces in multi-agent settings.
- Bug when using PPO on continuous action spaces.
What's Changed
- IPPO by @nicku-a in https://github.com/AgileRL/AgileRL/pull/343
- Grpo by @mikepratt1 in https://github.com/AgileRL/AgileRL/pull/342
- TensorDict ReplayBuffer & EvolvableLSTM by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/346
- Lambda use in Pz Async Vec Envs by @nicku-a in https://github.com/AgileRL/AgileRL/pull/350
- Support for MultiDiscrete & MultiBinary Action Spaces by @jaimesabalbermudez in https://github.com/AgileRL/AgileRL/pull/349
Full Changelog: https://github.com/AgileRL/AgileRL/compare/v2.1.2...v2.2.0
Published by jaimesabalbermudez 9 months ago
agilerl - v2.1.2 Bug Fixes & Improvements
What's Changed
- PR: https://github.com/AgileRL/AgileRL/pull/338
- Bug fix when using `ContinuousQNetwork` with `layer_norm=True`, where a statistical inconsistency between raw actions and normalized observation encodings caused instability during training and worse performance in multi-agent algorithms.
- Bug fix in `EvolvableMultiInput` where `Box` spaces with `shape=()` raised an error.
- Bug fix in the `load()` method of `EvolvableAlgorithm` that caused issues when loading models saved with versions >=2.0.0 and <=2.0.6 on later versions.
Full Changelog: https://github.com/AgileRL/AgileRL/compare/v2.1.1...v2.1.2
Published by jaimesabalbermudez 9 months ago
agilerl - v2.1.1 GRPO and Advanced Evolvable Architectures
AgileRL v2.1.1 introduces several additional features to the AgileRL framework, including support for RL finetuning of LLMs and new evolvable architectures!
This release includes:
- Distributed GRPO - The algorithm introduced by DeepSeek is now available in AgileRL, providing the functionality to use RL to finetune LLMs across multiple GPUs to create more specialized agents.
- We have implemented an Evolvable version of the SimBa network (EvolvableSimba), which improves sample efficiency and beats existing SOTA deep RL methods (a plain-PyTorch sketch of the block structure follows this list). SimBa consists of three components:
- An observation normalization layer that standardizes inputs with running statistics
- A residual feedforward block to provide a linear pathway from the input to the output
- A layer normalization to control feature magnitudes
- Similarly, we have introduced an EvolvableResNet to offer superior performance for image based observation spaces.
- Multi-agent bug fixes
- Complex spaces bug fixes
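To make the three SimBa components above concrete, here is a plain-PyTorch sketch of a SimBa-style block; it mirrors the description rather than AgileRL's EvolvableSimba API, the layer sizes are arbitrary, and the running-statistics observation normalization is approximated with BatchNorm1d:

```python
# Illustrative sketch of the SimBa structure described above: running-statistics
# observation normalization, a residual feedforward block, and a final layer norm.
# This does not use AgileRL's EvolvableSimba class.
import torch
import torch.nn as nn


class SimBaBlockSketch(nn.Module):
    def __init__(self, obs_dim: int, hidden_dim: int = 128):
        super().__init__()
        # (1) observation normalization with running statistics (approximated here)
        self.obs_norm = nn.BatchNorm1d(obs_dim, affine=False)
        self.proj = nn.Linear(obs_dim, hidden_dim)
        # (2) residual feedforward block providing a linear pathway from input to output
        self.ff = nn.Sequential(
            nn.LayerNorm(hidden_dim),
            nn.Linear(hidden_dim, hidden_dim * 4),
            nn.ReLU(),
            nn.Linear(hidden_dim * 4, hidden_dim),
        )
        # (3) layer normalization to control feature magnitudes
        self.out_norm = nn.LayerNorm(hidden_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        x = self.proj(self.obs_norm(obs))
        x = x + self.ff(x)  # residual connection keeps the linear pathway
        return self.out_norm(x)
```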
Published by mikepratt1 10 months ago
agilerl - AgileRL 2.0.0
AgileRL 2.0
AgileRL 2.0 is here, offering a ton of new features and updates to the framework!
The main focus of this release is to provide a more flexible framework for creating custom evolvable network architectures and algorithms to make the most out of automatic evolutionary hyperparameter optimization during training. We've also done some heavy refactoring to make the codebase more modular and scalable, with the hope that users find it easier to plug-and-play with their arbitrarily complex use-cases.
Features:
- Support for Dictionary / Tuple Spaces: We have implemented the `EvolvableMultiInput` module, which takes in a (single-level) dictionary or tuple space and assigns an `EvolvableCNN` to each underlying image subspace. Observations from vector / discrete spaces are simply concatenated to the image encodings by default, but users can specify if they want these to be processed by an `EvolvableMLP` before concatenating.
- `EvolvableModule` Class Hierarchy: A wrapper around `nn.Module` that allows us to keep track of the mutation methods in complex networks with nested modules. We use the `@mutation` decorator to signal mutation methods, and these are registered automatically as such. Such modules should implement a `recreate_network()` method (see `agilerl.modules.base.EvolvableModule.recreate_network`) that is called automatically after any mutation method is used to modify the network's architecture. Users can now pass non-evolvable architectures to the algorithms too by wrapping their models with `DummyEvolvable`. This is useful when you want to use a pre-trained model, or a model whose architecture you don't want to mutate, while still enabling random weight and RL hyperparameter mutations. Please refer to the documentation for more information. A sketch of a custom evolvable module follows this list.
- `EvolvableNetwork` Class Hierarchy: Towards a more general API for algorithm implementation, where complex observation spaces should be inherently supported, networks inheriting from `EvolvableNetwork` automatically create an appropriate encoder from a given observation space. Custom networks simply have to specify the head of the network that maps the observation encodings to a number of outputs. As part of this update we implement the following common networks used (by default) in the already implemented algorithms:
  - `QNetwork`: state-action value function (used in e.g. DQN).
  - `RainbowQNetwork`: state-action value function that uses a dueling distributional architecture for the network head (used in Rainbow DQN).
  - `ContinuousQNetwork`: state-action value function for continuous action spaces, which takes the actions as input along with the observations.
  - `ValueNetwork`: outputs the scalar value of an observation (used in e.g. PPO).
  - `DeterministicActor`: outputs deterministic actions given an action space.
  - `StochasticActor`: outputs an appropriate PyTorch distribution over the given action space.
- `EvolvableAlgorithm` Class Hierarchy: We create a class hierarchy for algorithms with a focus on evolutionary hyperparameter optimization. The `EvolvableAlgorithm` base class implements methods common to any RL algorithm, e.g. `save_checkpoint()` and `load()`, but also methods pertaining specifically to mutations, e.g. `clone()`. Under the hood, it initializes a `MutationRegistry` that users should use to register "network groups". The registry also keeps track of the RL hyperparameters users wish to mutate during training and of the optimizers. Users wishing to create custom algorithms should now only need to worry about implementing the `get_action()`, `learn()`, and (for now) `test()` methods.
- Generalized Mutations: We have refactored `Mutations` with the above hierarchies in mind to allow for a generalised mutations framework that works for any combination of evolvable networks in an algorithm. Moreover, we now allow users to pass any configuration of RL hyperparameters they wish to mutate during training directly to an algorithm inheriting from `EvolvableAlgorithm`, rather than handling this in `Mutations`. For an example of how to do this, please refer to the documentation of any of the algorithms implemented in AgileRL, or our tutorials.
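A hedged sketch of what a custom module in this hierarchy might look like; the `EvolvableModule` base class, the `@mutation` decorator, and `recreate_network()` are named above, while the import locations, the base-class constructor signature, and the specific mutation shown are assumptions:

```python
# Hedged sketch of a custom evolvable module using the @mutation decorator and
# recreate_network() hook described above. Import paths, the base-class
# constructor signature, and whether @mutation takes arguments are assumptions.
import torch
import torch.nn as nn
from agilerl.modules.base import EvolvableModule, mutation


class EvolvableEncoderSketch(EvolvableModule):
    def __init__(self, input_dim: int, hidden_dim: int = 64, device: str = "cpu"):
        super().__init__(device=device)  # assumed base-class signature; assumes it stores `device`
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.net = self._build_net()

    def _build_net(self) -> nn.Module:
        return nn.Sequential(
            nn.Linear(self.input_dim, self.hidden_dim),
            nn.ReLU(),
        ).to(self.device)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

    @mutation  # registers this method as an architecture mutation
    def widen_hidden_layer(self) -> None:
        self.hidden_dim *= 2

    def recreate_network(self) -> None:
        # called automatically after a mutation method changes the architecture
        self.net = self._build_net()
```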
Breaking Changes:
- We have placed the building blocks of our networks in a dedicated `agilerl.modules` module, which contains the off-the-shelf evolvable modules that can be used to create custom network architectures (e.g. `EvolvableMLP`, `EvolvableCNN`, and `EvolvableMultiInput`), whereas before these were located in `agilerl.networks`. In the latter we now keep the networks created through the `EvolvableNetwork` class hierarchy.
- Pass in `observation_space` and `action_space` to the algorithms instead of `state_dim` and `action_dim`. This is to support more complex observation spaces, and allows for simpler generation of default networks in the algorithms by using the `EvolvableNetwork` class hierarchy.
- Simplified API in the evolvable modules, mutations, and algorithms. Please refer to the documentation for more information.
- The `net_config` argument of algorithms should now be passed in with the arguments of the corresponding `EvolvableNetwork` class. For example, in `PPO`, the `net_config` argument might include an "encoder_config" key, which differs depending on your observation space, and a "head_config" key for the head of the actor (i.e. `StochasticActor`) and critic (i.e. `ValueNetwork`). All the networks in an algorithm are initialized with the same architecture by default. If users wish to use different architectures, these should be passed as arguments directly to the algorithm.
Example Network Configuration

```python
net_config = {
    # For an image observation space we encode observations using EvolvableCNN
    "encoder_config": {
        "channel_size": [32],
        "kernel_size": [3],
        "stride_size": [1],
    },
    # The head is usually an EvolvableMLP by default
    "head_config": {
        "hidden_size": [64, 64],
    },
}
```
Published by jaimesabalbermudez 11 months ago
agilerl - v1.0.0 AgileRL
AgileRL version 1.0.0
This release marks v1.0.0 of the framework! Thanks to all our current users and collaborators who have helped us get so far.
v1 contains major updates including new trainers, more controls, better docs, updated variable and function names, and more!
AgileRL is a Deep Reinforcement Learning library focused on improving development by introducing RLOps - MLOps for reinforcement learning.
This library is initially focused on reducing the time taken for training models and hyperparameter optimization (HPO) by pioneering evolutionary HPO techniques for reinforcement learning. Evolutionary HPO has been shown to drastically reduce overall training times by automatically converging on optimal hyperparameters, without requiring numerous training runs.
We are constantly adding more algorithms and features. AgileRL already includes state-of-the-art evolvable on-policy, off-policy, offline, multi-agent and contextual multi-armed bandit reinforcement learning algorithms with distributed training.
To see the full AgileRL documentation, including tutorials, visit our documentation site. To ask questions and get help, collaborate, or discuss anything related to reinforcement learning, join the AgileRL Discord Server.
Published by nicku-a over 1 year ago
agilerl - v0.1.21 Contextual Multi-armed Bandits
AgileRL v0.1.21 introduces contextual multi-armed bandit algorithms to the framework. Train agents to solve complex optimisation problems with our two new evolvable bandit algorithms!
This release includes the following updates:
- Two new evolvable contextual bandit algorithms: Neural Contextual Bandits with UCB-based Exploration and Neural Thompson Sampling
- A new contextual bandits training function, enabling the fastest and easiest training
- A new BanditEnv class for converting any labelled dataset into a bandit learning environment
- Tutorials on using AgileRL bandit algorithms with evolvable hyperparameter optimisation for SOTA results
- New demo and benchmarking scripts for bandit algorithms
- + more!
More updates will be coming soon!
Published by nicku-a almost 2 years ago
agilerl - v0.1.20 Probe environments and debugging tools
AgileRL v0.1.20 focuses on making debugging of reinforcement learning implementations easier. Easily figure out what's going on with our new probe environments, that quickly isolate and validate an agent's ability to solve any kind of problem.
This release includes:
- 43 single- and multi-agent probe environments for image and vector observation spaces, and discrete and continuous action spaces
- New functions that can automate testing with probe environments to quickly isolate your problem
- A new Debugging Reinforcement Learning section of the docs, with examples and explanations
- General improvements, including more stable learning for DDPG, TD3, MADDPG and MATD3 with image observations
More updates and algorithms coming soon!
Published by nicku-a almost 2 years ago
agilerl - v0.1.19 Hierarchical Skills, tutorials and docs improvements
AgileRL v0.1.19 introduces hierarchical curriculum learning to the platform by learning Skills. Teach agents to solve complex problems by breaking down tasks into smaller, learnable sub-tasks. We have collaborated further with the Farama Foundation to introduce more tutorials as well as improving our documentation.
This release includes the following:
- New Skills wrapper is introduced to enable hierarchical curriculum learning with any algorithm. A tutorial is also provided to demonstrate how to use it.
- Single-agent Gymnasium tutorials are introduced, demonstrating how to use PPO, TD3 and Rainbow DQN on a variety of environments.
- Documentation site is improved, check it out: https://docs.agilerl.com
- General algorithm improvements throughout the framework
Stay tuned for more updates coming soon!
Published by nicku-a about 2 years ago
agilerl - v0.1.14 Multi-agent updates, usability and tests
AgileRL v0.1.14 introduces usability improvements to the framework with better warnings and error messages. This update also includes more robust unit tests across the library and general improvements. Multi-agent algorithms also receive updates to better handle discrete action spaces.
Published by nicku-a about 2 years ago
agilerl - v0.1.13 MakeEvolvable, Curriculum Learning and Self-play
AgileRL v0.1.13 introduces more flexibility, allowing users to define their own custom networks and use them with our algorithms and SOTA hyperparameter optimisation. Additionally, we have continued collaborating with the Farama Foundation to bring you another tutorial.
This release includes the following:
- MakeEvolvable wrapper to make any sequential network evolvable - wrap any CNN or MLP to make them compatible with AgileRL algorithms and evolutionary hyperparameter optimisation!
- Use pre-trained networks with AgileRL - load any PyTorch nn.Module network into AgileRL to automatically make it evolvable.
- Self-play tutorial that harnesses curriculum learning to train a DQN agent to play connect4!
Stay tuned for more updates coming soon!
Published by nicku-a about 2 years ago
agilerl - v0.1.12 PPO, Rainbow DQN and fancy replay buffers
AgileRL v0.1.12 introduces two new, powerful algorithms to the framework among other features. We have collaborated with the Farama Foundation to introduce tutorials for multi-agent reinforcement learning, with more tutorials on the way.
This release includes the following updates:
- Proximal Policy Optimization (PPO) is added to the framework - train on-policy efficiently.
- Rainbow DQN is added to the framework - combines multiple improvements over DQN.
- Prioritized experience replay buffer and multi-step replay buffers are introduced to the framework.
- Tutorials for multi-agent algorithms included, with more coming soon.
Stay tuned for more updates very soon!
Published by nicku-a about 2 years ago
agilerl - v0.1.8 Multi-agent training
AgileRL v0.1.8 introduces multi-agent algorithms into the framework. Train multiple agents in co-operative or competitive Petting Zoo-style (parallel API) environments, with significantly faster training and up to 4x improvement in total return when benchmarked against epymarl's equivalent offering!
This release includes the following updates:
- MADDPG is added to the framework! Train multiple agents in competitive or co-operative environments.
- MATD3 is added to the framework! Train multiple agents with greater stability.
- Addition of a multi-agent replay buffer class and a multi-agent train function.
- Training config files. Configure training runs in one place.
Keep an eye out for further updates coming soon!
Published by nicku-a over 2 years ago
agilerl - v0.1.7 Distributed training
AgileRL v0.1.7 introduces distributed training to the framework with HuggingFace Accelerate! Train even faster by taking full advantage of your entire compute stack.
This release includes the following updates:
- Distributed training. Train across multiple GPUs to cut down your training time even further!
- New Sampler class to handle both standard and distributed replay buffers.
- TD3 is added to the framework! Train agents with continuous actions with greater stability.
- More and expanded demos and benchmarking files for online, offline and distributed training.
Stay tuned for more features coming soon!
Published by nicku-a over 2 years ago
agilerl - v0.1.6 Offline RL and Conservative Q-Learning
AgileRL v0.1.6 introduces offline reinforcement learning to the framework. You can now easily train agents on static data, and use evolutionary hyperparameter optimisation to learn faster and better.
This release includes the following updates:
- New general offline RL training function to learn from static data
- Conservative Q-Learning (CQL) added
More new features coming soon!
Published by nicku-a over 2 years ago
agilerl - v0.1.5 Evolvable Transformers and ILQL
AgileRL v0.1.5 introduces evolvable transformers that can be used for language tasks, including for Reinforcement Learning from Human Feedback (RLHF). Combining LLMs and transformer architectures with evolvable HPO can massively reduce the time taken to finetune these expensive models.
This release includes the following updates:
- Evolvable GPT and BERT models, compatible with evolutionary HPO
- Implicit Language Q-Learning (ILQL) added - an RLHF offline algorithm
- Better mutation support
New features are continuously being added, stay tuned!
Published by nicku-a over 2 years ago
agilerl - The CNN update!
AgileRL is initially focused on reducing the time taken for training models and hyperparameter optimization (HPO) by pioneering evolutionary HPO techniques for reinforcement learning. Evolutionary HPO has been shown to drastically reduce overall training times by automatically converging on optimal hyperparameters, without requiring numerous training runs. We are constantly adding more algorithms, with a view to add hierarchical and multi-agent algorithms soon.
This release includes the following updates:
- Added evolvable CNN - you can now use AgileRL for visual environments, like Atari!
- Added network configs - you can now specify your network architecture using a config. This is a step towards integration with other RL libraries and algorithms.
- Better and updated documentation + more!
Plenty more features coming soon!
Published by nicku-a over 2 years ago