agilerl

Streamlining reinforcement learning with RLOps. State-of-the-art RL algorithms and tools, with 10x faster training through evolutionary hyperparameter optimization.

https://github.com/agilerl/agilerl

Keywords

agilerl automl deep-learning deep-reinforcement-learning distributed evolutionary-algorithms gym hpo hyperparameter-optimization hyperparameter-tuning machine-learning mlops multi-agent multi-agent-reinforcement-learning pettingzoo python pytorch reinforcement-learning rlops training

Keywords from Contributors

energy-system-model parallel mesh spacy-extension hydrology cython medical-imaging transformers regionalization energy-system

Last synced: 6 months ago

Repository


Basic Info

Host: GitHub
Owner: AgileRL
License: apache-2.0
Language: Python
Default Branch: main
Homepage: https://agilerl.com
Size: 61.3 MB

Statistics

Stars: 813
Watchers: 8
Forks: 66
Open Issues: 13
Releases: 25


Created almost 3 years ago · Last pushed 6 months ago

Metadata Files

Readme Contributing License Code of conduct Citation

README.md

AgileRL

Reinforcement learning streamlined.
Easier and faster reinforcement learning with RLOps. Visit our website. View documentation.
Join the Discord Server for questions, help and collaboration.

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Documentation Status](https://readthedocs.org/projects/agilerl/badge/?version=latest)](https://docs.agilerl.com/en/latest/?badge=latest) [![Downloads](https://static.pepy.tech/badge/agilerl)](https://pypi.python.org/pypi/agilerl/) [![Discord](https://dcbadge.limes.pink/api/server/https://discord.gg/eB8HyTA2ux?style=flat)](https://discord.gg/eB8HyTA2ux) [![Arena](./.github/badges/arena-github-badge.svg)](https://arena.agilerl.com)

🚀 Train super-fast for free on Arena, the RLOps platform from AgileRL 🚀

AgileRL is a Deep Reinforcement Learning library focused on improving development by introducing RLOps - MLOps for reinforcement learning.

This library is initially focused on reducing the time taken for training models and hyperparameter optimization (HPO) by pioneering evolutionary HPO techniques for reinforcement learning.
Evolutionary HPO has been shown to drastically reduce overall training times by automatically converging on optimal hyperparameters, without requiring numerous training runs.
We are constantly adding more algorithms and features. AgileRL already includes state-of-the-art evolvable on-policy, off-policy, offline, multi-agent and contextual multi-armed bandit reinforcement learning algorithms with distributed training.

AgileRL offers 10x faster hyperparameter optimization than SOTA.

Get Started

To see the full AgileRL documentation, including tutorials, visit our documentation site. To ask questions and get help, collaborate, or discuss anything related to reinforcement learning, join the AgileRL Discord Server.

Install as a package with pip:

```bash
pip install agilerl
```

Or install in development mode:

```bash
git clone https://github.com/AgileRL/AgileRL.git && cd AgileRL
pip install -e .
```

To install the nightly version of AgileRL with the latest features, use:

```bash
pip install git+https://github.com/AgileRL/AgileRL.git@nightly
```

Benchmarks

Reinforcement learning algorithms and libraries are usually benchmarked once the optimal hyperparameters for training are known, but it often takes hundreds or thousands of experiments to discover these. This is unrealistic and does not reflect the true, total time taken for training. What if we could remove the need to conduct all these prior experiments?

In the charts below, a single AgileRL run, which automatically tunes hyperparameters, is benchmarked against Optuna's multiple training runs traditionally required for hyperparameter optimization, demonstrating the real time savings possible. Global steps is the sum of every step taken by any agent in the environment, including across an entire population.

AgileRL offers an order of magnitude speed up in hyperparameter optimization vs popular reinforcement learning training frameworks combined with Optuna. Remove the need for multiple training runs and save yourself hours.
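To make the global-steps metric described above concrete, here is a minimal sketch of the arithmetic. The function name `global_steps` and the numbers are hypothetical, and the sketch assumes every agent in the population takes the same number of steps in each of its vectorized environments.

```python
def global_steps(steps_per_env: int, num_envs: int, pop_size: int) -> int:
    """Total environment steps summed over all vectorized envs and all agents.

    Assumes each agent takes the same number of steps in each environment.
    """
    return steps_per_env * num_envs * pop_size


# Hypothetical run: 6 agents, each stepping 16 vectorized environments 1,000 times.
total = global_steps(steps_per_env=1_000, num_envs=16, pop_size=6)
print(total)  # 96000
```

Counted this way, a population-based run is charged for the steps of every agent, not only the best one, so a comparison against repeated single-agent runs is like for like.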

AgileRL also supports multi-agent reinforcement learning using the PettingZoo-style parallel API. The charts below highlight the performance of our MADDPG and MATD3 algorithms with evolutionary hyperparameter optimization (HPO), benchmarked against epymarl's MADDPG algorithm with grid-search HPO for the simple speaker listener and simple spread environments.

Tutorials

We are constantly updating our tutorials to showcase the latest features of AgileRL and how users can leverage our evolutionary HPO to achieve 10x faster hyperparameter optimization. Please see the available tutorials below.

| Tutorial Type | Description | Tutorials |
|---------------|-------------|-----------|
| Single-agent tasks | Guides for training both on and off-policy agents to beat a variety of Gymnasium environments. | PPO - Acrobot, TD3 - Lunar Lander, Rainbow DQN - CartPole, Recurrent PPO - Masked Pendulum |
| Multi-agent tasks | Use of PettingZoo environments such as training DQN to play Connect Four with curriculum learning and self-play, and for multi-agent tasks in MPE environments. | DQN - Connect Four, MADDPG - Space Invaders, MATD3 - Speaker Listener |
| Hierarchical curriculum learning | Shows how to teach agents Skills and combine them to achieve an end goal. | PPO - Lunar Lander |
| Contextual multi-arm bandits | Learn to make the correct decision in environments that only have one timestep. | NeuralUCB - Iris Dataset, NeuralTS - PenDigits |
| Custom Modules & Networks | Learn how to create custom evolvable modules and networks for RL algorithms. | Dueling Distributional Q Network, EvolvableSimBa |
| LLM Finetuning | Learn how to finetune an LLM using AgileRL. | GRPO |

Evolvable algorithms (more coming soon!)

### Single-agent algorithms

| RL | Algorithm |
| ---------- | --------- |
| On-Policy | Proximal Policy Optimization (PPO) |
| Off-Policy | Deep Q Learning (DQN), Rainbow DQN, Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3) |
| Offline | Conservative Q-Learning (CQL), Implicit Language Q-Learning (ILQL) |

### Multi-agent algorithms

| RL | Algorithm |
| ---------- | --------- |
| Multi-agent | Multi-Agent Deep Deterministic Policy Gradient (MADDPG), Multi-Agent Twin-Delayed Deep Deterministic Policy Gradient (MATD3), Independent Proximal Policy Optimization (IPPO) |

### Contextual multi-armed bandit algorithms

| RL | Algorithm |
| ---------- | --------- |
| Bandits | Neural Contextual Bandits with UCB-based Exploration (NeuralUCB), Neural Contextual Bandits with Thompson Sampling (NeuralTS) |

### LLM Reasoning Algorithms

| RL | Algorithm |
| ---------- | --------- |
| On-Policy | Group Relative Policy Optimization (GRPO) |

Train an Agent to Beat a Gym Environment

Before starting training, some meta-hyperparameters and settings must be defined: `INIT_HP` for general parameters, `MUTATION_PARAMS` for the evolutionary probabilities, and `NET_CONFIG` for the network architecture. For example:

Basic Hyperparameters

```python
INIT_HP = {
    'ENV_NAME': 'LunarLander-v3',   # Gym environment name
    'ALGO': 'DQN',                  # Algorithm
    'DOUBLE': True,                 # Use double Q-learning
    'CHANNELS_LAST': False,         # Swap image channels dimension from last to first [H, W, C] -> [C, H, W]
    'BATCH_SIZE': 256,              # Batch size
    'LR': 1e-3,                     # Learning rate
    'MAX_STEPS': 1_000_000,         # Max no. steps
    'TARGET_SCORE': 200.,           # Early training stop at avg score of last 100 episodes
    'GAMMA': 0.99,                  # Discount factor
    'MEMORY_SIZE': 10000,           # Max memory buffer size
    'LEARN_STEP': 1,                # Learning frequency
    'TAU': 1e-3,                    # For soft update of target parameters
    'TOURN_SIZE': 2,                # Tournament size
    'ELITISM': True,                # Elitism in tournament selection
    'POP_SIZE': 6,                  # Population size
    'EVO_STEPS': 10_000,            # Evolution frequency
    'EVAL_STEPS': None,             # Evaluation steps
    'EVAL_LOOP': 1,                 # Evaluation episodes
    'LEARNING_DELAY': 1000,         # Steps before starting learning
    'WANDB': True,                  # Log with Weights and Biases
}
```
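The `TAU` entry above controls the soft update of target parameters. As a rough illustration of what such an update usually means (a generic Polyak-averaging sketch on plain Python floats, not AgileRL's implementation; `soft_update` is a hypothetical helper), each target weight moves a fraction `tau` of the way toward the corresponding online weight:

```python
def soft_update(target, online, tau):
    """Blend online weights into target weights: tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


# Illustrative values only; real networks hold tensors, not two floats.
target = [0.0, 0.0]
online = [1.0, -2.0]
for _ in range(3):
    target = soft_update(target, online, tau=1e-3)

print(target)  # each entry has moved only slightly toward the online weights
```

With a small `tau` such as `1e-3`, the target weights trail the online weights slowly, which is the usual reason for using a soft rather than a hard update.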

Mutation Hyperparameters

```python
MUTATION_PARAMS = {
    # Relative probabilities
    'NO_MUT': 0.4,      # No mutation
    'ARCH_MUT': 0.2,    # Architecture mutation
    'NEW_LAYER': 0.2,   # New layer mutation
    'PARAMS_MUT': 0.2,  # Network parameters mutation
    'ACT_MUT': 0,       # Activation layer mutation
    'RL_HP_MUT': 0.2,   # Learning HP mutation
    'MUT_SD': 0.1,      # Mutation strength
    'RAND_SEED': 1,     # Random seed
}
```
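The mutation entries above are described as relative probabilities. The sketch below shows one common way relative weights can be turned into a sampling distribution. It is a generic illustration and not AgileRL's internal code: `normalise` and `sample_mutation` are hypothetical helpers, and treating `NEW_LAYER` as a sub-choice of architecture mutation (so leaving it out of the top-level draw) is an assumption the source does not confirm.

```python
import random


def normalise(weights):
    """Scale non-negative relative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in weights.items()}


def sample_mutation(weights, rng):
    """Draw one mutation type in proportion to its relative weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Top-level mutation types taken from MUTATION_PARAMS (NEW_LAYER omitted, see lead-in).
relative = {
    'NO_MUT': 0.4,
    'ARCH_MUT': 0.2,
    'PARAMS_MUT': 0.2,
    'ACT_MUT': 0,
    'RL_HP_MUT': 0.2,
}

probs = normalise(relative)
rng = random.Random(1)
draws = [sample_mutation(relative, rng) for _ in range(1000)]
print(probs)
print({name: draws.count(name) for name in relative})
```

A weight of `0`, as for `ACT_MUT` here, means that mutation type is never drawn; because the weights are relative, doubling all of them would leave the distribution unchanged.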

Basic Network Configuration

```python
NET_CONFIG = {
    'latent_dim': 16,
    'encoder_config': {
        'hidden_size': [32]  # Observation encoder configuration
    },
    'head_config': {
        'hidden_size': [32]  # Network head configuration
    },
}
```

Creating a Population of Agents

First, use utils.utils.create_population to create a list of agents - our population that will evolve and mutate to the optimal hyperparameters.

Population Creation Example

```python
import torch
from agilerl.utils.utils import (
    make_vect_envs,
    create_population,
    observation_space_channels_to_first
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

num_envs = 16
env = make_vect_envs(env_name=INIT_HP['ENV_NAME'], num_envs=num_envs)

observation_space = env.single_observation_space
action_space = env.single_action_space
if INIT_HP['CHANNELS_LAST']:
    observation_space = observation_space_channels_to_first(observation_space)

agent_pop = create_population(
    algo=INIT_HP['ALGO'],                 # Algorithm
    observation_space=observation_space,  # Observation space
    action_space=action_space,            # Action space
    net_config=NET_CONFIG,                # Network configuration
    INIT_HP=INIT_HP,                      # Initial hyperparameters
    population_size=INIT_HP['POP_SIZE'],  # Population size
    num_envs=num_envs,                    # Number of vectorized environments
    device=device
)
```

Initializing Evolutionary HPO

Next, create the tournament, mutations and experience replay buffer objects that allow agents to share memory and efficiently perform evolutionary HPO.

Mutations and Tournament Selection Example

```python
from agilerl.components.replay_buffer import ReplayBuffer
from agilerl.hpo.tournament import TournamentSelection
from agilerl.hpo.mutation import Mutations

memory = ReplayBuffer(
    max_size=INIT_HP['MEMORY_SIZE'],  # Max replay buffer size
    device=device,
)

tournament = TournamentSelection(
    tournament_size=INIT_HP['TOURN_SIZE'],  # Tournament selection size
    elitism=INIT_HP['ELITISM'],             # Elitism in tournament selection
    population_size=INIT_HP['POP_SIZE'],    # Population size
    eval_loop=INIT_HP['EVAL_LOOP'],         # Evaluate using last N fitness scores
)

mutations = Mutations(
    no_mutation=MUTATION_PARAMS['NO_MUT'],        # No mutation
    architecture=MUTATION_PARAMS['ARCH_MUT'],     # Architecture mutation
    new_layer_prob=MUTATION_PARAMS['NEW_LAYER'],  # New layer mutation
    parameters=MUTATION_PARAMS['PARAMS_MUT'],     # Network parameters mutation
    activation=MUTATION_PARAMS['ACT_MUT'],        # Activation layer mutation
    rl_hp=MUTATION_PARAMS['RL_HP_MUT'],           # Learning HP mutation
    mutation_sd=MUTATION_PARAMS['MUT_SD'],        # Mutation strength
    rand_seed=MUTATION_PARAMS['RAND_SEED'],       # Random seed
    device=device,
)
```
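For readers unfamiliar with tournament selection, the following is a minimal, library-independent sketch of the idea behind the `tournament_size`, `elitism` and `population_size` settings. It is not the `TournamentSelection` class itself: `tournament_select` is a hypothetical function, the fitness values are made up, and sampling contenders without replacement is an assumption of this sketch.

```python
import random


def tournament_select(fitnesses, tournament_size, population_size, elitism, rng):
    """Return the indices of the agents chosen to seed the next generation."""
    indices = range(len(fitnesses))
    selected = []
    if elitism:
        # Elitism: the single fittest agent always survives.
        selected.append(max(indices, key=lambda i: fitnesses[i]))
    while len(selected) < population_size:
        # Pick a few distinct agents at random; the fittest of them wins a slot.
        contenders = rng.sample(indices, k=tournament_size)
        selected.append(max(contenders, key=lambda i: fitnesses[i]))
    return selected


rng = random.Random(0)
fitnesses = [12.0, 150.5, 87.3, -20.1, 199.9, 45.0]  # made-up scores for 6 agents
next_gen = tournament_select(
    fitnesses, tournament_size=2, population_size=6, elitism=True, rng=rng
)
print(next_gen)  # indices into the current population; the elite (index 4) comes first
```

In a full evolutionary loop, the selected agents would then be cloned and mutated before training resumes. A larger tournament size increases selection pressure, since weaker agents win fewer tournaments.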

Train A Population of Agents

The simplest way to train is to use our `train_off_policy()` function. It requires each agent to have `get_action()` and `learn()` methods.

```python
from agilerl.training.train_off_policy import train_off_policy

trained_pop, pop_fitnesses = train_off_policy(
    env=env,                                   # Gym-style environment
    env_name=INIT_HP['ENV_NAME'],              # Environment name
    algo=INIT_HP['ALGO'],                      # Algorithm
    pop=agent_pop,                             # Population of agents
    memory=memory,                             # Replay buffer
    swap_channels=INIT_HP['CHANNELS_LAST'],    # Swap image channel from last to first
    max_steps=INIT_HP['MAX_STEPS'],            # Max number of training steps
    evo_steps=INIT_HP['EVO_STEPS'],            # Evolution frequency
    eval_steps=INIT_HP['EVAL_STEPS'],          # Number of steps in evaluation episode
    eval_loop=INIT_HP['EVAL_LOOP'],            # Number of evaluation episodes
    learning_delay=INIT_HP['LEARNING_DELAY'],  # Steps before starting learning
    target=INIT_HP['TARGET_SCORE'],            # Target score for early stopping
    tournament=tournament,                     # Tournament selection object
    mutation=mutations,                        # Mutations object
    wb=INIT_HP['WANDB'],                       # Weights and Biases tracking
)
```

Citing AgileRL

If you use AgileRL in your work, please cite the repository:

```bibtex
@software{Ustaran-Anderegg_AgileRL,
  author = {Ustaran-Anderegg, Nicholas and Pratt, Michael and Sabal-Bermudez, Jaime},
  license = {Apache-2.0},
  title = {{AgileRL}},
  url = {https://github.com/AgileRL/AgileRL}
}
```

Owner

Name: AgileRL
Login: AgileRL
Kind: organization
Location: United Kingdom

Website: https://agilerl.com
Repositories: 1
Profile: https://github.com/AgileRL

RLOps

Citation (CITATION.cff)

cff-version: 1.2.0
title: AgileRL
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Nicholas
    family-names: Ustaran-Anderegg
    affiliation: AgileRL
  - given-names: Michael
    family-names: Pratt
    affiliation: AgileRL
repository-code: 'https://github.com/AgileRL/AgileRL'
url: 'https://agilerl.com'
repository: 'https://docs.agilerl.com'
abstract: >-
  AgileRL is a Deep Reinforcement Learning library focused
  on streamlining development by introducing RLOps.
license: Apache-2.0

Committers

Last synced: 9 months ago

All Time

Total Commits: 1,517
Total Committers: 25
Avg Commits per committer: 60.68
Development Distribution Score (DDS): 0.38

Past Year

Commits: 388
Committers: 17
Avg Commits per committer: 22.824
Development Distribution Score (DDS): 0.673

Top Committers

Name	Email	Commits
nicku-a	n**a@b**m	941
mikepratt1	m**0@g**m	264
Jaime Sabal	j**b@g**m	119
pre-commit-ci[bot]	6****]	93
gonultasbu	g****u	33
mp4217	6****7	15
Shreyans Jain	s**n@S**l	9
dependabot[bot]	4****]	9
root	r**t@A**A	8
Sami Mourad	s**i@a**m	4
nargiz	n**z@s**s	4
SeanDaSheep	6****p	2
John Balis	p**s@g**m	2
Wah Loon Keng	k**l@g**m	2
Omar Younis	o**8@g**m	2
Daniel Sont	d**t@g**m	1
Erick Fonseca	e**a@g**m	1
Ewout ter Hoeven	E**n@s**l	1
Jonathan Dumaine	j**e@d**m	1
Quentin Dreyer	q**r@g**m	1
EC2 Default User	e**r@i**l	1
EC2 Default User	e**r@i**l	1
EC2 Default User	e**r@i**l	1
cvt1006	c**6@g**m	1
yakir4123	3****3	1

Committer Domains (Top 20 + Academic)

ip-172-31-22-184.ec2.internal: 1 ip-172-31-29-228.ec2.internal: 1 ip-172-31-64-184.ec2.internal: 1 dumstruck.com: 1 student.tudelft.nl: 1 sentience.rocks: 1 agilerl.com: 1 btinternet.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 58
Total pull requests: 313
Average time to close issues: 18 days
Average time to close pull requests: 2 days
Total issue authors: 34
Total pull request authors: 22
Average comments per issue: 1.19
Average comments per pull request: 0.47
Merged pull requests: 250
Bot issues: 0
Bot pull requests: 46

Past Year

Issues: 39
Pull requests: 154
Average time to close issues: 12 days
Average time to close pull requests: 4 days
Issue authors: 19
Pull request authors: 14
Average comments per issue: 1.0
Average comments per pull request: 0.49
Merged pull requests: 110
Bot issues: 0
Bot pull requests: 24


Top Authors

Issue Authors

jaimesabalbermudez (14)
Vincent-zcm (5)
gonultasbu (3)
sryu1 (2)
DKarz (2)
JonDum (2)
kuza55 (2)
ItsaBox9368 (2)
ds0nt (1)
alassedy (1)
FarStryke21 (1)
AIsCocover (1)
imkow (1)
natebade (1)
AlexAdrian-Hamazaki (1)

Pull Request Authors

nicku-a (115)
mikepratt1 (73)
jaimesabalbermudez (52)
pre-commit-ci[bot] (42)
dependabot[bot] (9)
gonultasbu (5)
brieyla1 (4)
kengz (4)
JonDum (4)
yakir4123 (2)
qkdreyer (2)
OnlyTsukii (2)
ds0nt (2)
aseembits93 (2)
cvt1006 (2)

Top Labels

Issue Labels

bug (25) enhancement (10) refactor (2) dependencies (1)

Pull Request Labels

dependencies (9) python (2)

Packages

Total packages: 2
Total downloads: unknown

Total dependent packages: 0
(may contain duplicates)
Total dependent repositories: 0
(may contain duplicates)
Total versions: 50

proxy.golang.org: github.com/AgileRL/AgileRL

Documentation: https://pkg.go.dev/github.com/AgileRL/AgileRL#section-documentation
License: apache-2.0
Latest release: v2.3.3+incompatible
published 7 months ago

Versions: 25
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.5%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 6 months ago

proxy.golang.org: github.com/agilerl/agilerl

Documentation: https://pkg.go.dev/github.com/agilerl/agilerl#section-documentation
License: apache-2.0
Latest release: v2.3.3+incompatible
published 7 months ago

Versions: 25
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.5%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 6 months ago

agilerl

Science Score: 54.0%
