torchrl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.

https://github.com/pytorch/rl

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, ieee.org, acs.org
  • Committers with academic emails
    5 of 173 committers (2.9%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.5%) to scientific vocabulary

Keywords

ai control decision-making distributed-computing machine-learning marl model-based-reinforcement-learning multi-agent-reinforcement-learning pytorch reinforcement-learning rl robotics torch

Keywords from Contributors

gym transformer cryptocurrency jax scheduling optimizer cryptography interpretability interactive multi-agents
Last synced: 6 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: pytorch
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage: https://pytorch.org/rl
  • Size: 209 MB
Statistics
  • Stars: 3,025
  • Watchers: 41
  • Forks: 402
  • Open Issues: 293
  • Releases: 26
Topics
ai control decision-making distributed-computing machine-learning marl model-based-reinforcement-learning multi-agent-reinforcement-learning pytorch reinforcement-learning rl robotics torch
Created about 4 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

TorchRL

Documentation | TensorDict | Features | Examples, tutorials and demos | Citation | Installation | Asking a question | Contributing

TorchRL is an open-source Reinforcement Learning (RL) library for PyTorch.

What's New

LLM API - Complete Framework for Language Model Fine-tuning

TorchRL now includes a comprehensive LLM API for post-training and fine-tuning of language models! This new framework provides everything you need for RLHF, supervised fine-tuning, and tool-augmented training:

  • Unified LLM Wrappers: Seamless integration with Hugging Face models and vLLM inference engines - more to come!
  • Conversation Management: Advanced History class for multi-turn dialogue with automatic chat template detection
  • Tool Integration: Built-in support for Python code execution, function calling, and custom tool transforms
  • Specialized Objectives: GRPO (Group Relative Policy Optimization) and SFT loss functions optimized for language models
  • High-Performance Collectors: Async data collection with distributed training support
  • Flexible Environments: Transform-based architecture for reward computation, data loading, and conversation augmentation

The LLM API follows TorchRL's modular design principles, allowing you to mix and match components for your specific use case. Check out the complete documentation and GRPO implementation example to get started!

Quick LLM API Example

```python
from torchrl.envs.llm import ChatEnv
from torchrl.envs.llm.transforms import PythonInterpreter
from torchrl.modules.llm import TransformersWrapper
from torchrl.objectives.llm import GRPOLoss
from torchrl.collectors.llm import LLMCollector

# `tokenizer`, `model`, `critic` and `optimizer` are assumed to be defined beforehand

# Create environment with Python tool execution
env = ChatEnv(
    tokenizer=tokenizer,
    system_prompt="You are an assistant that can execute Python code.",
    batch_size=[1]
).append_transform(PythonInterpreter())

# Wrap your language model
llm = TransformersWrapper(
    model=model,
    tokenizer=tokenizer,
    input_mode="history"
)

# Set up GRPO training
loss_fn = GRPOLoss(llm, critic, gamma=0.99)
collector = LLMCollector(env, llm, frames_per_batch=100)

# Training loop
for data in collector:
    loss = loss_fn(data)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()  # reset gradients before the next batch
```

Key features

  • Python-first: Designed with Python as the primary language for ease of use and flexibility
  • Efficient: Optimized for performance to support demanding RL research applications
  • Modular, customizable, extensible: Highly modular architecture allows for easy swapping, transformation, or creation of new components
  • Documented: Thorough documentation ensures that users can quickly understand and utilize the library
  • Tested: Rigorously tested to ensure reliability and stability
  • Reusable functionals: Provides a set of highly reusable functions for cost functions, returns, and data processing

Design Principles

  • Aligns with PyTorch ecosystem: Follows the structure and conventions of popular PyTorch libraries (e.g., dataset pillar, transforms, models, data utilities)
  • Minimal dependencies: Only requires Python standard library, NumPy, and PyTorch; optional dependencies for common environment libraries (e.g., OpenAI Gym) and datasets (D4RL, OpenX...)

Read the full paper for a more curated description of the library.

Getting started

Check our Getting Started tutorials to quickly ramp up with the basic features of the library!

Documentation and knowledge base

The TorchRL documentation can be found here. It contains tutorials and the API reference.

TorchRL also provides an RL knowledge base to help you debug your code, or simply learn the basics of RL. Check it out here.

We have some introductory videos for you to get to know the library better, check them out:

Spotlight publications

Because TorchRL is domain-agnostic, you can use it across many different fields. Here are a few examples:

  • ACEGEN: Reinforcement Learning of Generative Chemical Agents for Drug Discovery
  • BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
  • BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO
  • OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control
  • RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark
  • Robohive: A unified framework for robot learning

Writing simplified and portable RL codebase with TensorDict

RL algorithms are very heterogeneous, and it can be hard to recycle a codebase across settings (e.g. from online to offline, from state-based to pixel-based learning). TorchRL solves this problem through TensorDict, a convenient data structure(1) that can be used to streamline one's RL codebase. With this tool, one can write a complete PPO training script in less than 100 lines of code!

```python
import torch
from tensordict.nn import TensorDictModule
from tensordict.nn.distributions import NormalParamExtractor
from torch import nn

from torchrl.collectors import SyncDataCollector
from torchrl.data.replay_buffers import TensorDictReplayBuffer, \
    LazyTensorStorage, SamplerWithoutReplacement
from torchrl.envs.libs.gym import GymEnv
from torchrl.modules import ProbabilisticActor, ValueOperator, TanhNormal
from torchrl.objectives import ClipPPOLoss
from torchrl.objectives.value import GAE

env = GymEnv("Pendulum-v1")
model = TensorDictModule(
    nn.Sequential(
        nn.Linear(3, 128), nn.Tanh(),
        nn.Linear(128, 128), nn.Tanh(),
        nn.Linear(128, 128), nn.Tanh(),
        nn.Linear(128, 2),
        NormalParamExtractor()
    ),
    in_keys=["observation"],
    out_keys=["loc", "scale"]
)
critic = ValueOperator(
    nn.Sequential(
        nn.Linear(3, 128), nn.Tanh(),
        nn.Linear(128, 128), nn.Tanh(),
        nn.Linear(128, 128), nn.Tanh(),
        nn.Linear(128, 1),
    ),
    in_keys=["observation"],
)
actor = ProbabilisticActor(
    model,
    in_keys=["loc", "scale"],
    distribution_class=TanhNormal,
    distribution_kwargs={"low": -1.0, "high": 1.0},
    return_log_prob=True
)
buffer = TensorDictReplayBuffer(
    storage=LazyTensorStorage(1000),
    sampler=SamplerWithoutReplacement(),
    batch_size=50,
)
collector = SyncDataCollector(
    env,
    actor,
    frames_per_batch=1000,
    total_frames=1_000_000,
)
loss_fn = ClipPPOLoss(actor, critic)
adv_fn = GAE(value_network=critic, average_gae=True, gamma=0.99, lmbda=0.95)
optim = torch.optim.Adam(loss_fn.parameters(), lr=2e-4)

for data in collector:  # collect data
    for epoch in range(10):
        adv_fn(data)  # compute advantage
        buffer.extend(data)
        for sample in buffer:  # consume data
            loss_vals = loss_fn(sample)
            loss_val = sum(
                value for key, value in loss_vals.items() if
                key.startswith("loss")
            )
            loss_val.backward()
            optim.step()
            optim.zero_grad()
    print(f"avg reward: {data['next', 'reward'].mean().item(): 4.4f}")
```
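The PPO script calls `GAE(..., gamma=0.99, lmbda=0.95)` to compute advantages. As a rough illustration of what generalized advantage estimation computes, here is a minimal pure-Python sketch for a single trajectory. The function name `gae_advantages` and the single `dones` flag are assumptions made for this example; TorchRL's own estimator operates on TensorDicts and distinguishes termination from truncation, so this is not its implementation.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    next_values: List[float],
    dones: List[bool],
    gamma: float = 0.99,
    lmbda: float = 0.95,
) -> List[float]:
    """Generalized advantage estimation over one trajectory (reference sketch)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards so each step can reuse the next advantage.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping once the episode has ended.
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        running = delta + gamma * lmbda * not_done * running
        advantages[t] = running
    return advantages


# With gamma = lmbda = 1 and a zero value function, advantages are reward-to-go.
example = gae_advantages(
    [1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [False, False], gamma=1.0, lmbda=1.0
)
```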

The environment API relies on tensordict to carry data from one function to another during a rollout execution.

TensorDict makes it easy to re-use pieces of code across environments, models and algorithms.

For instance, here's how to code a rollout in TorchRL:

```diff
- obs, done = env.reset()
+ tensordict = env.reset()
  policy = SafeModule(
      model,
      in_keys=["observation_pixels", "observation_vector"],
      out_keys=["action"],
  )
  out = []
  for i in range(n_steps):
-     action, log_prob = policy(obs)
-     next_obs, reward, done, info = env.step(action)
-     out.append((obs, next_obs, action, log_prob, reward, done))
-     obs = next_obs
+     tensordict = policy(tensordict)
+     tensordict = env.step(tensordict)
+     out.append(tensordict)
+     tensordict = step_mdp(tensordict)  # renames next_observation_* keys to observation_*
- obs, next_obs, action, log_prob, reward, done = [torch.stack(vals, 0) for vals in zip(*out)]
+ out = torch.stack(out, 0)  # TensorDict supports multiple tensor operations
```

Using this, TorchRL abstracts away the input / output signatures of the modules, env, collectors, replay buffers and losses of the library, allowing all primitives to be easily recycled across settings.
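To make the idea of key-based signatures concrete, the sketch below uses a plain Python dict in place of a TensorDict. `KeyedModule` is a hypothetical stand-in written for this illustration only; it is not TorchRL's `SafeModule` or `TensorDictModule`, though it mirrors their `in_keys` / `out_keys` convention.

```python
from typing import Callable, Dict, List


class KeyedModule:
    """Hypothetical key-based wrapper (illustration only, not TorchRL code)."""

    def __init__(self, fn: Callable, in_keys: List[str], out_keys: List[str]):
        self.fn = fn
        self.in_keys = in_keys
        self.out_keys = out_keys

    def __call__(self, data: Dict[str, object]) -> Dict[str, object]:
        # Read inputs by key, call the wrapped function, write outputs by key.
        outputs = self.fn(*(data[key] for key in self.in_keys))
        if len(self.out_keys) == 1:
            outputs = (outputs,)
        data.update(zip(self.out_keys, outputs))
        return data


# Two toy components with different arities share one calling convention.
policy = KeyedModule(lambda obs: -0.5 * obs, ["observation"], ["action"])
value = KeyedModule(
    lambda obs, act: obs + act, ["observation", "action"], ["state_action_value"]
)

data = {"observation": 2.0}
for module in (policy, value):
    data = module(data)
```

Because every component takes and returns the same container, the loop that chains them does not need to know how many inputs or outputs each one has.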

Here's another example of an off-policy training loop in TorchRL (assuming that a data collector, a replay buffer, a loss and an optimizer have been instantiated):

```diff
- for i, (obs, next_obs, action, hidden_state, reward, done) in enumerate(collector):
+ for i, tensordict in enumerate(collector):
-     replay_buffer.add((obs, next_obs, action, log_prob, reward, done))
+     replay_buffer.add(tensordict)
      for j in range(num_optim_steps):
-         obs, next_obs, action, hidden_state, reward, done = replay_buffer.sample(batch_size)
-         loss = loss_fn(obs, next_obs, action, hidden_state, reward, done)
+         tensordict = replay_buffer.sample(batch_size)
+         loss = loss_fn(tensordict)
          loss.backward()
          optim.step()
          optim.zero_grad()
```

This training loop can be re-used across algorithms as it makes a minimal number of assumptions about the structure of the data.

TensorDict supports multiple tensor operations on its device and shape (the shape of TensorDict, or its batch size, is the common arbitrary N first dimensions of all its contained tensors):

```python
# stack and cat
tensordict = torch.stack(list_of_tensordicts, 0)
tensordict = torch.cat(list_of_tensordicts, 0)
# reshape
tensordict = tensordict.view(-1)
tensordict = tensordict.permute(0, 2, 1)
tensordict = tensordict.unsqueeze(-1)
tensordict = tensordict.squeeze(-1)
# indexing
tensordict = tensordict[:2]
tensordict[:, 2] = sub_tensordict
# device and memory location
tensordict.cuda()
tensordict.to("cuda:1")
tensordict.share_memory_()
```
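The notion of a batch size as the leading dimensions shared by every entry can be checked with a few lines of plain Python. The helper `check_batch_size` and the example shapes are assumptions made for this sketch; it only models the shape rule described above, not the TensorDict class itself.

```python
from typing import Dict, Sequence, Tuple


def check_batch_size(
    shapes: Dict[str, Tuple[int, ...]], batch_size: Sequence[int]
) -> bool:
    """Return True if every entry's shape starts with the given batch size."""
    batch = tuple(batch_size)
    return all(shape[: len(batch)] == batch for shape in shapes.values())


# Hypothetical entries: 4 environments, 20 steps, varying feature dimensions.
shapes = {
    "observation": (4, 20, 3),
    "action": (4, 20, 1),
    "pixels": (4, 20, 64, 64, 3),
}

valid = check_batch_size(shapes, [4, 20])       # shared leading dims
invalid = check_batch_size(shapes, [4, 20, 3])  # "action" ends in 1, not 3
```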

TensorDict comes with a dedicated tensordict.nn module that contains everything you might need to write your model with it. And it is functorch and torch.compile compatible!

```diff
  transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12)
+ td_module = SafeModule(transformer_model, in_keys=["src", "tgt"], out_keys=["out"])
  src = torch.rand((10, 32, 512))
  tgt = torch.rand((20, 32, 512))
+ tensordict = TensorDict({"src": src, "tgt": tgt}, batch_size=[20, 32])
- out = transformer_model(src, tgt)
+ td_module(tensordict)
+ out = tensordict["out"]
```

The TensorDictSequential class allows branching sequences of nn.Module instances in a highly modular way. For instance, here is an implementation of a transformer using the encoder and decoder blocks:

```python
encoder_module = TransformerEncoder(...)
# each block is wrapped in a TensorDictModule that declares its input and output keys
encoder = TensorDictModule(encoder_module, in_keys=["src", "src_mask"], out_keys=["memory"])
decoder_module = TransformerDecoder(...)
decoder = TensorDictModule(decoder_module, in_keys=["tgt", "memory"], out_keys=["output"])
transformer = TensorDictSequential(encoder, decoder)
assert transformer.in_keys == ["src", "src_mask", "tgt"]
assert transformer.out_keys == ["memory", "output"]
```

TensorDictSequential allows isolating subgraphs by querying a set of desired input / output keys:

```python
transformer.select_subsequence(out_keys=["memory"])  # returns the encoder
transformer.select_subsequence(in_keys=["tgt", "memory"])  # returns the decoder
```
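One way to picture subgraph selection is as a dependency walk over the declared keys. The sketch below is a simplified model under that assumption: `Block`, `select_by_out_keys` and `select_by_in_keys` are hypothetical names, and the real `select_subsequence` may apply different rules.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    name: str
    in_keys: List[str]
    out_keys: List[str]


def select_by_out_keys(blocks: List[Block], out_keys: List[str]) -> List[Block]:
    """Keep only the blocks needed to produce `out_keys`, walking backwards."""
    needed = set(out_keys)
    kept = []
    for block in reversed(blocks):
        if needed & set(block.out_keys):
            kept.append(block)
            # Inputs of a kept block become requirements for earlier blocks.
            needed |= set(block.in_keys)
    return list(reversed(kept))


def select_by_in_keys(blocks: List[Block], in_keys: List[str]) -> List[Block]:
    """Keep the blocks that can run given only `in_keys`, walking forwards."""
    available = set(in_keys)
    kept = []
    for block in blocks:
        if set(block.in_keys) <= available:
            kept.append(block)
            available |= set(block.out_keys)
    return kept


encoder = Block("encoder", ["src", "src_mask"], ["memory"])
decoder = Block("decoder", ["tgt", "memory"], ["output"])
blocks = [encoder, decoder]
```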

Check TensorDict tutorials to learn more!

Features

```python
env_make = lambda: GymEnv("Pendulum-v1", from_pixels=True)
env_parallel = ParallelEnv(4, env_make)  # creates 4 envs in parallel
tensordict = env_parallel.rollout(max_steps=20, policy=None)  # random rollout (no policy given)
assert tensordict.shape == [4, 20]  # 4 envs, 20 steps rollout
env_parallel.action_spec.is_in(tensordict["action"])  # spec check returns True
```

  • multiprocess and distributed data collectors(2) that work synchronously or asynchronously. Through the use of TensorDict, TorchRL's training loops are made very similar to regular training loops in supervised learning (although the "dataloader" -- read data collector -- is modified on-the-fly):
```python
env_make = lambda: GymEnv("Pendulum-v1", from_pixels=True)
collector = MultiaSyncDataCollector(
    [env_make, env_make],
    policy=policy,
    devices=["cuda:0", "cuda:0"],
    total_frames=10000,
    frames_per_batch=50,
    ...
)
for i, tensordict_data in enumerate(collector):
    loss = loss_module(tensordict_data)
    loss.backward()
    optim.step()
    optim.zero_grad()
    collector.update_policy_weights_()
```

Check our distributed collector examples to learn more about ultra-fast data collection with TorchRL.
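The "dataloader" analogy can be sketched with a plain generator that yields fixed-size batches until a frame budget is exhausted. Everything here is a simplification written for illustration: `collect` and `step_fn` are hypothetical names, the sketch is single-process, and it assumes `total_frames` is a multiple of `frames_per_batch`.

```python
from typing import Callable, Dict, Iterator, List


def collect(
    step_fn: Callable[[int], Dict[str, float]],
    frames_per_batch: int,
    total_frames: int,
) -> Iterator[List[Dict[str, float]]]:
    """Yield batches of `frames_per_batch` frames until `total_frames` is reached."""
    collected = 0
    while collected < total_frames:
        # In a real collector, the policy and environment would be stepped here.
        batch = [step_fn(collected + i) for i in range(frames_per_batch)]
        collected += frames_per_batch
        yield batch


# Consume the collector exactly like a dataloader in supervised learning.
batches = list(
    collect(lambda t: {"reward": float(t)}, frames_per_batch=50, total_frames=10000)
)
```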

  • efficient(2) and generic(1) replay buffers with modularized storage:
```python
storage = LazyMemmapStorage(  # memory-mapped (physical) storage
    cfg.buffer_size,
    scratch_dir="/tmp/"
)
buffer = TensorDictPrioritizedReplayBuffer(
    alpha=0.7,
    beta=0.5,
    collate_fn=lambda x: x,
    pin_memory=device != torch.device("cpu"),
    prefetch=10,  # multi-threaded sampling
    storage=storage
)
```
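For readers unfamiliar with the `alpha` and `beta` arguments, the sketch below shows how they are commonly used in proportional prioritized experience replay: `alpha` sharpens the sampling distribution and `beta` controls the importance-sampling correction. The function name is hypothetical and the formulas follow the standard textbook scheme; the exact computation inside TorchRL's buffer is not confirmed here and may differ.

```python
from typing import List, Tuple


def prioritized_probs_and_weights(
    priorities: List[float], alpha: float, beta: float
) -> Tuple[List[float], List[float]]:
    """Sampling probabilities and normalized importance weights (reference sketch)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    n = len(priorities)
    # Importance weights undo the sampling bias; normalize by the largest weight.
    weights = [(n * p) ** (-beta) for p in probs]
    max_weight = max(weights)
    weights = [w / max_weight for w in weights]
    return probs, weights


probs, weights = prioritized_probs_and_weights([1.0, 3.0], alpha=1.0, beta=1.0)
uniform_probs, _ = prioritized_probs_and_weights([1.0, 3.0], alpha=0.0, beta=1.0)
```

With `alpha=0` the priorities are ignored and sampling becomes uniform, which is a quick sanity check for any implementation of this scheme.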

Replay buffers are also offered as wrappers around common datasets for offline RL:

```python
from torchrl.data.replay_buffers import SamplerWithoutReplacement
from torchrl.data.datasets.d4rl import D4RLExperienceReplay
data = D4RLExperienceReplay(
    "maze2d-open-v0",
    split_trajs=True,
    batch_size=128,
    sampler=SamplerWithoutReplacement(drop_last=True),
)
for sample in data:  # or alternatively sample = data.sample()
    fun(sample)
```
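Sampling without replacement with `drop_last=True` means each item is visited at most once per pass and a trailing incomplete batch is discarded. The generator below is a minimal sketch of that behavior over integer indices; `batches_without_replacement` and its `seed` argument are assumptions for this example, not part of TorchRL.

```python
import random
from typing import Iterator, List


def batches_without_replacement(
    num_items: int, batch_size: int, drop_last: bool = True, seed: int = 0
) -> Iterator[List[int]]:
    """Yield shuffled, non-overlapping index batches for one pass over the data."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    for start in range(0, num_items, batch_size):
        batch = indices[start : start + batch_size]
        if drop_last and len(batch) < batch_size:
            return  # discard the incomplete trailing batch
        yield batch


full_batches = list(batches_without_replacement(10, 4, drop_last=True))
all_batches = list(batches_without_replacement(10, 4, drop_last=False))
```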

  • cross-library environment transforms(1), executed on device and in a vectorized fashion(2), which process and prepare the data coming out of the environments to be used by the agent:
```python
env_make = lambda: GymEnv("Pendulum-v1", from_pixels=True)
env_base = ParallelEnv(4, env_make, device="cuda:0")  # creates 4 envs in parallel
env = TransformedEnv(
    env_base,
    Compose(
        ToTensorImage(),
        ObservationNorm(loc=0.5, scale=1.0)),  # executes the transforms once and on device
)
tensordict = env.reset()
assert tensordict.device == torch.device("cuda:0")
```

Other transforms include: reward scaling (RewardScaling), shape operations (concatenation of tensors, unsqueezing etc.), concatenation of successive frames (CatFrames), resizing (Resize) and many more.

Unlike other libraries, the transforms are stacked as a list (and not wrapped in each other), which makes it easy to add and remove them at will:

```python
env.insert_transform(0, NoopResetEnv())  # inserts the NoopResetEnv transform at the index 0
```

Nevertheless, transforms can access and execute operations on the parent environment:

```python
transform = env.transform[1]  # gathers the second transform of the list
parent_env = transform.parent  # returns the base environment of the second transform, i.e. the base env + the first transform
```
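The benefit of a flat list over nested wrappers can be seen in a few lines of plain Python. `TransformStack`, `scale` and `shift` are hypothetical names used only for this sketch; they stand in for the idea behind the transform list and are not TorchRL classes.

```python
from typing import Callable, List


class TransformStack:
    """Hypothetical flat transform pipeline (illustration only)."""

    def __init__(self, *transforms: Callable[[float], float]):
        self.transforms: List[Callable[[float], float]] = list(transforms)

    def insert(self, index: int, transform: Callable[[float], float]) -> None:
        # Inserting is a list operation; no other transform has to be re-wrapped.
        self.transforms.insert(index, transform)

    def __getitem__(self, index: int) -> Callable[[float], float]:
        return self.transforms[index]

    def __call__(self, value: float) -> float:
        for transform in self.transforms:
            value = transform(value)
        return value


def scale(x: float) -> float:
    return 2.0 * x


def shift(x: float) -> float:
    return x + 1.0


stack = TransformStack(scale, shift)
before = stack(3.0)    # scale, then shift
stack.insert(0, abs)   # prepend a transform without touching the others
after = stack(-3.0)    # abs, then scale, then shift
```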

```python
# create an nn.Module
common_module = ConvNet(
    bias_last_layer=True,
    depth=None,
    num_cells=[32, 64, 64],
    kernel_sizes=[8, 4, 3],
    strides=[4, 2, 1],
)
# Wrap it in a SafeModule, indicating what key to read in and where to
# write out the output
common_module = SafeModule(
    common_module,
    in_keys=["pixels"],
    out_keys=["hidden"],
)
# Wrap the policy module in NormalParamsWrapper, such that the output
# tensor is split in loc and scale, and scale is mapped onto a positive space
policy_module = SafeModule(
    NormalParamsWrapper(
        MLP(num_cells=[64, 64], out_features=32, activation=nn.ELU)
    ),
    in_keys=["hidden"],
    out_keys=["loc", "scale"],
)
# Use a SafeProbabilisticTensorDictSequential to combine the SafeModule with a
# SafeProbabilisticModule, indicating how to build the
# torch.distribution.Distribution object and what to do with it
policy_module = SafeProbabilisticTensorDictSequential(  # stochastic policy
    policy_module,
    SafeProbabilisticModule(
        in_keys=["loc", "scale"],
        out_keys="action",
        distribution_class=TanhNormal,
    ),
)
value_module = MLP(
    num_cells=[64, 64],
    out_features=1,
    activation=nn.ELU,
)
# Wrap the policy and value function in a common module
actor_value = ActorValueOperator(common_module, policy_module, value_module)
# standalone policy from this
standalone_policy = actor_value.get_policy_operator()
```

  • exploration wrappers and modules to easily swap between exploration and exploitation(1):
```python
policy_explore = EGreedyWrapper(policy)
with set_exploration_type(ExplorationType.RANDOM):
    tensordict = policy_explore(tensordict)  # will use eps-greedy
with set_exploration_type(ExplorationType.DETERMINISTIC):
    tensordict = policy_explore(tensordict)  # will not use eps-greedy
```
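The switch between exploration and exploitation amounts to a flag that decides whether the epsilon-greedy branch is active. The sketch below shows that logic for a discrete action space in plain Python; `select_action` and its `explore` flag are assumptions standing in for the exploration-type context manager, not TorchRL code.

```python
import random
from typing import List


def select_action(
    q_values: List[float], eps: float, explore: bool, rng: random.Random
) -> int:
    """Epsilon-greedy selection that falls back to greedy when not exploring."""
    if explore and rng.random() < eps:
        return rng.randrange(len(q_values))  # uniform random action
    # Greedy action: index of the largest action value.
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]

greedy_action = select_action(q_values, eps=0.5, explore=False, rng=rng)
random_actions = [
    select_action(q_values, eps=1.0, explore=True, rng=rng) for _ in range(100)
]
```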

```python
### Loss modules
from torchrl.objectives import DQNLoss
loss_module = DQNLoss(value_network=value_network, gamma=0.99)
tensordict = replay_buffer.sample(batch_size)
loss = loss_module(tensordict)
```

```python
### Advantage computation
from torchrl.objectives.value.functional import vec_td_lambda_return_estimate
advantage = vec_td_lambda_return_estimate(gamma, lmbda, next_state_value, reward, done, terminated)
```
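As a reference point for what a TD(λ) return estimate computes, here is a non-vectorized pure-Python sketch for one trajectory. The function `td_lambda_returns` is hypothetical and uses a single `terminated` flag; TorchRL's vectorized functional also takes a separate `done` signal and operates on batched tensors, so treat this as an approximation of the recursion rather than the library's implementation.

```python
from typing import List


def td_lambda_returns(
    rewards: List[float],
    next_values: List[float],
    terminated: List[bool],
    gamma: float,
    lmbda: float,
) -> List[float]:
    """TD(lambda) returns computed backwards over one trajectory (sketch)."""
    steps = len(rewards)
    returns = [0.0] * steps
    # Beyond the last step, bootstrap from the last next-state value.
    next_return = next_values[-1]
    for t in reversed(range(steps)):
        if terminated[t]:
            returns[t] = rewards[t]  # no bootstrapping after termination
        else:
            blended = (1.0 - lmbda) * next_values[t] + lmbda * next_return
            returns[t] = rewards[t] + gamma * blended
        next_return = returns[t]
    return returns


no_term = [False, False, False]
# lmbda=1 gives discounted Monte Carlo returns (with a final bootstrap).
mc_like = td_lambda_returns([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], no_term, 0.5, 1.0)
# lmbda=0 gives one-step TD targets.
td0 = td_lambda_returns([1.0, 1.0, 1.0], [2.0, 4.0, 6.0], no_term, 0.5, 0.0)
```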

  • a generic trainer class(1) that executes the aforementioned training loop. Through a hooking mechanism, it also supports any logging or data transformation operation at any given time.

  • various recipes to build models that correspond to the environment being deployed.

  • LLM API: Complete framework for language model fine-tuning with unified wrappers for Hugging Face and vLLM backends, conversation management with automatic chat template detection, tool integration (Python execution, function calling), specialized objectives (GRPO, SFT), and high-performance async collectors. Perfect for RLHF, supervised fine-tuning, and tool-augmented training scenarios.

```python
from tensordict import TensorDict
from torchrl.envs.llm import ChatEnv
from torchrl.modules.llm import TransformersWrapper
from torchrl.envs.llm.transforms import PythonInterpreter

# `tokenizer` and `model` are assumed to be defined beforehand

# Create environment with tool execution
env = ChatEnv(
    tokenizer=tokenizer,
    system_prompt="You can execute Python code.",
    batch_size=[1]
).append_transform(PythonInterpreter())

# Wrap language model for training
llm = TransformersWrapper(
    model=model,
    tokenizer=tokenizer,
    input_mode="history"
)

# Multi-turn conversation with tool use
obs = env.reset(TensorDict({"query": "Calculate 2+2"}, batch_size=[1]))
llm_output = llm(obs)  # Generates response
obs = env.step(llm_output)  # Environment processes response
```

If you feel a feature is missing from the library, please submit an issue! If you would like to contribute to new features, check our call for contributions and our contribution page.

Examples, tutorials and demos

A series of state-of-the-art implementations is provided for illustrative purposes:

| Algorithm | Compile Support** | Tensordict-free API | Modular Losses | Continuous and Discrete |
|---|---|---|---|---|
| DQN | 1.9x | + | NA | + (through ActionDiscretizer transform) |
| DDPG | 1.87x | + | + | - (continuous only) |
| IQL | 3.22x | + | + | + |
| CQL | 2.68x | + | + | + |
| TD3 | 2.27x | + | + | - (continuous only) |
| TD3+BC | untested | + | + | - (continuous only) |
| A2C | 2.67x | + | - | + |
| PPO | 2.42x | + | - | + |
| SAC | 2.62x | + | - | + |
| REDQ | 2.28x | + | - | - (continuous only) |
| Dreamer v1 | untested | + | + (different classes) | - (continuous only) |
| Decision Transformers | untested | + | NA | - (continuous only) |
| CrossQ | untested | + | + | - (continuous only) |
| Gail | untested | + | NA | + |
| Impala | untested | + | - | + |
| IQL (MARL) | untested | + | + | + |
| DDPG (MARL) | untested | + | + | - (continuous only) |
| PPO (MARL) | untested | + | - | + |
| QMIX-VDN (MARL) | untested | + | NA | + |
| SAC (MARL) | untested | + | - | + |
| RLHF | NA | + | NA | NA |
| LLM API (GRPO) | NA | + | + | NA |

** The number indicates expected speed-up compared to eager mode when executed on CPU. Numbers may vary depending on architecture and device.

and many more to come!

Code examples displaying toy code snippets and training scripts are also available:

  • LLM API & GRPO - Complete language model fine-tuning pipeline
  • RLHF
  • Memory-mapped replay buffers

Check the examples directory for more details about handling the various configuration settings.

We also provide tutorials and demos that give a sense of what the library can do.

Citation

If you're using TorchRL, please refer to this BibTeX entry to cite this work:

```
@misc{bou2023torchrl,
      title={TorchRL: A data-driven decision-making library for PyTorch},
      author={Albert Bou and Matteo Bettini and Sebastian Dittert and Vikash Kumar and Shagun Sodhani and Xiaomeng Yang and Gianni De Fabritiis and Vincent Moens},
      year={2023},
      eprint={2306.00577},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```

Installation

Create a new virtual environment:

```bash
python -m venv torchrl
source torchrl/bin/activate  # On Windows use: torchrl\Scripts\activate
```

Or create a conda environment where the packages will be installed.

```bash
conda create --name torchrl python=3.9
conda activate torchrl
```

Install dependencies:

PyTorch

Depending on how you intend to use torchrl, you may want to install the latest (nightly) PyTorch release or the latest stable version of PyTorch. See here for a detailed list of commands, including pip3 or other special installation instructions.

TorchRL offers a few pre-defined dependencies such as "torchrl[tests]", "torchrl[atari]" etc.

Torchrl

You can install the latest stable release by using

```bash
pip3 install torchrl
```

This should work on Linux (including AArch64 machines), Windows 10 and macOS (Metal chips only). On certain Windows machines (Windows 11), one should build the library locally. This can be done in two ways:

```bash
# Install and build locally v0.8.1 of the library without cloning
pip3 install git+https://github.com/pytorch/rl@v0.8.1
# Clone the library and build it locally
git clone https://github.com/pytorch/tensordict
git clone https://github.com/pytorch/rl
pip install -e tensordict
pip install -e rl
```

Note that a local build of tensordict requires cmake to be installed via Homebrew (macOS) or another package manager such as apt, apt-get, conda or yum but NOT pip, as well as `pip install "pybind11[global]"`.

One can also build the wheels to distribute to co-workers using

```bash
pip install build
python -m build --wheel
```

Your wheels will be stored in `./dist/torchrl<name>.whl` and can be installed via

```bash
pip install torchrl<name>.whl
```

The nightly build can be installed via

```bash
pip3 install tensordict-nightly torchrl-nightly
```

which we currently only ship for Linux machines. Importantly, the nightly builds require the nightly builds of PyTorch too. Also, a local build of torchrl with the nightly build of tensordict may fail: install both nightlies or both local builds, but do not mix them.

Disclaimer: As of today, TorchRL is roughly compatible with any pytorch version >= 2.1 and installing it will not directly require a newer version of pytorch to be installed. Indirectly though, tensordict still requires the latest PyTorch to be installed and we are working hard to loosen that requirement. The C++ binaries of TorchRL (mainly for prioritized replay buffers) will only work with PyTorch 2.7.0 and above. Some features (e.g., working with nested jagged tensors) may also be limited with older versions of pytorch. It is recommended to use the latest TorchRL with the latest PyTorch version unless there is a strong reason not to do so.
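If you want to check these version floors programmatically, a small helper that compares release numbers is enough. The sketch below uses only the standard library; `release_tuple` and `meets_minimum` are hypothetical helpers, and they deliberately ignore local and pre-release suffixes (for example `+cu121` or `.dev...`), which is a simplification.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '2.7.0+cu121' -> (2, 7, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version: str, minimum: Tuple[int, ...]) -> bool:
    """Compare release numbers numerically, padding the shorter one with zeros."""
    release = release_tuple(version)
    width = max(len(release), len(minimum))
    padded_release = release + (0,) * (width - len(release))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_release >= padded_minimum


# Floors taken from the disclaimer above: 2.1 in general, 2.7.0 for the C++ binaries.
general_ok = meets_minimum("2.6.1", (2, 1))
binaries_ok = meets_minimum("2.6.1", (2, 7, 0))
```

With PyTorch installed, the same check could be applied to `torch.__version__`.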

Optional dependencies

The following libraries can be installed depending on the usage one wants to make of torchrl:

```
# diverse
pip3 install tqdm tensorboard "hydra-core>=1.1" hydra-submitit-launcher

# rendering
pip3 install "moviepy<2.0.0"

# deepmind control suite
pip3 install dm_control

# gym, atari games
pip3 install "gym[atari]" "gym[accept-rom-license]" pygame

# tests
pip3 install pytest pyyaml pytest-instafail

# tensorboard
pip3 install tensorboard

# wandb
pip3 install wandb
```

Versioning issues can cause error messages such as `undefined symbol`. For these, refer to the versioning issues document for a complete explanation and proposed workarounds.

Asking a question

If you spot a bug in the library, please raise an issue in this repo.

If you have a more generic question regarding RL in PyTorch, post it on the PyTorch forum.

Contributing

Internal collaborations to torchrl are welcome! Feel free to fork, submit issues and PRs. You can check out the detailed contribution guide here. As mentioned above, a list of open contributions can be found here.

Contributors are encouraged to install pre-commit hooks (using `pre-commit install`). pre-commit will check for linting-related issues when the code is committed locally. You can disable the check by appending `-n` to your commit command: `git commit -m <commit message> -n`

Disclaimer

This library is released as a PyTorch beta feature. BC-breaking changes are likely to happen, but they will be introduced with a deprecation warning and only after a few release cycles.

License

TorchRL is licensed under the MIT License. See LICENSE for details.

Owner

  • Name: pytorch
  • Login: pytorch
  • Kind: organization
  • Location: where the eigens are valued

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 1,958
  • Total Committers: 173
  • Avg Commits per committer: 11.318
  • Development Distribution Score (DDS): 0.308
Past Year
  • Commits: 541
  • Committers: 37
  • Avg Commits per committer: 14.622
  • Development Distribution Score (DDS): 0.2
Top Committers
Name Email Commits
Vincent Moens v****s@m****m 1,354
Matteo Bettini 5****i 116
Albert Bou a****2 63
kurtamohler k****r@g****m 35
nicolas-dufour 3****r 28
Sebastian Dittert S****t@g****e 27
Omkar Salpekar o****r@f****m 24
Tom Begley t****y@g****m 17
Martin Marenz m****z@m****m 17
Faury Louis l****y@h****r 11
Waris Radji w****4@g****m 8
Skander Moalla 3****a 8
Antoine Broyelle a****e@h****i 6
Shagun Sodhani 1****i 6
Bo Liu b****s@g****m 5
Xiaomeng Yang b****m@g****m 5
Danylo Baibak b****k@m****m 5
D.L 5****s 4
Sergey Ordinskiy 1****y 4
Rob Anderson r****x@g****m 4
Honglong Tian 5****T 4
Romain Julien r****n@f****m 4
Alessandro Pietro Bardelli a****d 4
Brian Vaughan n****v 3
Almaz Zinollayev 3****e 3
Federico Berto b****2@g****m 3
Sriram Krishna s****9@g****m 3
Yohann Benchetrit y****t@g****m 3
Thomas B. Brunner t****r@g****m 3
Beh Chuen Yang d****r@g****m 3
and 143 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 433
  • Total pull requests: 1,997
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 17 days
  • Total issue authors: 184
  • Total pull request authors: 107
  • Average comments per issue: 2.59
  • Average comments per pull request: 2.16
  • Merged pull requests: 1,488
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 124
  • Pull requests: 935
  • Average time to close issues: 5 days
  • Average time to close pull requests: 5 days
  • Issue authors: 68
  • Pull request authors: 44
  • Average comments per issue: 1.24
  • Average comments per pull request: 1.87
  • Merged pull requests: 722
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • vmoens (44)
  • matteobettini (37)
  • skandermoalla (14)
  • albertbou92 (12)
  • smorad (11)
  • wertyuilife2 (10)
  • kurtamohler (10)
  • AlexandreBrown (9)
  • maxweissenbacher (8)
  • thomasbbrunner (8)
  • N00bcak (8)
  • valterschutz (8)
  • whatdhack (6)
  • wbinventor (5)
  • svnv-svsv-jm (5)
Pull Request Authors
  • vmoens (1,441)
  • matteobettini (103)
  • albertbou92 (84)
  • kurtamohler (68)
  • BY571 (30)
  • louisfaury (24)
  • osalpekar (17)
  • antoinebrl (11)
  • skandermoalla (8)
  • robandpdx (7)
  • Blonck (6)
  • DanilBaibak (6)
  • marcosgalleterobbva (5)
  • sriramsk1999 (5)
  • thomasbbrunner (5)
Top Labels
Issue Labels
bug (273) enhancement (127) CLA Signed (10) Good first issue (7) new algo (4) performance (3) CI (1) Suitable for minor (1) Tests (1) Data (1) documentation (1)
Pull Request Labels
CLA Signed (1,773) bug (412) enhancement (355) documentation (143) CI (129) Environments (125) Refactoring (76) Data (70) Suitable for minor (52) Tests (49) quality (45) performance (39) new algo (30) versioning (24) ciflow/docs (21) bc breaking (21) BE (19) Deprecation (16) Benchmarks (11) formatting (9) Examples (8) ciflow/binaries/all (8) Objectives (7) setup (6) llm/api (4) tutorials (4) Collectors (3) Environments/Isaac (3) Release (3) distributions (3)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 430,827 last-month
  • Total docker downloads: 745
  • Total dependent packages: 6
    (may contain duplicates)
  • Total dependent repositories: 16
    (may contain duplicates)
  • Total versions: 993
  • Total maintainers: 1
pypi.org: torchrl
  • Versions: 44
  • Dependent Packages: 6
  • Dependent Repositories: 16
  • Downloads: 175,281 Last month
  • Docker Downloads: 697
Rankings
Stargazers count: 1.8%
Dependent packages count: 2.4%
Docker downloads count: 2.6%
Downloads: 2.7%
Average: 2.8%
Dependent repos count: 3.6%
Forks count: 3.7%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/pytorch/rl
  • Versions: 20
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 7.0%
Last synced: 6 months ago
pypi.org: torchrl-nightly

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning

  • Versions: 929
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 255,546 Last month
  • Docker Downloads: 48
Rankings
Stargazers count: 1.9%
Downloads: 2.4%
Forks count: 4.1%
Dependent packages count: 6.6%
Average: 9.1%
Dependent repos count: 30.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/benchmarks.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • benchmark-action/github-action-benchmark v1 composite
.github/workflows/benchmarks_pr.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • apbard/pytest-benchmark-commenter v3 composite
.github/workflows/docs.yml actions
  • JamesIves/github-pages-deploy-action releases/v4 composite
  • actions/checkout v3 composite
  • actions/upload-artifact v2 composite
.github/workflows/lint.yml actions
.github/workflows/nightly_build.yml actions
  • actions/checkout v2 composite
  • actions/download-artifact v2 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v2 composite
.github/workflows/test-linux-brax.yml actions
.github/workflows/test-linux-cpu.yml actions
.github/workflows/test-linux-d4rl.yml actions
.github/workflows/test-linux-envpool.yml actions
.github/workflows/test-linux-examples.yml actions
.github/workflows/test-linux-gpu.yml actions
.github/workflows/test-linux-gym.yml actions
.github/workflows/test-linux-habitat.yml actions
.github/workflows/test-linux-jumanji.yml actions
.github/workflows/test-linux-olddeps.yml actions
.github/workflows/test-linux-optdeps.yml actions
.github/workflows/test-linux-pettingzoo.yml actions
.github/workflows/test-linux-rlhf.yml actions
.github/workflows/test-linux-robohive.yml actions
.github/workflows/test-linux-sklearn.yml actions
.github/workflows/test-linux-smacv2.yml actions
.github/workflows/test-linux-stable-gpu.yml actions
.github/workflows/test-linux-vmas.yml actions
.github/workflows/test-macos-cpu.yml actions
.github/workflows/test-windows-optdepts-cpu.yml actions
.github/workflows/test-windows-optdepts-gpu.yml actions
.github/workflows/wheels.yml actions
  • actions/checkout v2 composite
  • actions/download-artifact v2 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v2 composite
benchmarks/requirements.txt pypi
  • pytest-benchmark *
  • tenacity *
docs/requirements.txt pypi
  • Jinja2 ==3.1.2
  • ale-py *
  • atari-py *
  • dm_control *
  • docutils *
  • gym *
  • imageio *
  • ipython *
  • matplotlib *
  • memory_profiler *
  • myst-parser *
  • numpy *
  • pygame *
  • pyrender *
  • pytest *
  • sphinx ===5.0.0
  • sphinx-autodoc-typehints *
  • sphinx-copybutton *
  • sphinx-gallery *
  • sphinx-serve ==1.0.1
  • sphinxcontrib-htmlhelp *
  • torchvision *
  • tqdm *
  • vmas ==1.2.11
pyproject.toml pypi
setup.py pypi
  • cloudpickle *
  • numpy *
  • packaging *
  • pytorch_package_dep
  • tensordict_dep
.github/workflows/build-wheels-m1.yml actions
.github/workflows/test-linux-minari.yml actions
.github/unittest/linux/scripts/environment.yml pypi
  • av *
  • cloudpickle *
  • coverage *
  • dm_control *
  • expecttest *
  • future *
  • hydra-core *
  • hypothesis *
  • imageio ==2.26.0
  • mlflow *
  • moviepy *
  • ninja *
  • pygame *
  • pytest *
  • pytest-cov *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pytest-timeout *
  • pyyaml *
  • ray <2.8.0
  • scipy *
  • tensorboard *
  • tqdm *
  • transformers *
  • wandb *
.github/unittest/linux_distributed/scripts/environment.yml pypi
  • av *
  • cloudpickle *
  • coverage *
  • dm_control *
  • expecttest *
  • future *
  • hydra-core *
  • hypothesis *
  • imageio ==2.26.0
  • mlflow *
  • moviepy *
  • pygame *
  • pytest *
  • pytest-cov *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • ray <2.8.0
  • scipy *
  • tensorboard *
  • tqdm *
  • virtualenv *
  • wandb *
.github/unittest/linux_examples/scripts/environment.yml pypi
  • av *
  • cloudpickle *
  • coverage *
  • dm_control *
  • expecttest *
  • future *
  • gym *
  • hydra-core *
  • hypothesis *
  • imageio ==2.26.0
  • mlflow *
  • moviepy *
  • pygame *
  • pytest *
  • pytest-cov *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scipy *
  • tqdm *
  • transformers *
  • vmas *
.github/unittest/linux_libs/scripts_brax/environment.yml pypi
  • brax *
  • cloudpickle *
  • expecttest *
  • future *
  • hydra-core *
  • hypothesis *
  • pytest *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scipy *
.github/unittest/linux_libs/scripts_d4rl/environment.yml pypi
  • cloudpickle *
  • cython <3
  • expecttest *
  • future *
  • hydra-core *
  • hypothesis *
  • pytest *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scipy *
.github/unittest/linux_libs/scripts_envpool/environment.yml pypi
  • cloudpickle *
  • coverage *
  • dm_control *
  • expecttest *
  • future *
  • hypothesis *
  • moviepy *
  • pygame *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scipy *
.github/unittest/linux_libs/scripts_gym/environment.yml pypi
  • cloudpickle *
  • expecttest *
  • future *
  • gym ==0.13
  • hydra-core *
  • hypothesis *
  • moviepy *
  • patchelf *
  • pygame *
  • pyopengl ==3.1.0
  • pytest *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scipy *
  • tqdm *
.github/unittest/linux_libs/scripts_habitat/environment.yml pypi
  • cloudpickle *
  • expecttest *
  • future *
  • hydra-core *
  • hypothesis *
  • ninja *
  • pytest *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scipy ==1.9.1
.github/unittest/linux_libs/scripts_jumanji/environment.yml pypi
  • cloudpickle *
  • expecttest *
  • future *
  • hydra-core *
  • hypothesis *
  • jumanji *
  • pytest *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scipy *
.github/unittest/linux_libs/scripts_minari/environment.yml pypi
  • cloudpickle *
  • expecttest *
  • future *
  • hydra-core *
  • hypothesis *
  • minari *
  • pytest *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scipy *
.github/unittest/linux_libs/scripts_pettingzoo/environment.yml pypi
  • autorom *
  • cloudpickle *
  • expecttest *
  • gym *
  • gym-notices *
  • importlib-metadata *
  • pettingzoo ==1.24.1
  • pytest *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • six *
  • zipp *
.github/unittest/linux_libs/scripts_rlhf/environment.yml pypi
  • cloudpickle *
  • datasets *
  • expecttest *
  • future *
  • hydra-core *
  • hypothesis *
  • pytest *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scipy *
  • transformers *
.github/unittest/linux_libs/scripts_robohive/environment.yml pypi
  • cloudpickle *
  • dm_control ==1.0.11
  • expecttest *
  • future *
  • gym ==0.13
  • hydra-core *
  • hypothesis *
  • moviepy *
  • mujoco ==2.3.3
  • patchelf *
  • pygame *
  • pytest *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scipy *
  • tqdm *
.github/unittest/linux_libs/scripts_sklearn/environment.yml pypi
  • cloudpickle *
  • expecttest *
  • future *
  • hydra-core *
  • hypothesis *
  • pandas *
  • pytest *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scikit-learn *
  • scipy *
.github/unittest/linux_libs/scripts_smacv2/environment.yml pypi
  • cloudpickle *
  • expecttest *
  • gym *
  • gym-notices *
  • importlib-metadata *
  • numpy ==1.23.0
  • pytest *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • zipp *
.github/unittest/linux_libs/scripts_vmas/environment.yml pypi
  • cloudpickle *
  • expecttest *
  • gym *
  • gym-notices *
  • importlib-metadata *
  • numpy *
  • pyglet ==1.5.27
  • pytest *
  • pytest-cov *
  • pytest-error-for-skips *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scipy *
  • six *
  • torch *
  • vmas *
  • zipp *
.github/unittest/linux_olddeps/scripts_gym_0_13/environment.yml pypi
  • cloudpickle *
  • expecttest *
  • future *
  • gym ==0.13
  • hydra-core *
  • hypothesis *
  • moviepy *
  • patchelf *
  • pygame *
  • pyopengl ==3.1.4
  • pytest *
  • pytest-cov *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • ray <2.8.0
  • scipy *
  • tqdm *
.github/unittest/linux_optdeps/scripts/environment.yml pypi
  • cloudpickle *
  • coverage *
  • expecttest *
  • future *
  • hypothesis *
  • pytest *
  • pytest-cov *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • ray <2.8.0
  • scipy *
.github/unittest/windows_optdepts/scripts/environment.yml pypi
  • cloudpickle *
  • coverage *
  • expecttest *
  • future *
  • hypothesis *
  • pytest *
  • pytest-cov *
  • pytest-instafail *
  • pytest-mock *
  • pytest-rerunfailures *
  • pyyaml *
  • scipy *
examples/rlhf/requirements.txt pypi
  • PyYAML *
  • datasets *
  • hydra-core *
  • matplotlib *
  • numpy *
  • requests *
  • tiktoken *
  • tqdm *
  • transformers *