https://github.com/instadeepai/jumanji
A diverse suite of scalable reinforcement learning environments in JAX
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- CITATION.cff file
- codemeta.json file: Found codemeta.json file
- .zenodo.json file: Found .zenodo.json file
- DOI references
- Academic publication links: Links to arxiv.org
- Committers with academic emails
- Institutional organization owner
- JOSS paper metadata
- Scientific vocabulary similarity: Low similarity (7.4%) to scientific vocabulary
Repository
A diverse suite of scalable reinforcement learning environments in JAX
Basic Info
- Host: GitHub
- Owner: instadeepai
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://instadeepai.github.io/jumanji
- Size: 69.5 MB
Statistics
- Stars: 731
- Watchers: 13
- Forks: 92
- Open Issues: 27
- Releases: 14
Metadata Files
README.md
Environments | Installation | Quickstart | Training | Citation | Docs
Jumanji @ ICLR 2024
Jumanji has been accepted at ICLR 2024. Check out our research paper.
Welcome to the Jungle!
Jumanji is a diverse suite of scalable reinforcement learning environments written in JAX. The environment table below lists 24 environments!
Jumanji is helping pioneer a new wave of hardware-accelerated research and development in the field of RL. Jumanji's high-speed environments enable faster iteration and large-scale experimentation while simultaneously reducing complexity. Originating in the research team at InstaDeep, Jumanji is now developed jointly with the open-source community. To join us in these efforts, reach out, raise issues, and read our contribution guidelines, or just star the repository to stay up to date with the latest developments!
Goals
- Provide a simple, well-tested API for JAX-based environments.
- Make research in RL more accessible.
- Facilitate research on RL for industrial problems and help close the gap between research and industrial applications.
- Provide environments whose difficulty can be scaled to be arbitrarily hard.
Overview
- Environment API: core abstractions for JAX-based environments.
- Environment Suite: a collection of RL environments ranging from simple games to NP-hard combinatorial problems.
- Wrappers: easily connect to your favourite RL frameworks and libraries such as Acme, Stable Baselines3, RLlib, Gymnasium and DeepMind-Env through our `dm_env` and `gym` wrappers.
- Examples: guides to facilitate Jumanji's adoption and highlight the added value of JAX-based environments.
- Training: example agents that can be used as inspiration for the agents one may implement in their research.
Environments
Jumanji provides a diverse range of environments ranging from simple games to NP-hard combinatorial problems.
| Environment | Category | Registered Version(s) | Source | Description |
|---|---|---|---|---|
| Game2048 | Logic | Game2048-v1 | code | doc |
| GraphColoring | Logic | GraphColoring-v1 | code | doc |
| Minesweeper | Logic | Minesweeper-v0 | code | doc |
| RubiksCube | Logic | RubiksCube-v0, RubiksCube-partly-scrambled-v0 | code | doc |
| SlidingTilePuzzle | Logic | SlidingTilePuzzle-v0 | code | doc |
| Sudoku | Logic | Sudoku-v0, Sudoku-very-easy-v0 | code | doc |
| BinPack (3D BinPacking Problem) | Packing | BinPack-v1 | code | doc |
| FlatPack (2D Grid Filling Problem) | Packing | FlatPack-v0 | code | doc |
| JobShop (Job Shop Scheduling Problem) | Packing | JobShop-v0 | code | doc |
| Knapsack | Packing | Knapsack-v1 | code | doc |
| Tetris | Packing | Tetris-v0 | code | doc |
| Cleaner | Routing | Cleaner-v0 | code | doc |
| Connector | Routing | Connector-v2 | code | doc |
| CVRP (Capacitated Vehicle Routing Problem) | Routing | CVRP-v1 | code | doc |
| MultiCVRP (Multi-Agent Capacitated Vehicle Routing Problem) | Routing | MultiCVRP-v0 | code | doc |
| Maze | Routing | Maze-v0 | code | doc |
| RobotWarehouse | Routing | RobotWarehouse-v0 | code | doc |
| Snake | Routing | Snake-v1 | code | doc |
| TSP (Travelling Salesman Problem) | Routing | TSP-v1 | code | doc |
| Multi Minimum Spanning Tree Problem | Routing | MMST-v0 | code | doc |
| PacMan | Routing | PacMan-v1 | code | doc |
| Sokoban | Routing | Sokoban-v0 | code | doc |
| Level-Based Foraging | Routing | LevelBasedForaging-v0 | code | doc |
| Search and Rescue | Swarms | SearchAndRescue-v0 | code | doc |
Installation
You can install the latest release of Jumanji from PyPI:

    pip install -U jumanji

Alternatively, you can install the latest development version directly from GitHub:

    pip install git+https://github.com/instadeepai/jumanji.git

Jumanji has been tested on Python 3.10, 3.11 and 3.12. Note that because the installation of JAX differs depending on your hardware accelerator, we advise users to explicitly install the correct JAX version (see the official installation guide).
Rendering: Matplotlib is used for rendering all the environments. To visualize the environments you will need a GUI backend. For example, on Linux, you can install Tk via `apt-get install python3-tk`, or using conda: `conda install tk`. Check out Matplotlib backends for a list of backends you can use.
Quickstart
RL practitioners will find Jumanji's interface familiar as it combines the widely adopted
OpenAI Gym and
DeepMind Environment interfaces. From OpenAI Gym, we adopted
the idea of a registry and the render method, while our TimeStep structure is inspired by
DeepMind Environment.
Basic Usage
```python
import jax
import jumanji

# Instantiate a Jumanji environment using the registry
env = jumanji.make('Snake-v1')

# Reset your (jit-able) environment
key = jax.random.PRNGKey(0)
state, timestep = jax.jit(env.reset)(key)

# (Optional) Render the env state
env.render(state)

# Interact with the (jit-able) environment
action = env.action_spec.generate_value()  # Action selection (dummy value here)
state, timestep = jax.jit(env.step)(state, action)  # Take a step and observe the next state and time step
```
- `state` represents the internal state of the environment: it contains all the information required to take a step when executing an action. This should not be confused with the `observation` contained in the `timestep`, which is the information perceived by the agent.
- `timestep` is a dataclass containing `step_type`, `reward`, `discount`, `observation` and `extras`. This structure is similar to `dm_env.TimeStep` except for the `extras` field, which was added to allow users to log environment metrics that are neither part of the agent's observation nor part of the environment's internal state.
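To make the fields of the time step concrete, here is a minimal pure-Python sketch of a container with the same five fields. It is illustrative only: `ToyStepType` and `ToyTimeStep` are hypothetical names, the integer values follow the `dm_env` convention, and the plain Python types are an assumption (Jumanji's own structure holds JAX arrays).

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, Dict


class ToyStepType(IntEnum):
    """Hypothetical step types, following the dm_env convention."""
    FIRST = 0
    MID = 1
    LAST = 2


@dataclass(frozen=True)
class ToyTimeStep:
    """Illustrative stand-in for a time step; not Jumanji's real class."""
    step_type: ToyStepType
    reward: float
    discount: float
    observation: Any
    # Metrics to log that belong neither to the observation nor to the state.
    extras: Dict[str, Any] = field(default_factory=dict)

    def last(self) -> bool:
        """Return True if this time step ends the episode."""
        return self.step_type == ToyStepType.LAST


ts = ToyTimeStep(
    step_type=ToyStepType.MID,
    reward=1.0,
    discount=1.0,
    observation=[0, 1, 0],
    extras={"fruits_eaten": 3},  # Hypothetical metric name.
)
```

The agent would only ever read `ts.observation`, while a logger could read `ts.extras` without affecting what the agent perceives.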
Advanced Usage
Being written in JAX, Jumanji's environments benefit from many of its features including
automatic vectorization/parallelization (jax.vmap, jax.pmap) and JIT-compilation (jax.jit),
which can be composed arbitrarily.
We provide an example of a more advanced usage in the
advanced usage guide.
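As a rough illustration of how these transformations compose, the sketch below vectorizes and JIT-compiles a toy environment. `toy_reset` and `toy_step` are hypothetical stand-ins written for this example and are not part of Jumanji; only `jax.vmap`, `jax.jit` and `jax.random` are real JAX calls.

```python
import jax
import jax.numpy as jnp


def toy_reset(key):
    """Hypothetical reset: sample a random starting position in [0, 1)."""
    return jax.random.uniform(key, shape=())


def toy_step(state, action):
    """Hypothetical step: move by `action` and reward closeness to zero."""
    next_state = state + action
    reward = -jnp.abs(next_state)
    return next_state, reward


batch_size = 4
keys = jax.random.split(jax.random.PRNGKey(0), batch_size)

# Vectorize over the batch dimension, then JIT-compile the vectorized function.
batched_reset = jax.jit(jax.vmap(toy_reset))
batched_step = jax.jit(jax.vmap(toy_step))

states = batched_reset(keys)
actions = jnp.full((batch_size,), -0.1)
next_states, rewards = batched_step(states, actions)
```

Because the reset and step functions are pure, the same pattern should carry over to a real environment's `reset` and `step`, with one random key per environment instance.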
Registry and Versioning
Like OpenAI Gym, Jumanji keeps a strict versioning of its environments for reproducibility reasons.
We maintain a registry of standard environments with their configuration.
For each environment, a version suffix is appended, e.g. Snake-v1.
When changes are made to environments that might impact learning results,
the version number is incremented by one to prevent potential confusion.
For a full list of registered versions of each environment, check out
the documentation.
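The versioning scheme can be pictured with a small, self-contained registry. This is a sketch under stated assumptions, not Jumanji's implementation: `parse_env_id`, `register`, `make` and `_REGISTRY` are hypothetical names defined here, and the `Name-v<int>` pattern is inferred from IDs such as `Snake-v1`.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical registry mapping "Name-vN" IDs to environment constructors.
_REGISTRY: Dict[str, Callable[[], object]] = {}
_ID_PATTERN = re.compile(r"^(?P<name>.+)-v(?P<version>\d+)$")


def parse_env_id(env_id: str) -> Tuple[str, int]:
    """Split an ID such as 'Snake-v1' into ('Snake', 1)."""
    match = _ID_PATTERN.match(env_id)
    if match is None:
        raise ValueError(f"Environment ID {env_id!r} must end with '-v<int>'.")
    return match.group("name"), int(match.group("version"))


def register(env_id: str, constructor: Callable[[], object]) -> None:
    """Register a constructor under a versioned ID; duplicates are rejected."""
    parse_env_id(env_id)  # Validate the format first.
    if env_id in _REGISTRY:
        raise ValueError(f"{env_id!r} is already registered.")
    _REGISTRY[env_id] = constructor


def make(env_id: str) -> object:
    """Instantiate a registered environment by its versioned ID."""
    if env_id not in _REGISTRY:
        raise KeyError(f"Unknown environment {env_id!r}.")
    return _REGISTRY[env_id]()


register("Snake-v1", dict)  # `dict` stands in for a real environment class.
```

Keeping the version inside the ID means that a behaviour-changing update is registered under a new ID, so results reported against the old ID stay reproducible.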
Training
To showcase how to train RL agents on Jumanji environments, we provide a random agent and a vanilla actor-critic (A2C) agent. These agents can be found in jumanji/training/.
Because the environment framework in Jumanji is so flexible, it allows pretty much any problem to be implemented as a Jumanji environment, giving rise to very diverse observations. For this reason, environment-specific networks are required to capture the symmetries of each environment. Alongside the A2C agent implementation, we provide examples of such environment-specific actor-critic networks in jumanji/training/networks.
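For readers unfamiliar with actor-critic methods, the following pure-Python sketch shows one common way to form the advantage signal that such an agent trains on: discounted returns minus the critic's value predictions. It is a generic illustration, not the code in `jumanji/training`; the function names and the sample numbers are assumptions made for this example.

```python
from typing import List


def discounted_returns(rewards: List[float], discounts: List[float],
                       bootstrap_value: float) -> List[float]:
    """Compute G_t = r_t + d_t * G_{t+1}, starting from the bootstrap value."""
    returns = []
    running = bootstrap_value
    for reward, discount in zip(reversed(rewards), reversed(discounts)):
        running = reward + discount * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns: List[float], values: List[float]) -> List[float]:
    """Advantage estimate A_t = G_t - V(s_t)."""
    return [g - v for g, v in zip(returns, values)]


rewards = [1.0, 0.0, 2.0]
discounts = [0.5, 0.5, 0.0]  # A discount of 0 marks the end of the episode.
values = [1.0, 1.0, 1.0]     # Hypothetical critic predictions V(s_t).

returns = discounted_returns(rewards, discounts, bootstrap_value=10.0)
adv = advantages(returns, values)
```

The final discount of 0 cuts off the bootstrap value, which is how the `discount` field of a time step can signal termination to the learner.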
⚠️ The example agents in `jumanji/training` are only meant to serve as inspiration for how one can implement an agent. Jumanji is first and foremost a library of environments - as such, the agents and networks will not be maintained to a production standard.
For more information on how to use the example agents, see the training guide.
Contributing
Contributions are welcome! See our issue tracker for good first issues. Please read our contributing guidelines for details on how to submit pull requests, our Contributor License Agreement, and community guidelines.
Citing Jumanji
If you use Jumanji in your work, please cite the library using:
@misc{bonnet2024jumanji,
title={Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX},
author={Clément Bonnet and Daniel Luo and Donal Byrne and Shikha Surana and Sasha Abramowitz and Paul Duckworth and Vincent Coyette and Laurence I. Midgley and Elshadai Tegegn and Tristan Kalloniatis and Omayma Mahjoub and Matthew Macfarlane and Andries P. Smit and Nathan Grinsztajn and Raphael Boige and Cemlyn N. Waters and Mohamed A. Mimouni and Ulrich A. Mbou Sob and Ruan de Kock and Siddarth Singh and Daniel Furelos-Blanco and Victor Le and Arnu Pretorius and Alexandre Laterre},
year={2024},
eprint={2306.09884},
url={https://arxiv.org/abs/2306.09884},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
See Also
Other works have embraced the approach of writing RL environments in JAX. In particular, we suggest users check out the following sister repositories:
- Qdax is a library to accelerate Quality-Diversity and neuro-evolution algorithms through hardware accelerators and parallelization.
- Evojax provides tools to enable neuroevolution algorithms to work with neural networks running across multiple TPU/GPUs.
- Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators.
- Gymnax implements classic environments including classic control, bsuite, MinAtar and a collection of meta RL tasks.
- Pgx provides classic board game environments like Backgammon, Shogi, and Go.
Acknowledgements
The development of this library was supported with Cloud TPUs from Google's TPU Research Cloud (TRC).
Owner
- Name: InstaDeep Ltd
- Login: instadeepai
- Kind: organization
- Email: hello@instadeep.com
- Location: London, UK
- Website: https://instadeep.com
- Twitter: instadeepai
- Repositories: 14
- Profile: https://github.com/instadeepai
We productise innovation
GitHub Events
Total
- Create event: 13
- Release event: 1
- Issues event: 37
- Watch event: 146
- Delete event: 10
- Issue comment event: 172
- Push event: 42
- Pull request event: 42
- Pull request review comment event: 144
- Pull request review event: 141
- Fork event: 22
Last Year
- Create event: 13
- Release event: 1
- Issues event: 37
- Watch event: 146
- Delete event: 10
- Issue comment event: 172
- Push event: 42
- Pull request event: 42
- Pull request review comment event: 144
- Pull request review event: 141
- Fork event: 22
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Clément Bonnet | 5****t | 47 |
| Sasha Abramowitz | r****a@g****m | 16 |
| Daniel | 5****6 | 13 |
| Alex Laterre | a****e | 8 |
| surana01 | 1****1 | 4 |
| aar65537 | 1****7 | 4 |
| George Ogden | 3****n | 3 |
| Vincent Coyette | 9****v | 3 |
| Wiem Khlifi | w****i@i****m | 3 |
| zombie-einstein | 1****n | 2 |
| Tristan Kalloniatis | t****s@g****m | 2 |
| Raphaël Boige | 4****b | 2 |
| Elshadai Tegegn | 5****K | 2 |
| Callum Tilbury | 3****y | 2 |
| Arnu Pretorius | a****s@g****m | 2 |
| Cemlyn | 4****7 | 1 |
| Cyprien | c****c@g****m | 1 |
| Daniel Palenicek | d****n | 1 |
| David Tao | r****o@g****m | 1 |
| siddarthsingh1 | 9****1 | 1 |
| mvmacfarlane | 7****e | 1 |
| helpingstar | i****r@g****m | 1 |
| dantp | 1****i | 1 |
| Ulrich A. Mbou Sob | m****l@g****m | 1 |
| Thomas Hirtz | h****t@g****m | 1 |
| S Ashwin | a****1@g****m | 1 |
| RuanJohn | 3****n | 1 |
| Rodrigue Siry | r****y@g****m | 1 |
| Raphael Avalos | r****l@a****r | 1 |
| RJ Wang | 1****j | 1 |
| and 7 more... | | |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 104
- Total pull requests: 147
- Average time to close issues: 4 months
- Average time to close pull requests: about 1 month
- Total issue authors: 39
- Total pull request authors: 39
- Average comments per issue: 1.2
- Average comments per pull request: 2.2
- Merged pull requests: 118
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 16
- Pull requests: 25
- Average time to close issues: about 1 month
- Average time to close pull requests: 19 days
- Issue authors: 9
- Pull request authors: 6
- Average comments per issue: 2.75
- Average comments per pull request: 7.84
- Merged pull requests: 19
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- dluo96 (26)
- clement-bonnet (17)
- zombie-einstein (5)
- sash-a (4)
- George-Ogden (3)
- coyettev (3)
- cemlyn007 (3)
- thomashirtz (2)
- carlosgmartin (2)
- Wendyuf (2)
- RuanJohn (2)
- aar65537 (2)
- arnupretorius (2)
- Egiob (1)
- sotetsuk (1)
Pull Request Authors
- clement-bonnet (52)
- sash-a (29)
- zombie-einstein (12)
- dluo96 (12)
- aar65537 (6)
- George-Ogden (6)
- WiemKhlifi (6)
- callumtilbury (6)
- coyettev (5)
- RuanJohn (3)
- chouakifares (2)
- raphaelavalos (2)
- thomashirtz (2)
- taodav (2)
- surana01 (2)
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 18
proxy.golang.org: github.com/instadeepai/jumanji
- Documentation: https://pkg.go.dev/github.com/instadeepai/jumanji#section-documentation
- License: apache-2.0
- Latest release: v1.1.1 (published about 1 year ago)
Dependencies
- black ==22.3.0 development
- coverage * development
- dm-haiku ==0.0.5 development
- flake8 ==4.0.1 development
- hydra-core * development
- isort ==5.10.1 development
- livereload * development
- mkdocs ==1.2.3 development
- mkdocs-git-revision-date-plugin * development
- mkdocs-include-markdown-plugin * development
- mkdocs-material * development
- mkdocs-mermaid2-plugin ==0.6.0 development
- mkdocstrings ==0.18.0 development
- mknotebooks ==0.7.1 development
- mypy ==0.942 development
- nbmake * development
- optax >=0.0.9 development
- pre-commit ==2.17.0 development
- promise * development
- pymdown-extensions * development
- pytest ==7.0.1 development
- pytest-cov * development
- pytest-mock * development
- pytest-parallel * development
- pytest-xdist * development
- pytype * development
- testfixtures * development
- Pillow >=9.0.0
- brax >=0.0.10
- chex >=0.1.3
- dm-env >=1.5
- gym >=0.19.0
- jax >=0.2.26
- jaxlib >=0.1.74
- matplotlib >=3.3.4
- numpy >=1.19.5
- pygame >=2.0.0
- typing-extensions >=4.0.0
- JamesIves/github-pages-deploy-action v4 composite
- actions/checkout v3 composite
- actions/checkout v2 composite
- actions/setup-python v1 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- dm-haiku ==0.0.9
- huggingface-hub *
- hydra-core ==1.3
- neptune-client ==0.16.15
- optax >=0.1.4
- rlax >=0.1.4
- tensorboardX ==2.5.1
- tqdm >=4.64.1