MDPax: GPU-accelerated MDP solvers in Python with JAX

MDPax: GPU-accelerated MDP solvers in Python with JAX - Published in JOSS (2025)

https://github.com/joefarrington/mdpax

Keywords

dynamic-programming jax optimization reinforcement-learning value-iteration

Last synced: 4 months ago

Repository

GPU-accelerated MDP solvers in Python with JAX

Basic Info

Host: GitHub
Owner: joefarrington
License: mit
Language: Python
Default Branch: main
Homepage: https://mdpax.readthedocs.io
Size: 4.71 MB

Statistics

Stars: 22
Watchers: 2
Forks: 0
Open Issues: 1
Releases: 0

Topics

dynamic-programming jax optimization reinforcement-learning value-iteration

Created about 1 year ago · Last pushed 4 months ago

Metadata Files

Readme License

MDPax

MDPax is designed for researchers and practitioners who want to solve large Markov Decision Process (MDP) problems but don't want to become experts in graphics processing unit (GPU) programming. By using JAX, we can take advantage of the massive parallel processing power of GPUs while describing new problems using a simple Python interface.

You can run MDPax on your local GPU, or try it for free using Google Colab, which provides access to GPUs in the cloud with no setup required.

Key capabilities

Solve MDPs with millions of states using value iteration or policy iteration
Automatic support for one or more identical GPUs
Flexible interface for defining your own MDP problem or solver algorithm
Asynchronous checkpointing using Orbax
Ready-to-use examples including perishable inventory problems from recent literature
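Multi-GPU support relies on JAX's pmap (see the Overview). The sketch below shows the general pattern, not MDPax's internal code. A batch of states is padded, split along a leading device axis, processed with pmap on that axis and vmap within each device, and then gathered back. The per-state function is a hypothetical placeholder. On a machine with a single CPU or GPU, the same code runs on that one device.

```python
import jax
import jax.numpy as jnp

n_devices = jax.local_device_count()  # 1 on a single CPU or GPU

# Hypothetical batch of 12 states, each described by a length-2 vector.
states = jnp.arange(24, dtype=jnp.float32).reshape(12, 2)


def state_score(state):
    # Placeholder per-state computation standing in for a Bellman backup.
    return jnp.sum(state**2)


# Pad with dummy states so the batch divides evenly across devices.
n_states, state_dim = states.shape
pad = (-n_states) % n_devices
padded = jnp.concatenate([states, jnp.zeros((pad, state_dim), states.dtype)])

# Leading axis is the device axis: shape (n_devices, states_per_device, state_dim).
sharded = padded.reshape(n_devices, -1, state_dim)

# pmap splits the work across devices; vmap vectorises over states on each device.
scores = jax.pmap(jax.vmap(state_score))(sharded)

# Gather back into one vector and drop the padding rows.
scores = scores.reshape(-1)[:n_states]
```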

Overview

MDPax is a Python package for solving large-scale MDPs, leveraging JAX's support for vectorization, parallelization, and just-in-time (JIT) compilation on GPUs.

The package is adapted from the research code developed for Farrington et al. (2025) (a preprint was released in 2023). We demonstrated that this approach is particularly well suited to perishable inventory management problems, where the state space grows exponentially with the number of products and the maximum useful life of the products. By implementing the problems in JAX and using consumer-grade GPUs (or freely available GPUs on services such as Google Colab), it is possible to compute exact solutions for realistically sized perishable inventory problems for which exact solution had recently been reported to be infeasible or impractical.
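To see why the growth is exponential, consider a simplified, hypothetical encoding. For each product, the state records the number of units in stock at each remaining useful life, and each entry is capped at `max_units`. The number of possible states is then `(max_units + 1) ** (n_products * useful_life)`. The published models include further state components, such as orders in transit, so treat these counts as illustrative only.

```python
def n_states(max_units: int, useful_life: int, n_products: int = 1) -> int:
    """Count states under a simplified, hypothetical encoding.

    Each product has one stock level per remaining-useful-life period,
    and each stock level can take the values 0..max_units.
    """
    return (max_units + 1) ** (n_products * useful_life)


for life in (2, 3, 4, 5):
    print(f"1 product, useful life {life}: {n_states(10, life):,} states")
print(f"2 products, useful life 5: {n_states(10, 5, n_products=2):,} states")
```

With at most 10 units per age class, a single product with a useful life of 5 already has 161,051 states. Two such products have about 2.6 × 10^10.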

Traditional value iteration implementations face two main challenges with large state spaces:

Memory requirements - the full transition matrix grows with the square of the state space size
Computational complexity - nested loops over states, actions, and possible next states become prohibitively expensive
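A back-of-the-envelope calculation makes the memory point concrete. The sketch assumes a dense float32 array indexed by action, state, and next state. The layout and the figure of ten actions are assumptions chosen for illustration.

```python
def dense_transition_bytes(n_states: int, n_actions: int, bytes_per_entry: int = 4) -> int:
    """Memory for a dense P[action, state, next_state] array (float32 by default)."""
    return n_actions * n_states * n_states * bytes_per_entry


for n in (10_000, 100_000, 1_000_000):
    gb = dense_transition_bytes(n, n_actions=10) / 1e9
    print(f"{n:>9,} states, 10 actions: {gb:,.0f} GB")
```

Ten times more states means a hundred times more memory. At one million states and ten actions, the dense matrix alone would need 4 × 10^13 bytes (40 TB).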

MDPax addresses these challenges by:

Using a functional approach where users specify a deterministic transition function in terms of state, action, and random event, rather than providing the full transition matrix
Leveraging JAX's transformations to optimize computation:
- vmap to vectorize operations across states and actions
- pmap to parallelize across multiple GPU devices where available
- jit to compile operations once and reuse them efficiently across many value iteration steps
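The sketch below illustrates these ideas on the three-state forest problem. It follows pymdptoolbox's definition with its default parameters: fire probability 0.1, a reward of 4 for waiting in the oldest state, and a reward of 2 for cutting it. Those parameter values are an assumption about how the problem is set up. This is not MDPax's API; `transition`, `q_value`, `bellman_backup`, and `value_iteration` are hypothetical names. The transition function is deterministic given (state, action, random event), where the random event is whether a fire occurs. `vmap` and `jit` then turn it into a vectorised, compiled Bellman backup without ever building a transition matrix.

```python
import jax
import jax.numpy as jnp

# Forest problem constants, following pymdptoolbox's defaults (assumption).
N_STATES, P_FIRE, R_WAIT, R_CUT, GAMMA = 3, 0.1, 4.0, 2.0, 0.9
WAIT, CUT = 0, 1

states = jnp.arange(N_STATES)
actions = jnp.array([WAIT, CUT])
events = jnp.array([0, 1])  # random event: 0 = no fire, 1 = fire
event_probs = jnp.array([1.0 - P_FIRE, P_FIRE])


def transition(state, action, event):
    """Deterministic (next_state, reward) for one state, action and random event."""
    oldest = state == N_STATES - 1
    # Waiting: the forest ages one class (capped at the oldest) unless a fire resets it.
    wait_next = jnp.where(event == 1, 0, jnp.minimum(state + 1, N_STATES - 1))
    wait_reward = jnp.where(oldest, R_WAIT, 0.0)
    # Cutting: the forest always returns to age class 0.
    cut_reward = jnp.where(state == 0, 0.0, jnp.where(oldest, R_CUT, 1.0))
    next_state = jnp.where(action == CUT, 0, wait_next)
    reward = jnp.where(action == CUT, cut_reward, wait_reward)
    return next_state, reward


def q_value(state, action, values):
    """Expected reward plus discounted next-state value, averaged over events."""
    next_states, rewards = jax.vmap(transition, in_axes=(None, None, 0))(state, action, events)
    return jnp.sum(event_probs * (rewards + GAMMA * values[next_states]))


@jax.jit
def bellman_backup(values):
    """One synchronous backup over all states; returns new values and greedy actions."""

    def q_for_state(s):
        return jax.vmap(lambda a: q_value(s, a, values))(actions)

    q = jax.vmap(q_for_state)(states)  # shape (n_states, n_actions)
    return q.max(axis=1), q.argmax(axis=1)


def value_iteration(epsilon=0.01, max_iterations=500):
    values = jnp.zeros(N_STATES)
    # A common stopping threshold for value iteration with tolerance epsilon.
    threshold = epsilon * (1 - GAMMA) / (2 * GAMMA)
    for _ in range(max_iterations):
        new_values, policy = bellman_backup(values)
        delta = jnp.max(jnp.abs(new_values - values))
        values = new_values
        if delta < threshold:
            break
    return values, policy


values, policy = value_iteration()
print(policy)  # [0 0 0]: wait in every state
```

Because `bellman_backup` is compiled once and reused, only the first iteration pays the compilation cost.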

While MDPax can run on CPU or GPU hardware, it is specifically designed for large problems (millions of states) on GPU. For small to medium-sized problems, especially when running on CPU, existing packages like pymdptoolbox may be more efficient due to JAX's JIT compilation overhead and GPU memory transfer costs. These overheads become negligible for larger problems where the benefits of parallelization and vectorization dominate.
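The JIT overhead mentioned above is easy to observe. The first call to a jit-compiled function includes tracing and compilation, while later calls with the same input shapes and dtypes reuse the compiled code. `block_until_ready` is needed because JAX dispatches work asynchronously. The workload below is a hypothetical placeholder, and the timings depend on your hardware.

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def backup(values):
    # Placeholder workload: a cheap update on a vector of state values.
    return 0.9 * jnp.roll(values, 1) + 1.0


values = jnp.zeros(1_000)

start = time.perf_counter()
backup(values).block_until_ready()  # first call: trace + compile + run
first_call = time.perf_counter() - start

start = time.perf_counter()
result = backup(values).block_until_ready()  # later calls reuse the compiled code
second_call = time.perf_counter() - start

print(f"first call: {first_call * 1e3:.1f} ms, second call: {second_call * 1e3:.3f} ms")
```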

Installation

MDPax can be installed from PyPI using pip:

```bash
pip install mdpax
```

The main dependencies are:

jax
chex
numpyro
orbax
loguru
hydra-core
jaxtyping
numpy

See pyproject.toml for the complete list of dependencies and version requirements.

GPU (recommended)

MDPax is designed for GPU-accelerated computation and works best on Linux systems with NVIDIA GPUs.

For GPU support, ensure your NVIDIA drivers and CUDA toolkit are compatible with JAX. See the JAX installation guide for details.

CPU only

MDPax will automatically fall back to CPU on Linux if no GPU is detected, though performance will be significantly slower for large problems. If CUDA libraries are installed but no GPU hardware is available, you may need to force CPU execution by setting:

```bash
export JAX_PLATFORMS=cpu
```
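The same setting can be applied from inside a Python script, provided it happens before `jax` is imported, because JAX reads the variable at start-up. In a notebook where JAX has already been imported, restart the kernel first. A minimal sketch:

```python
import os

# Must be set before jax is imported in this process; JAX reads it at start-up.
os.environ["JAX_PLATFORMS"] = "cpu"

import jax  # noqa: E402

print(jax.devices())
print(jax.default_backend())  # "cpu"
```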

Windows/macOS: JAX does not currently support GPUs on Windows and only has experimental support for Apple GPUs on macOS. MDPax therefore uses CPU-only versions of JAX on these platforms, giving reduced performance.

Examples

If you want to run the example notebooks, install the additional dependencies with:

```bash
pip install "mdpax[examples]"
```

Google Colab

You can try MDPax without any local installation using Google Colab, which provides free GPU access. See our Getting Started notebook for an interactive introduction.

To verify you're using a GPU in Colab, click Runtime > Change runtime type and ensure "GPU" is selected as the Hardware accelerator. You can confirm GPU availability by running !nvidia-smi in a code cell.
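From Python, you can also check which devices JAX itself will use. This catches cases where a GPU is present but JAX was installed without GPU support. `jax.devices` and `jax.default_backend` are standard JAX functions; the check below is only a convenience.

```python
import jax

# The devices JAX can see, and the backend it will use by default.
print(jax.devices())
backend = jax.default_backend()  # "gpu" when a CUDA GPU is visible to JAX

if backend != "gpu":
    print("No GPU visible to JAX; computations will run on:", backend)
```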

Quick Start

The following example shows how to solve a simple forest management problem (adapted from pymdptoolbox's example):

```python
from mdpax.problems import Forest
from mdpax.solvers import ValueIteration

# Create forest management problem
problem = Forest()

# Create solver with discount factor gamma = 0.9,
# and convergence tolerance epsilon = 0.01
solver = ValueIteration(problem, gamma=0.9, epsilon=0.01)

# Solve the problem (automatically uses GPU if available)
solution = solver.solve(max_iterations=500)

# Access the optimal policy and value function
print(solution.policy)  # array([[0], [0], [0]]) - "wait" for all states
print(solution.values)  # value for each state under optimal policy
```

This example demonstrates the core workflow:

Create a problem instance
Initialize a solver
Solve to get the optimal policy and value function

Documentation

Full documentation is available at https://mdpax.readthedocs.io/.

Tutorials

Getting Started - Basic usage on several problems
Creating custom MDP problems - Guide to implementing your own problems, using FrozenLake as an example

API Reference

Core - The base classes for Problems and Solvers
Solvers - Value iteration and other solution methods
Problems - Example problems

For reproducible examples from the original paper, see the viso_jax repository.

Example Problems

Basic example: forest management

A simple forest management problem adapted from pymdptoolbox. By default, this problem has a small state space (3 states, representing the possible ages of the forest), which makes it useful for getting started with the package or debugging new solvers. At each time step, the manager must decide whether to cut or wait, weighing the immediate revenue from cutting against the benefit of letting the forest mature.
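Because the problem is so small, its full transition matrix is tiny, which makes it a convenient test case for checking solvers. The NumPy sketch below builds the dense arrays following pymdptoolbox's forest definition with default parameters and a discount factor of 0.9. This is an assumption: MDPax's `Forest` class is implemented separately. The sketch then solves the problem exactly with policy iteration. The resulting policy (wait in every state) is consistent with the Quick Start comment.

```python
import numpy as np

S, P_FIRE, R_WAIT, R_CUT, GAMMA = 3, 0.1, 4.0, 2.0, 0.9

# P[a, s, s'] and R[s, a]; action 0 = wait, action 1 = cut.
P = np.zeros((2, S, S))
P[0, :, 0] = P_FIRE  # a fire resets the forest to age class 0
P[0, np.arange(S - 1), np.arange(1, S)] = 1 - P_FIRE  # otherwise it ages one class
P[0, S - 1, S - 1] = 1 - P_FIRE  # the oldest class stays oldest
P[1, :, 0] = 1.0  # cutting always resets
R = np.zeros((S, 2))
R[S - 1, 0] = R_WAIT
R[1:, 1] = 1.0
R[S - 1, 1] = R_CUT


def policy_iteration(P, R, gamma):
    n_states = R.shape[0]
    idx = np.arange(n_states)
    policy = np.ones(n_states, dtype=int)  # start by cutting everywhere
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, idx, :]
        R_pi = R[idx, policy]
        values = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to the current values.
        q = R + gamma * np.einsum("ast,t->sa", P, values)
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, values
        policy = new_policy


policy, values = policy_iteration(P, R, GAMMA)
print(policy)  # [0 0 0]: wait in every state
print(values)  # approximately 26.244, 29.484, 33.484
```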

Perishable inventory management problems

These problems demonstrate the package's ability to handle large state spaces in inventory management scenarios. They were used in Farrington et al. (2025) to illustrate the benefits of implementing value iteration in JAX.

De Moor Single Product Perishable (De Moor et al. 2022)

A single-product inventory system with positive lead time and fixed useful life. Orders placed today arrive after a fixed lead time, and the state must track both current stock levels and orders in transit.
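The sketch below shows how such a state can be encoded and advanced by a deterministic transition function of (state, action, random event), with the day's demand as the random event. It is a simplified, hypothetical model written for illustration. The assumed event order is: FIFO issuing, expiry of the oldest unsold units, ageing, receipt of the next order in transit, and placement of today's order. That order, the constants, and the function name are all assumptions, not the De Moor et al. (2022) specification or MDPax's implementation. Rewards and costs are omitted.

```python
import jax
import jax.numpy as jnp

LEAD_TIME = 2  # hypothetical; the state tracks LEAD_TIME - 1 orders in transit (needs >= 2 here)
USEFUL_LIFE = 3  # hypothetical number of periods a unit can be sold


def transition(state, action, demand):
    """Hypothetical deterministic transition for a lead-time perishable system.

    state = [pipeline (LEAD_TIME - 1 orders, next arrival first),
             stock (USEFUL_LIFE levels, oldest first)]
    """
    pipeline = state[: LEAD_TIME - 1]
    stock = state[LEAD_TIME - 1 :]
    # FIFO issuing: units left at each age after demand is met oldest-first.
    remaining = jnp.clip(jnp.cumsum(stock) - demand, 0, stock)
    sales = stock.sum() - remaining.sum()
    waste = remaining[0]  # unsold units in the oldest age class expire
    # Survivors age by one period; the next order in transit arrives with full life.
    next_stock = jnp.concatenate([remaining[1:], pipeline[:1]])
    # Today's order joins the back of the pipeline.
    next_pipeline = jnp.concatenate([pipeline[1:], jnp.atleast_1d(action)])
    return jnp.concatenate([next_pipeline, next_stock]), sales, waste


state = jnp.array([2, 1, 0, 3])  # 2 units in transit; stock by age [1, 0, 3], oldest first
next_state, sales, waste = transition(state, 5, 2)
print(next_state, sales, waste)  # [5 0 2 2] 2 0

# Vectorise over possible demand values, as a solver would over random events.
batch_next, batch_sales, batch_waste = jax.vmap(transition, in_axes=(None, None, 0))(
    state, 5, jnp.arange(6)
)
```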

Hendrix Two Product Perishable (with substitution) (Hendrix et al. 2019)

A two-product inventory system with product substitution, where both products have fixed useful lives. Customers may be willing to substitute product A for B when B is out of stock.
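With substitution, the outcome of a period depends on the order in which demand is served. The sketch below shows one possible issuing rule, in which the random event includes the number of B customers willing to switch to A. The priority rule (A's own customers are served first), the function name, and its arguments are assumptions for illustration, not the rule used by Hendrix et al. (2019) or by MDPax. Stock ages are ignored for brevity.

```python
import jax.numpy as jnp


def issue_with_substitution(stock_a, stock_b, demand_a, demand_b, willing):
    """Hypothetical issuing rule for two products where A can substitute for B.

    `willing` is the (random) number of B customers who would accept A if B
    runs out; it is capped at the actual unmet B demand.
    """
    sales_b = jnp.minimum(stock_b, demand_b)
    unmet_b = demand_b - sales_b
    sales_a = jnp.minimum(stock_a, demand_a)  # A's own customers first (assumption)
    leftover_a = stock_a - sales_a
    substituted = jnp.minimum(leftover_a, jnp.minimum(willing, unmet_b))
    return sales_a, sales_b, substituted


# 5 units of A, 1 of B; demand 2 for A and 4 for B; 2 of the 3 unmet B customers accept A.
print([int(x) for x in issue_with_substitution(5, 1, 2, 4, 2)])  # [2, 1, 2]
```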

Mirjalili Platelet Perishable (Mirjalili 2022; Abouee-Mehrizi et al. 2023)

A single-product inventory management problem, modelling platelet inventory management in a hospital blood bank. Features weekday-dependent demand patterns and uncertain useful life of platelets at arrival, which may depend on the order quantity.
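Weekday-dependent demand can be handled by including the day of the week in the state and letting the distribution of the random event (daily demand) depend on it. The sketch below assumes Poisson demand with made-up weekday means, truncated at a hypothetical maximum. The published models may use different distributions and parameters. The uncertain useful life at arrival would add a second random component, which is not shown here.

```python
import jax.numpy as jnp
from jax.scipy.stats import poisson

# Hypothetical mean daily demand, Monday..Sunday (illustrative values only).
MEAN_DEMAND = jnp.array([10.0, 10.0, 12.0, 12.0, 14.0, 6.0, 6.0])
MAX_DEMAND = 30  # hypothetical cap; demand above it is lumped into the last value


def demand_probabilities(weekday):
    """P(demand = 0..MAX_DEMAND) for a given weekday (0 = Monday)."""
    k = jnp.arange(MAX_DEMAND + 1)
    probs = poisson.pmf(k, MEAN_DEMAND[weekday])
    # Assign the tail probability P(demand > MAX_DEMAND) to the last value.
    return probs.at[-1].add(1.0 - probs.sum())


def next_weekday(weekday):
    # The weekday is part of the state and advances deterministically.
    return (weekday + 1) % 7


all_probs = jnp.stack([demand_probabilities(d) for d in range(7)])  # (7, MAX_DEMAND + 1)
```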

Development

To set up a development environment using uv for dependency management:

```bash
# Clone the repository
git clone https://github.com/joefarrington/mdpax.git
cd mdpax

# Create a virtual environment
uv venv
source .venv/bin/activate

# Install development dependencies
uv sync  # Using uv

# Install pre-commit hooks
pre-commit install

# Run a subset of tests (suitable for CPU)
uv run pytest tests -m "not slow"

# Run all tests (requires a GPU)
uv run pytest tests
```

Running the tests using the commands above will print a code coverage report in the terminal.

The development environment includes:

black and ruff for code formatting and linting
pytest for testing
pre-commit hooks to ensure code quality
sphinx for documentation building

See pyproject.toml for the full list of development dependencies.

Contributing

MDPax is a new library aimed at researchers and practitioners. As we're in the early stages of development, we particularly welcome feedback on the API design and suggestions for how we can make the library more accessible to users with different backgrounds and experience levels. Our goal is to make using GPUs to solve large MDPs as straightforward as possible while maintaining the flexibility needed for research applications.

Contributions are welcome in many forms:

API and Documentation Feedback:

Is the API intuitive for your use case?
Are there concepts that need better explanation?
Would additional examples help?
Open an issue with your suggestions or questions

Bug Reports: Open an issue describing:

What you were trying to do
What you expected to happen
What actually happened
Steps to reproduce the issue

Feature Requests: Open an issue describing:

The use case for the feature
Any relevant references (papers, implementations)
Possible implementation approaches

Pull Requests: For code contributions:

Open an issue first to discuss the proposed changes
Fork the repository
Create a new branch for your feature
Follow the existing code style (enforced by pre-commit hooks)
Add tests for new functionality
Update documentation as needed
Submit a PR referencing the original issue

New Problem Implementations: We're particularly interested in helping users implement new MDP problems:
- Open an issue describing the problem and citing any relevant papers
- We can help with the implementation approach and best practices
- This is a great way to contribute while learning the package

All contributions will be reviewed and should pass the automated checks (tests, linting, type checking).

Citation

If you use this software in your research, please cite:

The original paper:

```bibtex
@article{farrington_going_2025,
  title = {Going faster to see further: graphics processing unit-accelerated value iteration and simulation for perishable inventory control using {JAX}},
  url = {https://doi.org/10.1007/s10479-025-06551-6},
  doi = {10.1007/s10479-025-06551-6},
  journal = {Annals of Operations Research},
  author = {Farrington, Joseph and Wong, Wai Keong and Li, Kezhi and Utley, Martin},
  month = mar,
  year = {2025},
}
```

The software package:

```bibtex
@software{mdpax2024github,
  author = {Joseph Farrington},
  title = {mdpax: GPU-accelerated MDP solvers in Python with JAX},
  year = {2024},
  url = {https://github.com/joefarrington/mdpax},
}
```

License

MDPax is released under the MIT License. See the LICENSE file for details.

The forest management example problem is adapted from pymdptoolbox (BSD 3-Clause License, Copyright (c) 2011-2013 Steven A. W. Cordwell and Copyright (c) 2009 INRA). Our implementation is original, using the mdpax.core.problems.Problem class.

Related Projects

viso_jax

The original research code used to produce the results in Farrington et al. (2025). Contains implementations of the perishable inventory problems and the experimental setup used in the paper. While MDPax is designed to be a general-purpose library, viso_jax focuses specifically on reproducing the paper's results and includes a detailed Colab notebook for this purpose.

Quantitative Economics with JAX

Tutorials using JAX to solve problems from quantitative economics, including value function iteration and policy iteration for MDPs.

VFI Toolkit

A MATLAB toolkit for value function iteration, specifically in the context of macroeconomic modeling. Like MDPax, the toolkit automatically uses NVIDIA GPUs when available. Unlike MDPax, the toolkit requires the full transition matrix to be provided, which can be infeasible for very large problems.

pymdptoolbox

A Python library for solving MDPs that implements several classic algorithms, including value iteration, policy iteration, and Q-learning. Related packages are available for MATLAB, GNU Octave, Scilab and R (Chadès et al. 2014). pymdptoolbox does not support GPU acceleration and, like the VFI Toolkit, requires the user to provide the full transition matrix.

Acknowledgments

This library is based on research code developed during Joseph Farrington's PhD at University College London under the supervision of Ken Li, Martin Utley, and Wai Keong Wong.

The PhD was generously supported by:

UKRI training grant EP/S021612/1, the CDT in AI-enabled Healthcare Systems
The Clinical and Research Informatics Unit at the NIHR University College London Hospitals Biomedical Research Centre

Owner

Name: Joe Farrington
Login: joefarrington
Kind: user
Location: London
Company: UCL

Website: joefarrington.github.io
Twitter: joe_farrington
Repositories: 16
Profile: https://github.com/joefarrington

PhD student at the CDT in AI-enabled healthcare systems at University College London.

JOSS Publication

MDPax: GPU-accelerated MDP solvers in Python with JAX

Published

October 31, 2025

DOI

10.21105/joss.08897

Volume 10, Issue 114, Page 8897

Authors

Joseph Farrington

Institute of Health Informatics, University College London, United Kingdom

Wai Keong Wong

Institute of Health Informatics, University College London, United Kingdom, NIHR University College London Hospitals Biomedical Research Centre, United Kingdom, Cambridge University Hospitals NHS Foundation Trust, United Kingdom

Kezhi Li

Institute of Health Informatics, University College London, United Kingdom

Martin Utley

Clinical Operational Research Unit, University College London, United Kingdom

Editor

Patrick Diehl

GitHub Events

Total

Release event: 7
Watch event: 24
Delete event: 5
Issue comment event: 3
Push event: 32
Public event: 1
Pull request event: 4
Create event: 8

Last Year

Release event: 7
Watch event: 24
Delete event: 5
Issue comment event: 3
Push event: 32
Public event: 1
Pull request event: 4
Create event: 8

Committers

Last synced: 5 months ago

All Time

Total Commits: 121
Total Committers: 1
Avg Commits per committer: 121.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 121
Committers: 1
Avg Commits per committer: 121.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Joe Farrington	f**e@g**m	121

Issues and Pull Requests

Last synced: 4 months ago

All Time

Total issues: 0
Total pull requests: 3
Average time to close issues: N/A
Average time to close pull requests: 8 minutes
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.33
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 3
Average time to close issues: N/A
Average time to close pull requests: 8 minutes
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.33
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0


Top Authors

Pull Request Authors

joefarrington (3)


Packages

Total packages: 1
Total downloads:
- pypi 17 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 4
Total maintainers: 1

pypi.org: mdpax

GPU-accelerated MDP solvers in Python with JAX

Documentation: https://mdpax.readthedocs.io/
License: MIT
Latest release: 0.2.2
published 4 months ago

Versions: 4
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 17 Last month

Rankings

Dependent packages count: 9.8%

Average: 32.5%

Dependent repos count: 55.2%

Maintainers (1)

joefarrington

Last synced: 4 months ago

Science Score: 93.0%