pytriton

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.

https://github.com/triton-inference-server/pytriton

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.6%) to scientific vocabulary

Keywords

deep-learning gpu inference
Last synced: 6 months ago

Repository

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.

Basic Info
Statistics
  • Stars: 815
  • Watchers: 17
  • Forks: 55
  • Open Issues: 13
  • Releases: 33
Topics
deep-learning gpu inference
Created over 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

PyTriton

Welcome to PyTriton, a Flask/FastAPI-like framework designed to streamline the use of NVIDIA's Triton Inference Server within Python environments. PyTriton enables serving Machine Learning models with ease, supporting direct deployment from Python.

For comprehensive guidance on how to deploy your models, optimize performance, and explore the API, delve into the extensive resources found in our documentation.

Features at a Glance

The distinct capabilities of PyTriton are summarized in the feature matrix:

| Feature | Description |
| ------- | ----------- |
| Native Python support | You can create any Python function and expose it as an HTTP/gRPC API. |
| Framework-agnostic | You can run any Python code with any framework of your choice, such as PyTorch, TensorFlow, or JAX. |
| Performance optimization | You can benefit from dynamic batching, response cache, model pipelining, clusters, performance tracing, and GPU/CPU inference. |
| Decorators | You can use batching decorators to handle batching and other pre-processing tasks for your inference function. |
| Easy installation and setup | You can use a simple and familiar interface based on Flask/FastAPI for easy installation and setup. |
| Model clients | You can access high-level model clients for HTTP/gRPC requests with configurable options and both synchronous and asynchronous APIs. |
| Streaming (alpha) | You can stream partial responses from a model by serving it in decoupled mode. |

Learn more about PyTriton's architecture.

Prerequisites

Before proceeding with the installation of PyTriton, ensure your system meets the following criteria:

  • Operating System: Compatible with glibc version 2.35 or higher.
    • Primarily tested on Ubuntu 22.04.
    • Other supported operating systems include Debian 11+, Rocky Linux 9+, and Red Hat UBI 9+.
    • Use ldd --version to verify your glibc version; a Python-based check is also sketched after this list.
  • Python: Version 3.8 or newer.
  • pip: Version 20.3 or newer.
  • libpython: Ensure libpython3.*.so is installed, corresponding to your Python version.
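
If you prefer to verify these requirements from Python rather than with ldd, the following minimal sketch uses only the standard library; the version thresholds mirror the list above.

```python
import platform
import sys

# PyTriton requires Python 3.8 or newer.
assert sys.version_info >= (3, 8), f"Python 3.8+ required, found {sys.version.split()[0]}"

# PyTriton requires glibc 2.35 or newer; libc_ver() returns a
# (library, version) tuple such as ('glibc', '2.35') on glibc systems.
libc, libc_version = platform.libc_ver()
assert libc == "glibc", f"glibc-based system required, found {libc or 'unknown libc'}"
assert tuple(map(int, libc_version.split("."))) >= (2, 35), f"glibc 2.35+ required, found {libc_version}"

print("Prerequisites look OK:", sys.version.split()[0], "/", libc, libc_version)
```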

Install

PyTriton can be installed from pypi.org by running the following command:

```shell
pip install nvidia-pytriton
```

Important: The Triton Inference Server binary is installed as part of the PyTriton package.

Discover more about PyTriton's installation procedures, including Docker usage, prerequisites, and insights into building binaries from source to match your specific Triton server versions.
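
To confirm that the installation succeeded, you can try importing the modules used throughout this README; a minimal sanity check:

```python
# Sanity check: these imports should succeed after `pip install nvidia-pytriton`.
from pytriton.client import ModelClient
from pytriton.decorators import batch
from pytriton.model_config import Tensor
from pytriton.triton import Triton

print("PyTriton imported successfully")
```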

Quick Start

The quick start shows how to run a Python model in Triton Inference Server without the need to change the current working environment. The example uses a simple Linear model.

The infer_fn is a function that takes a data tensor and returns a list with a single output tensor. The @batch decorator from the batching decorators is used to handle batching for the model.

```python
import numpy as np

from pytriton.decorators import batch


@batch
def infer_fn(data):
    result = data * np.array([[-1]], dtype=np.float32)  # Process inputs and produce result
    return [result]
```

In the next step, you can create the binding between the inference callable and Triton Inference Server using the bind method from PyTriton. This method takes the model name, the inference callable, the input and output tensors, and an optional model configuration object (a sketch of such a configuration object follows the snippet below).

```python
from pytriton.model_config import Tensor
from pytriton.triton import Triton

triton = Triton()
triton.bind(
    model_name="Linear",
    infer_func=infer_fn,
    inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
    outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
)
triton.run()
```
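
The optional model configuration object mentioned above controls server-side behavior such as the maximum batch size used for dynamic batching. A hedged sketch of an extended bind call follows; ModelConfig and its max_batch_size field come from PyTriton's model configuration API rather than from this README, so treat the exact names as assumptions.

```python
# Sketch only: reuses triton, infer_fn, and np from the snippets above.
# ModelConfig and max_batch_size are assumptions based on PyTriton's
# model configuration API; max_batch_size caps how many requests the
# server may batch together for a single infer_fn call.
from pytriton.model_config import ModelConfig, Tensor

triton.bind(
    model_name="Linear",
    infer_func=infer_fn,
    inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
    outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
    config=ModelConfig(max_batch_size=128),
)
```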

Finally, you can send an inference query to the model using the ModelClient class. The infer_sample method takes the input data as a numpy array and returns the output data as a numpy array. You can learn more about the ModelClient class in the clients section.

```python
from pytriton.client import ModelClient

client = ModelClient("localhost", "Linear")
data = np.array([1, 2], dtype=np.float32)
print(client.infer_sample(data=data))
```

After the inference is done, you can stop the Triton Inference Server and close the client:

```python
client.close()
triton.stop()
```

The output of the inference should be:

```python
{'result': array([-1., -2.], dtype=float32)}
```
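
Putting the snippets above together, a complete, blocking version of the quick start might look like the following sketch; it uses only the calls already shown in this README, and the try/finally cleanup is a stylistic choice.

```python
import numpy as np

from pytriton.client import ModelClient
from pytriton.decorators import batch
from pytriton.model_config import Tensor
from pytriton.triton import Triton


@batch
def infer_fn(data):
    # Negate each input element; @batch handles stacking/unstacking samples.
    result = data * np.array([[-1]], dtype=np.float32)
    return [result]


triton = Triton()
triton.bind(
    model_name="Linear",
    infer_func=infer_fn,
    inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
    outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
)
triton.run()

client = ModelClient("localhost", "Linear")
try:
    data = np.array([1, 2], dtype=np.float32)
    print(client.infer_sample(data=data))  # expected: {'result': array([-1., -2.], dtype=float32)}
finally:
    client.close()
    triton.stop()
```

Running the script starts the server, issues one inference request, prints the result, and shuts everything down.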

For the full example, including defining the model and binding it to the Triton server, check out our detailed Quick Start instructions. Get your model up and running, explore how to serve it, and learn how to invoke it from client applications.

The full example code can be found in examples/linear_random_pytorch.

Examples

The examples page presents various cases of serving models using PyTriton. You can find simple examples of running PyTorch, TensorFlow2, JAX, and simple Python models. Additionally, we have prepared more advanced scenarios like online learning, multi-node models, or deployment on Kubernetes using PyTriton. Each example contains instructions describing how to build and run the example. Learn more about how to use PyTriton by reviewing our examples.

Owner

  • Name: Triton Inference Server
  • Login: triton-inference-server
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "PyTriton: Framework facilitating NVIDIA Triton Inference Server usage in Python environments."
url: https://github.com/triton-inference-server/pytriton
repository-code: https://github.com/triton-inference-server/pytriton
authors:
  - name: "NVIDIA Corporation"

GitHub Events

Total
  • Create event: 2
  • Release event: 2
  • Issues event: 43
  • Watch event: 86
  • Issue comment event: 74
  • Push event: 7
  • Pull request review event: 3
  • Pull request review comment event: 2
  • Pull request event: 3
  • Fork event: 6
Last Year
  • Create event: 2
  • Release event: 2
  • Issues event: 43
  • Watch event: 86
  • Issue comment event: 74
  • Push event: 7
  • Pull request review event: 3
  • Pull request review comment event: 2
  • Pull request event: 3
  • Fork event: 6

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 283
  • Total Committers: 12
  • Avg Commits per committer: 23.583
  • Development Distribution Score (DDS): 0.576
Past Year
  • Commits: 32
  • Committers: 5
  • Avg Commits per committer: 6.4
  • Development Distribution Score (DDS): 0.344
Top Committers
Name Email Commits
Pawel Ziecina p****a@n****m 120
Piotr Marcinkiewicz p****m@n****m 86
Jakub Kosek j****k@n****m 44
Blazej Kubiak b****k@n****m 22
Pierre Chapuis g****t@c****o 4
Yoshimura Naoya y****8@g****m 1
R0CKSTAR y****n@g****m 1
Michal Szolucha m****a@n****m 1
Matthew Kotila m****a@g****m 1
Mahimai Raja J m****3@g****m 1
Francesco Petrini f****i@g****m 1
Anton Peganov a****v@n****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 93
  • Total pull requests: 12
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 13 days
  • Total issue authors: 66
  • Total pull request authors: 9
  • Average comments per issue: 3.99
  • Average comments per pull request: 2.0
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 21
  • Pull requests: 4
  • Average time to close issues: 29 days
  • Average time to close pull requests: 6 days
  • Issue authors: 18
  • Pull request authors: 3
  • Average comments per issue: 2.52
  • Average comments per pull request: 1.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • lfxx (7)
  • leafjungle (5)
  • dogky123 (3)
  • HJH0924 (3)
  • yangxianpku (3)
  • lionsheep0724 (3)
  • sricke (2)
  • Pobby321 (2)
  • sourabh-patil (2)
  • oeway (2)
  • piotrm-nvidia (2)
  • zbloss (2)
  • peterroelants (2)
  • seawater668 (1)
  • dmatlak (1)
Pull Request Authors
  • catwell (7)
  • fpetrini15 (2)
  • this (2)
  • getty708 (2)
  • piotrm-nvidia (1)
  • mahimairaja (1)
  • PeganovAnton (1)
  • matthewkotila (1)
Top Labels
Issue Labels
Stale (40) non-stale (8) enhancement (7) question (5) bug (4) documentation (2)
Pull Request Labels
documentation (3) bug (2) Stale (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi: 46,290 last month
  • Total docker downloads: 204
  • Total dependent packages: 2
  • Total dependent repositories: 2
  • Total versions: 30
  • Total maintainers: 1
pypi.org: nvidia-pytriton

PyTriton - Flask/FastAPI-like interface to simplify Triton's deployment in Python environments.

  • Versions: 30
  • Dependent Packages: 2
  • Dependent Repositories: 2
  • Downloads: 46,290 last month
  • Docker Downloads: 204
Rankings
Docker downloads count: 2.5%
Stargazers count: 2.6%
Downloads: 3.4%
Dependent packages count: 4.8%
Average: 5.2%
Forks count: 6.4%
Dependent repos count: 11.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/stale.yaml actions
  • actions/stale v8 composite
examples/huggingface_bart_pytorch/kubernetes/Dockerfile docker
  • ${FROM_IMAGE_NAME} latest build
  • base latest build
  • install-from-${BUILD_FROM} latest build
examples/huggingface_opt_multinode_jax/Dockerfile docker
  • ${FROM_IMAGE_NAME} latest build
examples/huggingface_opt_multinode_jax/kubernetes/Dockerfile docker
  • ${FROM_IMAGE_NAME} latest build
  • base latest build
  • install-from-${BUILD_FROM} latest build
examples/huggingface_resnet_pytorch/kubernetes/Dockerfile docker
  • ${FROM_IMAGE_NAME} latest build
  • base latest build
  • install-from-${BUILD_FROM} latest build
examples/huggingface_stable_diffusion/kubernetes/Dockerfile docker
  • ${FROM_IMAGE_NAME} latest build
  • base latest build
  • install-from-${BUILD_FROM} latest build
examples/nemo_megatron_gpt_multinode/kubernetes/Dockerfile docker
  • ${FROM_IMAGE_NAME} latest build
  • base latest build
  • install-from-${BUILD_FROM} latest build
pyproject.toml pypi
  • numpy ~= 1.21
  • protobuf >=3.7.0
  • pyzmq ~= 23.0
  • sh ~= 1.14
  • tritonclient [all] ~= 2.33
  • typing_inspect ~= 0.6.0
  • wrapt >= 1.11.0