bentoml
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 7 of 222 committers (3.2%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: bentoml
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://bentoml.com
- Size: 98.3 MB
Statistics
- Stars: 8,028
- Watchers: 80
- Forks: 871
- Open Issues: 139
- Releases: 177
Metadata Files
README.md
Unified Model Serving Framework
🍱 Build model inference APIs and multi-model serving systems with any open-source or custom AI models. 👉 Join our Slack community!
What is BentoML?
BentoML is a Python library for building online serving systems optimized for AI apps and model inference.
- 🍱 Easily build APIs for Any AI/ML Model. Turn any model inference script into a REST API server with just a few lines of code and standard Python type hints.
- 🐳 Docker Containers made simple. No more dependency hell! Manage your environments, dependencies and model versions with a simple config file. BentoML automatically generates Docker images, ensures reproducibility, and simplifies how you deploy to different environments.
- 🧭 Maximize CPU/GPU utilization. Build high performance inference APIs leveraging built-in serving optimization features like dynamic batching, model parallelism, multi-stage pipeline and multi-model inference-graph orchestration.
- 👩‍💻 Fully customizable. Easily implement your own APIs or task queues (see the sketch after this list), with custom business logic, model inference and multi-model composition. Supports any ML framework, modality, and inference runtime.
- 🚀 Ready for Production. Develop, run and debug locally. Seamlessly deploy to production with Docker containers or BentoCloud.
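To make the task-queue bullet above concrete, here is a minimal sketch of a background task endpoint. It assumes the `@bentoml.task` decorator available in recent BentoML releases; the `BatchSummarizer` service and `summarize_document` method are illustrative names, not from the README.

```python
import bentoml


@bentoml.service
class BatchSummarizer:
    # A task endpoint runs in the background: clients submit a job,
    # poll its status, and fetch the result once it completes.
    @bentoml.task
    def summarize_document(self, text: str) -> str:
        # Long-running inference would go here; this stub just truncates.
        return text[:100]
```

With a task endpoint, an HTTP client submits work and retrieves the result later, instead of holding a connection open for the entire inference.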
Getting started
Install BentoML:
```bash
# Requires Python ≥ 3.9
pip install -U bentoml
```
Define APIs in a service.py file.
```python
import bentoml


@bentoml.service(
    image=bentoml.images.Image(python_version="3.11").python_packages("torch", "transformers"),
)
class Summarization:
    def __init__(self) -> None:
        import torch
        from transformers import pipeline

        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.pipeline = pipeline('summarization', device=device)

    @bentoml.api(batchable=True)
    def summarize(self, texts: list[str]) -> list[str]:
        results = self.pipeline(texts)
        return [item['summary_text'] for item in results]
```
💻 Run locally
Install the PyTorch and Transformers packages in your Python virtual environment.
```bash
pip install torch transformers  # additional dependencies for local run
```
Run the service code locally (serving at http://localhost:3000 by default):
```bash
bentoml serve
```
You should expect to see the following output.
```
[INFO] [cli] Starting production HTTP BentoServer from "service:Summarization" listening on http://localhost:3000 (Press CTRL+C to quit)
[INFO] [entry_service:Summarization:1] Service Summarization initialized
```
Now you can run inference from your browser at http://localhost:3000 or with a Python script:
```python
import bentoml

with bentoml.SyncHTTPClient('http://localhost:3000') as client:
    summarized_text: str = client.summarize([bentoml.__doc__])[0]
    print(f"Result: {summarized_text}")
```
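Equivalently, you can call the endpoint over plain HTTP. The sketch below assumes the default JSON encoding, where request body keys match the API method's parameter names:

```bash
curl -X POST http://localhost:3000/summarize \
  -H 'Content-Type: application/json' \
  -d '{"texts": ["BentoML is a Python library for building online serving systems."]}'
```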
🐳 Deploy using Docker
Run `bentoml build` to package the necessary code, models, and dependency configs into a Bento, the standardized deployable artifact in BentoML:
```bash
bentoml build
```
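The example service above already declares its image inline via `@bentoml.service`, so no extra configuration is needed. If you prefer keeping build options out of Python code, BentoML also reads a `bentofile.yaml`; a minimal sketch (the `service` value must point at your own module and class):

```yaml
service: "service:Summarization"  # import path of the service class
include:
  - "*.py"                        # source files to package into the Bento
python:
  packages:
    - torch
    - transformers
```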
Ensure Docker is running. Generate a Docker container image for deployment:
```bash
bentoml containerize summarization:latest
```
Run the generated image:
```bash
docker run --rm -p 3000:3000 summarization:latest
```
☁️ Deploy on BentoCloud
BentoCloud provides compute infrastructure for rapid and reliable GenAI adoption. It speeds up your BentoML development process by leveraging cloud compute resources, and simplifies how you deploy, scale, and operate BentoML in production.
Sign up for BentoCloud for personal access; for enterprise use cases, contact our team.
```bash
# After signup, run the following command to create an API token:
bentoml cloud login

# Deploy from the current directory:
bentoml deploy
```
For detailed explanations, read the Hello World example.
Examples
- LLMs: Llama 3.2, Mistral, DeepSeek Distil, and more.
- Image Generation: Stable Diffusion 3 Medium, Stable Video Diffusion, Stable Diffusion XL Turbo, ControlNet, and LCM LoRAs.
- Embeddings: SentenceTransformers and ColPali
- Audio: ChatTTS, XTTS, WhisperX, Bark
- Computer Vision: YOLO and ResNet
- Advanced examples: Function calling, LangGraph, CrewAI
Check out the full list for more sample code and usage.
Advanced topics
- Model composition (sketched below)
- Workers and model parallelization
- Adaptive batching
- GPU inference
- Distributed serving systems
- Concurrency and autoscaling
- Model loading and Model Store
- Observability
- BentoCloud deployment
See Documentation for more tutorials and guides.
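As a taste of the first topic, model composition, here is a hedged sketch of wiring one service into another with `bentoml.depends()`, which resolves to a local call in development and a remote call in distributed deployments. The two toy services are illustrative, not from the BentoML docs:

```python
import bentoml


@bentoml.service
class Preprocessor:
    @bentoml.api
    def clean(self, text: str) -> str:
        # Toy preprocessing step: collapse whitespace.
        return " ".join(text.split())


@bentoml.service
class Pipeline:
    # bentoml.depends() declares Preprocessor as a dependency;
    # BentoML injects a client for it at runtime.
    preprocessor = bentoml.depends(Preprocessor)

    @bentoml.api
    def run(self, text: str) -> str:
        cleaned = self.preprocessor.clean(text)
        return cleaned.upper()
```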
Community
Get involved and join our Community Slack 💬, where thousands of AI/ML engineers help each other, contribute to the project, and talk about building AI products.
To report a bug or suggest a feature request, use GitHub Issues.
Contributing
There are many ways to contribute to the project:
- Report bugs and "Thumbs up" on issues that are relevant to you.
- Investigate issues and review other developers' pull requests.
- Contribute code or documentation to the project by submitting a GitHub pull request.
- Check out the Contributing Guide and Development Guide to learn more.
- Share your feedback and discuss roadmap plans in the #bentoml-contributors channel here.
Thanks to all of our amazing contributors!
Usage tracking and feedback
The BentoML framework collects anonymous usage data that helps our community improve the product. Only BentoML's internal API calls are reported. This excludes any sensitive information, such as user code, model data, model names, or stack traces. Here's the code used for usage tracking. You can opt out of usage tracking with the --do-not-track CLI option:
```bash
bentoml [command] --do-not-track
```
Or by setting the environment variable:
```bash
export BENTOML_DO_NOT_TRACK=True
```
License
Apache License 2.0.
Owner
- Name: BentoML
- Login: bentoml
- Kind: organization
- Location: San Francisco
- Website: https://bentoml.com
- Twitter: bentomlai
- Repositories: 76
- Profile: https://github.com/bentoml
- Description: The most flexible way to serve AI models in production
Citation (CITATION.cff)
cff-version: 1.2.0
title: 'BentoML: The framework for building reliable, scalable and cost-efficient AI application'
message: >-
  If you use this software, please cite it using these
  metadata.
type: software
authors:
  - given-names: Chaoyu
    family-names: Yang
    email: chaoyu@bentoml.com
  - given-names: Sean
    family-names: Sheng
    email: ssheng@bentoml.com
  - given-names: Aaron
    family-names: Pham
    email: aarnphm@bentoml.com
    orcid: 'https://orcid.org/0009-0008-3180-5115'
  - given-names: Shenyang
    family-names: Zhao
    email: larme@bentoml.com
  - given-names: Sauyon
    family-names: Lee
    email: sauyon@bentoml.com
  - given-names: Bo
    family-names: Jiang
    email: jiang@bentoml.com
  - given-names: Fog
    family-names: Dong
    email: fog@bentoml.com
  - given-names: Xipeng
    family-names: Guan
    email: xipeng@bentoml.com
  - given-names: Frost
    family-names: Ming
    email: frost@bentoml.com
repository-code: 'https://github.com/bentoml/bentoml'
url: 'https://bentoml.com/'
keywords:
  - MLOps
  - LLMOps
  - LLM
  - Infrastructure
  - BentoML
  - LLM Serving
  - Model Serving
  - Serverless Deployment
license: Apache-2.0
GitHub Events
Total
- Create event: 150
- Issues event: 126
- Release event: 36
- Watch event: 834
- Delete event: 118
- Member event: 1
- Issue comment event: 281
- Push event: 380
- Pull request review comment event: 164
- Pull request event: 642
- Pull request review event: 396
- Fork event: 96
Last Year
- Create event: 150
- Issues event: 126
- Release event: 36
- Watch event: 834
- Delete event: 118
- Member event: 1
- Issue comment event: 281
- Push event: 380
- Pull request review comment event: 164
- Pull request event: 642
- Pull request review event: 396
- Fork event: 96
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Chaoyu | p****g@g****m | 562 |
| Aaron Pham | 2****m | 534 |
| Frost Ming | me@f****m | 413 |
| bojiang | 5****g | 355 |
| Bozhao | y****6@g****m | 325 |
| Sherlock Xu | 6****3 | 254 |
| Sauyon Lee | 2****n | 196 |
| dependabot[bot] | 4****] | 159 |
| Sean Sheng | s****g@g****m | 114 |
| Zhao Shenyang | d****v@z****m | 105 |
| Leon | i@l****m | 45 |
| Tianxin Dong | f****g@b****m | 37 |
| Jian Shen | j****2@g****m | 28 |
| xianxian.zhang | 1****l | 25 |
| pre-commit-ci[bot] | 6****] | 23 |
| Jacky Zhao | j****9@g****m | 17 |
| yetone | y****l@g****m | 17 |
| Steve Guo | 4****o | 16 |
| Judah Rand | 1****d | 11 |
| Jithin James | j****7@g****m | 11 |
| Jinyang Liu | l****g@b****e | 10 |
| Sungjun.Kim | s****m@l****m | 10 |
| Tim Liu | 9****l | 9 |
| Aanand Kainth | a****d@a****e | 9 |
| Tasha J. Kim | t****m@g****m | 6 |
| MingLiangDai | 9****i | 5 |
| Quan Nguyen | 8****r | 5 |
| devin-ai-integration[bot] | 1****] | 5 |
| lintingzhen | l****n@g****m | 5 |
| Mayur Newase | m****1@g****m | 5 |
| and 192 more... | | |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 405
- Total pull requests: 1,928
- Average time to close issues: 10 months
- Average time to close pull requests: 11 days
- Total issue authors: 254
- Total pull request authors: 96
- Average comments per issue: 1.97
- Average comments per pull request: 0.54
- Merged pull requests: 1,623
- Bot issues: 0
- Bot pull requests: 119
Past Year
- Issues: 69
- Pull requests: 747
- Average time to close issues: 7 days
- Average time to close pull requests: about 23 hours
- Issue authors: 58
- Pull request authors: 42
- Average comments per issue: 1.29
- Average comments per pull request: 0.39
- Merged pull requests: 648
- Bot issues: 0
- Bot pull requests: 48
Top Authors
Issue Authors
- aarnphm (36)
- ssheng (14)
- sauyon (10)
- parano (9)
- KimSoungRyoul (9)
- holzweber (8)
- smidm (7)
- judahrand (6)
- rlleshi (5)
- Hubert-Bonisseur (4)
- nadworny (4)
- Matthieu-Tinycoaching (4)
- MattiasDC (3)
- BangDaeng (3)
- isuyyy (3)
Pull Request Authors
- frostming (634)
- Sherlock113 (448)
- aarnphm (148)
- bojiang (76)
- FogDong (63)
- xianml (60)
- dependabot[bot] (56)
- ssheng (54)
- sauyon (46)
- pre-commit-ci[bot] (40)
- jianshen92 (39)
- devin-ai-integration[bot] (21)
- Haivilo (20)
- judahrand (19)
- larme (17)
Packages
- Total packages: 8
- Total downloads: 106,500 last month (PyPI)
- Total Docker downloads: 7,841
- Total dependent packages: 13 (may contain duplicates)
- Total dependent repositories: 502 (may contain duplicates)
- Total versions: 540
- Total maintainers: 6
- Total advisories: 8
pypi.org: bentoml
BentoML: The easiest way to serve AI apps and models
- Homepage: https://bentoml.com
- Documentation: https://docs.bentoml.com
- License: Apache-2.0
- Latest release: 1.4.23 (published 6 months ago)
Advisories (8)
- BentoML deserialization vulnerability
- BentoML Allows Remote Code Execution (RCE) via Insecure Deserialization
- BentoML vulnerable to Uncontrolled Resource Consumption
- BentoML's runner server Vulnerable to Remote Code Execution (RCE) via Insecure Deserialization
- BentoML SSRF Vulnerability in File Upload Processing
- BentoML Denial of Service (DoS) via Multipart Boundary
- Insecure deserialization in BentoML
- BentoML Open Redirect vulnerability
proxy.golang.org: github.com/bentoml/bentoml
- Homepage: https://github.com/bentoml/bentoml
- Documentation: https://pkg.go.dev/github.com/bentoml/bentoml#section-documentation
- License: Apache-2.0
- Latest release: v1.4.22 (published 6 months ago)
proxy.golang.org: github.com/bentoml/BentoML
- Homepage: https://github.com/bentoml/BentoML
- Documentation: https://pkg.go.dev/github.com/bentoml/BentoML#section-documentation
- License: Apache-2.0
- Latest release: v1.4.22 (published 6 months ago)
pypi.org: yatai
Model and deployment management for BentoML
- Homepage: https://github.com/bentoml/bentoml
- Documentation: https://yatai.readthedocs.io/
- License: apache-2.0
- Latest release: 0.0.1 (published over 4 years ago)
pypi.org: sentencebertservice
BentoML generated model module
- Homepage: https://github.com/bentoml/BentoML
- Documentation: https://sentencebertservice.readthedocs.io/
- License: apache-2.0
- Latest release: 20211205152102 (published about 4 years ago)
conda-forge.org: bentoml
BentoML simplifies ML model deployment and serves your models at production scale. PyPI: https://pypi.org/project/bentoml/
- Homepage: https://github.com/bentoml/BentoML
- License: Apache-2.0
- Latest release: 1.0.0 (published over 3 years ago)
pypi.org: bentoml-core
The rust core of BentoML: The Unified Model Serving Framework
- Documentation: https://docs.bentoml.org/en/latest/
- License: Apache-2.0
- Latest release: 0.1.0 (published over 2 years ago)
pypi.org: bentoml-unsloth
BentoML: The easiest way to serve AI apps and models
- Homepage: https://bentoml.com
- Documentation: https://docs.bentoml.com
- License: Apache-2.0
- Latest release: 0.1.2 (published over 1 year ago)
Dependencies
- actions/checkout v4 composite
- actions/download-artifact v3 composite
- actions/upload-artifact v3 composite
- docker/setup-buildx-action v3 composite
- docker/setup-qemu-action v3 composite
- marocchino/sticky-pull-request-comment v2 composite
- pdm-project/setup-pdm v3 composite
- re-actors/alls-green release/v1 composite
- actions/checkout v4 composite
- actions/checkout v4 composite
- github/codeql-action/analyze v2 composite
- github/codeql-action/autobuild v2 composite
- github/codeql-action/init v2 composite
- actions/checkout v4 composite
- pdm-project/setup-pdm v3 composite
- actions/checkout v4 composite
- actions/download-artifact v3 composite
- actions/setup-python v4 composite
- actions/upload-artifact v3 composite
- pypa/gh-action-pypi-publish release/v1 composite
- python 3-bullseye build
- iris_classifier klncyjcfqwldtgxi
- jaegertracing/all-in-one 1.38
- bentoml >=1.0.19
- pandas *
- scikit-learn *
- Jinja2 >=3.0.1
- PyYAML >=5.0
- aiohttp *
- attrs >=21.1.0
- cattrs >=22.1.0,<23.2.0
- circus >=0.17.0,!=0.17.2
- click >=7.0
- click-option-group *
- cloudpickle >=2.0.0
- deepmerge *
- fs *
- httpx *
- inflection *
- numpy *
- nvidia-ml-py <12
- opentelemetry-api ==1.20.0
- opentelemetry-instrumentation ==0.41b0
- opentelemetry-instrumentation-aiohttp-client ==0.41b0
- opentelemetry-instrumentation-asgi ==0.41b0
- opentelemetry-sdk ==1.20.0
- opentelemetry-semantic-conventions ==0.41b0
- opentelemetry-util-http ==0.41b0
- packaging >=22.0
- pathspec *
- pip-requirements-parser >=31.2.0
- pip-tools >=6.6.2
- prometheus-client >=0.10.0
- psutil *
- python-dateutil *
- python-json-logger *
- python-multipart *
- requests *
- rich >=11.2.0
- schema *
- simple-di >=0.1.4
- starlette >=0.24.0
- uvicorn *
- watchfiles >=0.15.0
- Pillow * test
- pydantic * test
- pandas * test
- pyarrow * test
- scikit-learn * test
- Pillow * test
- fastapi * test
- pydantic * test
- starlette <0.26 test
- cloudpickle >=2.0.0 test
- mlflow * test
- psutil >=5.8.0 test
- scikit-learn >=1.0.2 test
- pydantic >=2 test
- scikit-learn * test
- bentoml *
- torch *
- transformers ==4.30.0
- bentoml *
- pandas *
- pdf2img *
- pydub *
- torch *
- accelerate *
- bentoml *
- diffusers *
- torch *
- transformers *
- _libgcc_mutex 0.1
- _openmp_mutex 4.5
- blas 1.0
- ca-certificates 2021.10.26
- certifi 2021.10.8
- intel-openmp 2021.4.0
- joblib 1.1.0
- ld_impl_linux-64 2.35.1
- libffi 3.3
- libgcc-ng 9.3.0
- libgfortran-ng 7.5.0
- libgfortran4 7.5.0
- libgomp 9.3.0
- libstdcxx-ng 9.3.0
- mkl 2021.4.0
- mkl-service 2.4.0
- mkl_fft 1.3.1
- mkl_random 1.2.2
- ncurses 6.3
- numpy 1.21.2
- numpy-base 1.21.2
- openssl 1.1.1l
- pip 21.2.4
- python 3.9.7
- readline 8.1
- scikit-learn 1.0.1
- scipy 1.7.1
- setuptools 58.0.4
- six 1.16.0
- sqlite 3.36.0
- threadpoolctl 2.2.0
- tk 8.6.11
- tzdata 2021e
- wheel 0.37.0
- xz 5.2.5
- zlib 1.2.11
- bentoml *
- fastapi *
- gradio *
- torch *
- transformers *
- bentoml *
- mlflow *
- scikit-learn *
- bentoml *
- torch *
- transformers *
- bentoml *
- scikit-learn *
- xgboost *