openllm
Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.3%) to scientific vocabulary
Repository
Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.
Basic Info
- Host: GitHub
- Owner: bentoml
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://bentoml.com
- Size: 41.1 MB
Statistics
- Stars: 11,738
- Watchers: 60
- Forks: 763
- Open Issues: 7
- Releases: 147
Metadata Files
README.md
🦾 OpenLLM: Self-Hosting LLMs Made Easy
OpenLLM allows developers to run any open-source LLMs (Llama 3.3, Qwen2.5, Phi3 and more) or custom models as OpenAI-compatible APIs with a single command. It features a built-in chat UI, state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployment with Docker, Kubernetes, and BentoCloud.
Understand the design philosophy of OpenLLM.
Get Started
Run the following commands to install OpenLLM and explore it interactively.
```bash
pip install openllm  # or pip3 install openllm
openllm hello
```
Supported models
OpenLLM supports a wide range of state-of-the-art open-source LLMs. You can also add a model repository to run custom models with OpenLLM.
| Model | Parameters | Required GPU | Start a Server |
|---|---|---|---|
| deepseek | r1-671b | 80Gx16 | openllm serve deepseek:r1-671b |
| gemma2 | 2b | 12G | openllm serve gemma2:2b |
| gemma3 | 3b | 12G | openllm serve gemma3:3b |
| jamba1.5 | mini-ff0a | 80Gx2 | openllm serve jamba1.5:mini-ff0a |
| llama3.1 | 8b | 24G | openllm serve llama3.1:8b |
| llama3.2 | 1b | 24G | openllm serve llama3.2:1b |
| llama3.3 | 70b | 80Gx2 | openllm serve llama3.3:70b |
| llama4 | 17b16e | 80Gx8 | openllm serve llama4:17b16e |
| mistral | 8b-2410 | 24G | openllm serve mistral:8b-2410 |
| mistral-large | 123b-2407 | 80Gx4 | openllm serve mistral-large:123b-2407 |
| phi4 | 14b | 80G | openllm serve phi4:14b |
| pixtral | 12b-2409 | 80G | openllm serve pixtral:12b-2409 |
| qwen2.5 | 7b | 24G | openllm serve qwen2.5:7b |
| qwen2.5-coder | 3b | 24G | openllm serve qwen2.5-coder:3b |
| qwq | 32b | 80G | openllm serve qwq:32b |
For the full model list, see the OpenLLM models repository.
Start an LLM server
To start an LLM server locally, use the openllm serve command and specify the model version.
[!NOTE] OpenLLM does not store model weights. A Hugging Face token (HF_TOKEN) is required for gated models.
- Create your Hugging Face token here.
- Request access to the gated model, such as meta-llama/Llama-3.2-1B-Instruct.
- Set your token as an environment variable by running:
```bash
export HF_TOKEN=<your token>
```
```bash
openllm serve llama3.2:1b
```
The server will be accessible at http://localhost:3000, providing OpenAI-compatible APIs for interaction. You can call the endpoints with different frameworks and tools that support OpenAI-compatible APIs. Typically, you may need to specify the following:
- The API host address: By default, the LLM is hosted at http://localhost:3000.
- The model name: The name can be different depending on the tool you use.
- The API key: The API key used for client authentication. This is optional.
Here are some examples:
OpenAI Python client
```python
from openai import OpenAI

client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')

# Use the following func to get the available models
# model_list = client.models.list()
# print(model_list)

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain superconductors like I'm five years old"
        }
    ],
    stream=True,
)
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
```
LlamaIndex
```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(api_base="http://localhost:3000/v1", model="meta-llama/Llama-3.2-1B-Instruct", api_key="dummy")
...
```
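Plain HTTP
Since the server speaks the OpenAI API, you can also call it without a client library. The snippet below is a minimal sketch, not from the OpenLLM docs: it assumes the default address http://localhost:3000, the standard OpenAI /v1/chat/completions route, and the third-party requests package.
```python
import requests

# Assumes the OpenLLM server started with `openllm serve llama3.2:1b`
# is running locally and exposes the OpenAI-compatible chat route.
response = requests.post(
    "http://localhost:3000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "messages": [
            {"role": "user", "content": "Explain superconductors like I'm five years old"}
        ],
        "stream": False,
    },
)
response.raise_for_status()
# Non-streaming responses follow the OpenAI schema: choices[0].message.content
print(response.json()["choices"][0]["message"]["content"])
```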
Chat UI
OpenLLM provides a chat UI at the /chat endpoint for the launched LLM server at http://localhost:3000/chat.
Chat with a model in the CLI
To start a chat conversation in the CLI, use the openllm run command and specify the model version.
```bash
openllm run llama3:8b
```
Model repository
A model repository in OpenLLM represents a catalog of available LLMs that you can run. OpenLLM provides a default model repository that includes the latest open-source LLMs like Llama 3, Mistral, and Qwen2, hosted at this GitHub repository. To see all available models from the default and any added repository, use:
```bash
openllm model list
```
To ensure your local list of models is synchronized with the latest updates from all connected repositories, run:
```bash
openllm repo update
```
To review a model’s information, run:
```bash
openllm model get llama3.2:1b
```
Add a model to the default model repository
You can contribute to the default model repository by adding new models that others can use. This involves creating and submitting a Bento of the LLM. For more information, check out this example pull request.
Set up a custom repository
You can add your own repository to OpenLLM with custom models. To do so, follow the format in the default OpenLLM model repository with a bentos directory to store custom LLMs. You need to build your Bentos with BentoML and submit them to your model repository.
First, prepare your custom models in a bentos directory following the guidelines provided by BentoML to build Bentos. Check out the default model repository for an example and read the Developer Guide for details.
Then, register your custom model repository with OpenLLM:
```bash
openllm repo add <repo-name> <repo-url>
```
Note: Currently, OpenLLM only supports adding public repositories.
Deploy to BentoCloud
OpenLLM supports LLM cloud deployment via BentoML, the unified model serving framework, and BentoCloud, an AI inference platform for enterprise AI teams. BentoCloud provides fully managed infrastructure optimized for LLM inference with autoscaling, model orchestration, observability, and more, allowing you to run any AI model in the cloud.
Sign up for BentoCloud for free and log in. Then, run openllm deploy to deploy a model to BentoCloud:
```bash
openllm deploy llama3.2:1b --env HF_TOKEN
```
[!NOTE] If you are deploying a gated model, make sure to set HF_TOKEN as an environment variable.
Once the deployment is complete, you can run model inference on the BentoCloud console.
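Assuming the deployment exposes the same OpenAI-compatible API as the local server, you can also call it programmatically. The sketch below is illustrative only; the base_url and api_key are placeholders to replace with the endpoint URL and API key shown for your deployment in the BentoCloud console.
```python
from openai import OpenAI

# Placeholders: substitute the endpoint URL and API key from your
# own BentoCloud deployment; these values are illustrative only.
client = OpenAI(
    base_url="https://my-openllm-deployment.example.com/v1",
    api_key="<your-bentocloud-api-key>",
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[{"role": "user", "content": "Hello from BentoCloud!"}],
)
print(completion.choices[0].message.content)
```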
Community
OpenLLM is actively maintained by the BentoML team. Feel free to reach out and join us in our pursuit to make LLMs more accessible and easy to use 👉 Join our Slack community!
Contributing
As an open-source project, we welcome contributions of all kinds, such as new features, bug fixes, and documentation. Here are some of the ways to contribute:
- Report a bug by creating a GitHub issue.
- Submit a pull request or help review other developers’ pull requests.
- Add an LLM to the OpenLLM default model repository so that other users can run your model. See the pull request template.
- Check out the Developer Guide to learn more.
Acknowledgements
This project uses the following open-source projects:
- bentoml/bentoml for production level model serving
- vllm-project/vllm for production level LLM backend
- blrchen/chatgpt-lite for a fancy Web Chat UI
- astral-sh/uv for blazing-fast installation of model requirements
We are grateful to the developers and contributors of these projects for their hard work and dedication.
Owner
- Name: BentoML
- Login: bentoml
- Kind: organization
- Location: San Francisco
- Website: https://bentoml.com
- Twitter: bentomlai
- Repositories: 76
- Profile: https://github.com/bentoml
The most flexible way to serve AI models in production
Citation (CITATION.cff)
cff-version: 1.2.0
title: 'OpenLLM: Operating LLMs in production'
message: >-
If you use this software, please cite it using these
metadata.
type: software
authors:
- given-names: Aaron
family-names: Pham
email: aarnphm@bentoml.com
orcid: 'https://orcid.org/0009-0008-3180-5115'
- given-names: Chaoyu
family-names: Yang
email: chaoyu@bentoml.com
- given-names: Sean
family-names: Sheng
email: ssheng@bentoml.com
- given-names: Shenyang
family-names: Zhao
email: larme@bentoml.com
- given-names: Sauyon
family-names: Lee
email: sauyon@bentoml.com
- given-names: Bo
family-names: Jiang
email: jiang@bentoml.com
- given-names: Fog
family-names: Dong
email: fog@bentoml.com
- given-names: Xipeng
family-names: Guan
email: xipeng@bentoml.com
- given-names: Frost
family-names: Ming
email: frost@bentoml.com
repository-code: 'https://github.com/bentoml/OpenLLM'
url: 'https://bentoml.com/'
abstract: >-
OpenLLM is an open platform for operating large language
models (LLMs) in production. With OpenLLM, you can run
inference with any open-source large-language models,
deploy to the cloud or on-premises, and build powerful AI
apps. It has built-in support for a wide range of
open-source LLMs and model runtime, including StableLM,
Falcon, Dolly, Flan-T5, ChatGLM, StarCoder and more.
OpenLLM helps serve LLMs over RESTful API or gRPC with one
command or query via WebUI, CLI, our Python/Javascript
client, or any HTTP client. It provides first-class
support for LangChain, BentoML and Hugging Face that
allows you to easily create your own AI apps by composing
LLMs with other models and services. Last but not least,
it automatically generates LLM server OCI-compatible
Container Images or easily deploys as a serverless
endpoint via BentoCloud.
keywords:
- MLOps
- LLMOps
- LLM
- Infrastructure
- Transformers
- LLM Serving
- Model Serving
- Serverless Deployment
license: Apache-2.0
date-released: '2023-06-13'
GitHub Events
Total
- Create event: 91
- Issues event: 38
- Release event: 17
- Watch event: 1,640
- Delete event: 79
- Issue comment event: 40
- Push event: 134
- Pull request review event: 12
- Pull request event: 159
- Fork event: 131
Last Year
- Create event: 91
- Issues event: 38
- Release event: 17
- Watch event: 1,640
- Delete event: 79
- Issue comment event: 40
- Push event: 134
- Pull request review event: 12
- Pull request event: 159
- Fork event: 131
Committers
Last synced: 6 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Aaron Pham | 2****m | 1,282 |
| dependabot[bot] | 4****] | 312 |
| bojiang | b****_@o****m | 146 |
| pre-commit-ci[bot] | 6****] | 83 |
| Sherlock Xu | 6****3 | 14 |
| Zhao Shenyang | d****v@z****m | 13 |
| Rick Zhou | r****u@g****m | 8 |
| Jian Shen | j****2@g****m | 8 |
| Chaoyu | p****g@g****m | 7 |
| xianxian.zhang | 1****l | 7 |
| XunchaoZ | 6****Z | 5 |
| agent | a****t@S****l | 3 |
| github-actions[bot] | g****] | 2 |
| MingLiangDai | 9****i | 2 |
| HeTaoPKU | 4****d | 2 |
| GutZuFusss | l****r@g****m | 2 |
| Frost Ming | m****g@g****m | 2 |
| Fazli Sapuan | f****i@s****g | 2 |
| Abhishek | 5****2 | 1 |
| Alan Poulain | c****t@a****u | 1 |
| Dennis Rall | 5****l | 1 |
| Ikko Eltociear Ashimine | e****r@g****m | 1 |
| Kuan-Chun Wang | j****o@g****m | 1 |
| Matt Hoffner | m****r@g****m | 1 |
| Miorel-Lucian Palii | m****i@g****m | 1 |
| RichardScottOZ | 7****Z | 1 |
| Sauyon Lee | 2****n | 1 |
| Sean Sheng | s****g@g****m | 1 |
| yansheng | 6****5 | 1 |
| weibeu | d****4@g****m | 1 |
| and 2 more... | | |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 120
- Total pull requests: 372
- Average time to close issues: 3 months
- Average time to close pull requests: 4 days
- Total issue authors: 97
- Total pull request authors: 24
- Average comments per issue: 1.94
- Average comments per pull request: 0.23
- Merged pull requests: 280
- Bot issues: 1
- Bot pull requests: 263
Past Year
- Issues: 16
- Pull requests: 180
- Average time to close issues: about 1 month
- Average time to close pull requests: 1 day
- Issue authors: 15
- Pull request authors: 7
- Average comments per issue: 1.56
- Average comments per pull request: 0.15
- Merged pull requests: 148
- Bot issues: 1
- Bot pull requests: 154
Top Authors
Issue Authors
- aarnphm (12)
- hahmad2008 (4)
- Said-Ikki (2)
- andrewjwaggoner (2)
- Nanguage (2)
- zhangxinyang97 (2)
- meanwo (2)
- foxxxx001 (2)
- yalattas (2)
- mfournioux (2)
- stoneLee81 (2)
- dependabot[bot] (1)
- espdev (1)
- andysingal (1)
- Lightwave234 (1)
Pull Request Authors
- dependabot[bot] (242)
- pre-commit-ci[bot] (99)
- aarnphm (60)
- Sherlock113 (16)
- larme (8)
- bojiang (6)
- parano (4)
- fuzzie360 (2)
- dowithless (2)
- Oscarjia (2)
- charlod (2)
- matthoffner (2)
- rickzx (2)
- GutZuFusss (1)
- mantrakp04 (1)
Packages
- Total packages: 7
- Total downloads:
  - pypi: 10,427 last month
  - npm: 10 last month
- Total docker downloads: 141
- Total dependent packages: 7 (may contain duplicates)
- Total dependent repositories: 311 (may contain duplicates)
- Total versions: 755
- Total maintainers: 4
pypi.org: openllm
OpenLLM: Self-hosting LLMs Made Easy.
- Homepage: https://bentoml.com
- Documentation: https://github.com/bentoml/OpenLLM#readme
- License: Apache Software License
- Latest release: 0.6.30 (published 10 months ago)
Rankings
pypi.org: openllm-core
OpenLLM Core: Core components for OpenLLM.
- Homepage: https://bentoml.com
- Documentation: https://github.com/bentoml/OpenLLM/blob/main/openllm-core/README.md
- License: Apache Software License
- Latest release: 0.5.7 (published over 1 year ago)
Rankings
Maintainers (1)
pypi.org: openllm-client
OpenLLM Client: Interacting with OpenLLM HTTP/gRPC server, or any BentoML server.
- Homepage: https://bentoml.com
- Documentation: https://github.com/bentoml/OpenLLM/blob/main/openllm-client/README.md
- License: Apache Software License
- Latest release: 0.5.7 (published over 1 year ago)
Rankings
Maintainers (1)
proxy.golang.org: github.com/bentoml/openllm
- Documentation: https://pkg.go.dev/github.com/bentoml/openllm#section-documentation
- License: apache-2.0
- Latest release: v0.6.30 (published 10 months ago)
Rankings
proxy.golang.org: github.com/bentoml/OpenLLM
- Documentation: https://pkg.go.dev/github.com/bentoml/OpenLLM#section-documentation
- License: apache-2.0
- Latest release: v0.6.30 (published 10 months ago)
Rankings
npmjs.org: openllm
TS/JS binding for OpenLLM
- Homepage: https://github.com/bentoml/OpenLLM#readme
- License: Apache 2.0
- Latest release: 0.0.2 (published over 2 years ago)
Rankings
Maintainers (1)
npmjs.org: openllm_client
TS/JS binding for OpenLLM
- Homepage: https://github.com/bentoml/OpenLLM#readme
- License: Apache 2.0
- Latest release: 0.0.2 (published over 2 years ago)