openllm

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.

https://github.com/bentoml/openllm

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

bentoml fine-tuning llama llama2 llama3-1 llama3-2 llama3-2-vision llm llm-inference llm-ops llm-serving llmops mistral mlops model-inference open-source-llm openllm vicuna

Keywords from Contributors

optimizer energy-system transformers agents mesh cryptocurrencies geoscience spacy-extension sequencers animations
Last synced: 6 months ago

Repository

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.

Basic Info
  • Host: GitHub
  • Owner: bentoml
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage: https://bentoml.com
  • Size: 41.1 MB
Statistics
  • Stars: 11,738
  • Watchers: 60
  • Forks: 763
  • Open Issues: 7
  • Releases: 147
Topics
bentoml fine-tuning llama llama2 llama3-1 llama3-2 llama3-2-vision llm llm-inference llm-ops llm-serving llmops mistral mlops model-inference open-source-llm openllm vicuna
Created almost 3 years ago · Last pushed 6 months ago
Metadata Files
Readme · License · Code of conduct · Citation · Codeowners · Security

README.md

🦾 OpenLLM: Self-Hosting LLMs Made Easy

[![License: Apache-2.0](https://img.shields.io/badge/License-Apache%202-green.svg)](https://github.com/bentoml/OpenLLM/blob/main/LICENSE) [![Releases](https://img.shields.io/pypi/v/openllm.svg?logo=pypi&label=PyPI&logoColor=gold)](https://pypi.org/project/openllm) [![CI](https://results.pre-commit.ci/badge/github/bentoml/OpenLLM/main.svg)](https://results.pre-commit.ci/latest/github/bentoml/OpenLLM/main) [![X](https://badgen.net/badge/icon/@bentomlai/000000?icon=twitter&label=Follow)](https://twitter.com/bentomlai) [![Community](https://badgen.net/badge/icon/Community/562f5d?icon=slack&label=Join)](https://l.bentoml.com/join-slack)

OpenLLM allows developers to run any open-source LLMs (Llama 3.3, Qwen2.5, Phi3 and more) or custom models as OpenAI-compatible APIs with a single command. It features a built-in chat UI, state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployments with Docker, Kubernetes, and BentoCloud.

Understand the design philosophy of OpenLLM.

Get Started

Run the following commands to install OpenLLM and explore it interactively.

```bash
pip install openllm  # or pip3 install openllm
openllm hello
```


Supported models

OpenLLM supports a wide range of state-of-the-art open-source LLMs. You can also add a model repository to run custom models with OpenLLM.

| Model | Parameters | Required GPU | Start a Server |
|-------|------------|--------------|----------------|
| deepseek | r1-671b | 80Gx16 | `openllm serve deepseek:r1-671b` |
| gemma2 | 2b | 12G | `openllm serve gemma2:2b` |
| gemma3 | 3b | 12G | `openllm serve gemma3:3b` |
| jamba1.5 | mini-ff0a | 80Gx2 | `openllm serve jamba1.5:mini-ff0a` |
| llama3.1 | 8b | 24G | `openllm serve llama3.1:8b` |
| llama3.2 | 1b | 24G | `openllm serve llama3.2:1b` |
| llama3.3 | 70b | 80Gx2 | `openllm serve llama3.3:70b` |
| llama4 | 17b16e | 80Gx8 | `openllm serve llama4:17b16e` |
| mistral | 8b-2410 | 24G | `openllm serve mistral:8b-2410` |
| mistral-large | 123b-2407 | 80Gx4 | `openllm serve mistral-large:123b-2407` |
| phi4 | 14b | 80G | `openllm serve phi4:14b` |
| pixtral | 12b-2409 | 80G | `openllm serve pixtral:12b-2409` |
| qwen2.5 | 7b | 24G | `openllm serve qwen2.5:7b` |
| qwen2.5-coder | 3b | 24G | `openllm serve qwen2.5-coder:3b` |
| qwq | 32b | 80G | `openllm serve qwq:32b` |

For the full model list, see the OpenLLM models repository.
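As a rough sanity check on the Required GPU column: weights in 16-bit precision take about 2 bytes per parameter, so a 70B-parameter model needs roughly 140 GB for weights alone, before KV-cache and activation overhead, which lines up with the 80Gx2 listed for llama3.3:70b.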

Start an LLM server

To start an LLM server locally, use the openllm serve command and specify the model version.

> [!NOTE]
> OpenLLM does not store model weights. A Hugging Face token (`HF_TOKEN`) is required for gated models.

  1. Create your Hugging Face token here.
  2. Request access to the gated model, such as meta-llama/Llama-3.2-1B-Instruct.
  3. Set your token as an environment variable:

     ```bash
     export HF_TOKEN=<your token>
     ```

```bash
openllm serve llama3.2:1b
```
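While the model loads, you can poll the server from a script before sending requests. Here is a minimal sketch using only the Python standard library, assuming the default address below and the standard `/v1/models` route of the OpenAI-compatible API (the same route the OpenAI client example further down relies on):

```python
import time
import urllib.request

# Poll the OpenAI-compatible /v1/models route until the server responds.
# http://localhost:3000 is the default address mentioned in this README.
for _ in range(60):
    try:
        with urllib.request.urlopen("http://localhost:3000/v1/models") as resp:
            print("Server is up:", resp.status)
            break
    except OSError:
        time.sleep(2)  # server still starting; retry
else:
    raise SystemExit("Server did not come up within two minutes")
```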

The server will be accessible at http://localhost:3000, providing OpenAI-compatible APIs for interaction. You can call the endpoints with different frameworks and tools that support OpenAI-compatible APIs. Typically, you may need to specify the following:

  • The API host address: By default, the LLM is hosted at http://localhost:3000.
  • The model name: The name may differ depending on the tool you use.
  • The API key: An optional key used for client authentication.

Here are some examples:

**OpenAI Python client**

```python
from openai import OpenAI

client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')

# Use the following to get the available models:
# model_list = client.models.list()
# print(model_list)

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain superconductors like I'm five years old"
        }
    ],
    stream=True,
)
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
```
**LlamaIndex**

```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    api_base="http://localhost:3000/v1",
    model="meta-llama/Llama-3.2-1B-Instruct",
    api_key="dummy",
)
...
```
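Because the endpoints follow the OpenAI API shape, any plain HTTP client works as well. Here is a minimal sketch using the `requests` package; the model name must match the model you are serving, and the prompt is a placeholder:

```python
import requests

# POST directly to the OpenAI-compatible chat completions route.
response = requests.post(
    "http://localhost:3000/v1/chat/completions",
    headers={"Authorization": "Bearer na"},  # the API key is optional here
    json={
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "messages": [{"role": "user", "content": "What is an LLM?"}],
        "stream": False,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```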

Chat UI

OpenLLM provides a chat UI at the /chat endpoint for the launched LLM server at http://localhost:3000/chat.

*(Screenshot: the OpenLLM chat UI)*

Chat with a model in the CLI

To start a chat conversation in the CLI, use the openllm run command and specify the model version.

```bash
openllm run llama3:8b
```

Model repository

A model repository in OpenLLM represents a catalog of available LLMs that you can run. OpenLLM provides a default model repository that includes the latest open-source LLMs like Llama 3, Mistral, and Qwen2, hosted at this GitHub repository. To see all available models from the default and any added repository, use:

```bash
openllm model list
```

To ensure your local list of models is synchronized with the latest updates from all connected repositories, run:

```bash
openllm repo update
```

To review a model’s information, run:

```bash
openllm model get llama3.2:1b
```

Add a model to the default model repository

You can contribute to the default model repository by adding new models that others can use. This involves creating and submitting a Bento of the LLM. For more information, check out this example pull request.

Set up a custom repository

You can add your own repository to OpenLLM with custom models. To do so, follow the format in the default OpenLLM model repository with a bentos directory to store custom LLMs. You need to build your Bentos with BentoML and submit them to your model repository.

First, prepare your custom models in a bentos directory following the guidelines provided by BentoML to build Bentos. Check out the default model repository for an example and read the Developer Guide for details.

Then, register your custom model repository with OpenLLM:

```bash
openllm repo add <repo-name> <repo-url>
```

Note: Currently, OpenLLM only supports adding public repositories.

Deploy to BentoCloud

OpenLLM supports LLM cloud deployment via BentoML, the unified model serving framework, and BentoCloud, an AI inference platform for enterprise AI teams. BentoCloud provides fully-managed infrastructure optimized for LLM inference with autoscaling, model orchestration, observability, and more, allowing you to run any AI model in the cloud.

Sign up for BentoCloud for free and log in. Then, run openllm deploy to deploy a model to BentoCloud:

```bash
openllm deploy llama3.2:1b --env HF_TOKEN
```

> [!NOTE]
> If you are deploying a gated model, make sure to set `HF_TOKEN` in your environment variables.

Once the deployment is complete, you can run model inference on the BentoCloud console:

*(Screenshot: model inference on the BentoCloud console)*

Community

OpenLLM is actively maintained by the BentoML team. Feel free to reach out and join us in our pursuit to make LLMs more accessible and easy to use 👉 Join our Slack community!

Contributing

As an open-source project, we welcome contributions of all kinds, such as new features, bug fixes, and documentation.

Acknowledgements

This project builds on a number of open-source projects, and we are grateful to their developers and contributors for their hard work and dedication.

Owner

  • Name: BentoML
  • Login: bentoml
  • Kind: organization
  • Location: San Francisco

The most flexible way to serve AI models in production

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'OpenLLM: Operating LLMs in production'
message: >-
  If you use this software, please cite it using these
  metadata.
type: software
authors:
  - given-names: Aaron
    family-names: Pham
    email: aarnphm@bentoml.com
    orcid: 'https://orcid.org/0009-0008-3180-5115'
  - given-names: Chaoyu
    family-names: Yang
    email: chaoyu@bentoml.com
  - given-names: Sean
    family-names: Sheng
    email: ssheng@bentoml.com
  - given-names: Shenyang
    family-names: Zhao
    email: larme@bentoml.com
  - given-names: Sauyon
    family-names: Lee
    email: sauyon@bentoml.com
  - given-names: Bo
    family-names: Jiang
    email: jiang@bentoml.com
  - given-names: Fog
    family-names: Dong
    email: fog@bentoml.com
  - given-names: Xipeng
    family-names: Guan
    email: xipeng@bentoml.com
  - given-names: Frost
    family-names: Ming
    email: frost@bentoml.com
repository-code: 'https://github.com/bentoml/OpenLLM'
url: 'https://bentoml.com/'
abstract: >-
  OpenLLM is an open platform for operating large language
  models (LLMs) in production. With OpenLLM, you can run
  inference with any open-source large-language models,
  deploy to the cloud or on-premises, and build powerful AI
  apps. It has built-in support for a wide range of
  open-source LLMs and model runtime, including StableLM,
  Falcon, Dolly, Flan-T5, ChatGLM, StarCoder and more.
  OpenLLM helps serve LLMs over RESTful API or gRPC with one
  command or query via WebUI, CLI, our Python/Javascript
  client, or any HTTP client. It provides first-class
  support for LangChain, BentoML and Hugging Face that
  allows you to easily create your own AI apps by composing
  LLMs with other models and services. Last but not least,
  it automatically generates LLM server OCI-compatible
  Container Images or easily deploys as a serverless
  endpoint via BentoCloud.
keywords:
  - MLOps
  - LLMOps
  - LLM
  - Infrastructure
  - Transformers
  - LLM Serving
  - Model Serving
  - Serverless Deployment
license: Apache-2.0
date-released: '2023-06-13'

GitHub Events

Total
  • Create event: 91
  • Issues event: 38
  • Release event: 17
  • Watch event: 1,640
  • Delete event: 79
  • Issue comment event: 40
  • Push event: 134
  • Pull request review event: 12
  • Pull request event: 159
  • Fork event: 131
Last Year
  • Create event: 91
  • Issues event: 38
  • Release event: 17
  • Watch event: 1,640
  • Delete event: 79
  • Issue comment event: 40
  • Push event: 134
  • Pull request review event: 12
  • Pull request event: 159
  • Fork event: 131

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 1,914
  • Total Committers: 32
  • Avg Commits per committer: 59.813
  • Development Distribution Score (DDS): 0.33
Past Year
  • Commits: 152
  • Committers: 6
  • Avg Commits per committer: 25.333
  • Development Distribution Score (DDS): 0.612
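The DDS figures appear consistent with defining DDS as one minus the top committer's share of commits: for all time, 1 − 1,282/1,914 ≈ 0.33.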
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Aaron Pham | 2****m | 1,282 |
| dependabot[bot] | 4****] | 312 |
| bojiang | b****_@o****m | 146 |
| pre-commit-ci[bot] | 6****] | 83 |
| Sherlock Xu | 6****3 | 14 |
| Zhao Shenyang | d****v@z****m | 13 |
| Rick Zhou | r****u@g****m | 8 |
| Jian Shen | j****2@g****m | 8 |
| Chaoyu | p****g@g****m | 7 |
| xianxian.zhang | 1****l | 7 |
| XunchaoZ | 6****Z | 5 |
| agent | a****t@S****l | 3 |
| github-actions[bot] | g****] | 2 |
| MingLiangDai | 9****i | 2 |
| HeTaoPKU | 4****d | 2 |
| GutZuFusss | l****r@g****m | 2 |
| Frost Ming | m****g@g****m | 2 |
| Fazli Sapuan | f****i@s****g | 2 |
| Abhishek | 5****2 | 1 |
| Alan Poulain | c****t@a****u | 1 |
| Dennis Rall | 5****l | 1 |
| Ikko Eltociear Ashimine | e****r@g****m | 1 |
| Kuan-Chun Wang | j****o@g****m | 1 |
| Matt Hoffner | m****r@g****m | 1 |
| Miorel-Lucian Palii | m****i@g****m | 1 |
| RichardScottOZ | 7****Z | 1 |
| Sauyon Lee | 2****n | 1 |
| Sean Sheng | s****g@g****m | 1 |
| yansheng | 6****5 | 1 |
| weibeu | d****4@g****m | 1 |
and 2 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 120
  • Total pull requests: 372
  • Average time to close issues: 3 months
  • Average time to close pull requests: 4 days
  • Total issue authors: 97
  • Total pull request authors: 24
  • Average comments per issue: 1.94
  • Average comments per pull request: 0.23
  • Merged pull requests: 280
  • Bot issues: 1
  • Bot pull requests: 263
Past Year
  • Issues: 16
  • Pull requests: 180
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 1 day
  • Issue authors: 15
  • Pull request authors: 7
  • Average comments per issue: 1.56
  • Average comments per pull request: 0.15
  • Merged pull requests: 148
  • Bot issues: 1
  • Bot pull requests: 154
Top Authors
Issue Authors
  • aarnphm (12)
  • hahmad2008 (4)
  • Said-Ikki (2)
  • andrewjwaggoner (2)
  • Nanguage (2)
  • zhangxinyang97 (2)
  • meanwo (2)
  • foxxxx001 (2)
  • yalattas (2)
  • mfournioux (2)
  • stoneLee81 (2)
  • dependabot[bot] (1)
  • espdev (1)
  • andysingal (1)
  • Lightwave234 (1)
Pull Request Authors
  • dependabot[bot] (242)
  • pre-commit-ci[bot] (99)
  • aarnphm (60)
  • Sherlock113 (16)
  • larme (8)
  • bojiang (6)
  • parano (4)
  • fuzzie360 (2)
  • dowithless (2)
  • Oscarjia (2)
  • charlod (2)
  • matthoffner (2)
  • rickzx (2)
  • GutZuFusss (1)
  • mantrakp04 (1)
Top Labels
Issue Labels: performance (2), enhancement (1), dependencies (1), python (1)
Pull Request Labels: dependencies (242), github_actions (127), python (112), documentation (8), javascript (3)

Packages

  • Total packages: 7
  • Total downloads:
    • pypi: 10,427 last month
    • npm: 10 last month
  • Total docker downloads: 141
  • Total dependent packages: 7
    (may contain duplicates)
  • Total dependent repositories: 311
    (may contain duplicates)
  • Total versions: 755
  • Total maintainers: 4
pypi.org: openllm

OpenLLM: Self-hosting LLMs Made Easy.

  • Versions: 195
  • Dependent Packages: 3
  • Dependent Repositories: 295
  • Downloads: 8,444 last month
  • Docker Downloads: 47
Rankings
Dependent repos count: 0.9%
Stargazers count: 1.0%
Downloads: 1.9%
Average: 2.2%
Forks count: 3.0%
Docker downloads count: 3.2%
Dependent packages count: 3.2%
Maintainers (3)
Last synced: 6 months ago
pypi.org: openllm-core

OpenLLM Core: Core components for OpenLLM.

  • Versions: 85
  • Dependent Packages: 2
  • Dependent Repositories: 8
  • Downloads: 1,538 last month
  • Docker Downloads: 47
Rankings
Downloads: 2.7%
Docker downloads count: 3.2%
Dependent packages count: 3.2%
Average: 3.5%
Dependent repos count: 5.2%
Maintainers (1)
Last synced: 6 months ago
pypi.org: openllm-client

OpenLLM Client: Interacting with OpenLLM HTTP/gRPC server, or any BentoML server.

  • Versions: 85
  • Dependent Packages: 2
  • Dependent Repositories: 8
  • Downloads: 445 last month
  • Docker Downloads: 47
Rankings
Downloads: 2.7%
Docker downloads count: 3.2%
Average: 3.9%
Dependent packages count: 4.8%
Dependent repos count: 5.2%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/bentoml/openllm
  • Versions: 194
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 7.0%
Last synced: 6 months ago
proxy.golang.org: github.com/bentoml/OpenLLM
  • Versions: 194
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 7.0%
Last synced: 6 months ago
npmjs.org: openllm

TS/JS binding for OpenLLM

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 10 last month
Rankings
Stargazers count: 1.3%
Forks count: 1.9%
Average: 23.9%
Dependent repos count: 37.5%
Dependent packages count: 54.7%
Maintainers (1)
Last synced: 6 months ago
npmjs.org: openllm_client

TS/JS binding for OpenLLM

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 0 last month
Rankings
Stargazers count: 1.3%
Forks count: 1.9%
Average: 23.9%
Dependent repos count: 37.5%
Dependent packages count: 54.7%
Maintainers (1)
Last synced: 6 months ago