openllm

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.

https://github.com/bentoml/openllm

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

bentoml fine-tuning llama llama2 llama3-1 llama3-2 llama3-2-vision llm llm-inference llm-ops llm-serving llmops mistral mlops model-inference open-source-llm openllm vicuna

Keywords from Contributors

optimizer energy-system transformers agents mesh cryptocurrencies geoscience spacy-extension sequencers animations
Last synced: 6 months ago

Repository

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.

Basic Info
  • Host: GitHub
  • Owner: bentoml
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage: https://bentoml.com
  • Size: 41.1 MB
Statistics
  • Stars: 11,738
  • Watchers: 60
  • Forks: 763
  • Open Issues: 7
  • Releases: 147
Topics
bentoml fine-tuning llama llama2 llama3-1 llama3-2 llama3-2-vision llm llm-inference llm-ops llm-serving llmops mistral mlops model-inference open-source-llm openllm vicuna
Created almost 3 years ago · Last pushed 6 months ago
Metadata Files
Readme · License · Code of conduct · Citation · Codeowners · Security

README.md

🦾 OpenLLM: Self-Hosting LLMs Made Easy

[![License: Apache-2.0](https://img.shields.io/badge/License-Apache%202-green.svg)](https://github.com/bentoml/OpenLLM/blob/main/LICENSE) [![Releases](https://img.shields.io/pypi/v/openllm.svg?logo=pypi&label=PyPI&logoColor=gold)](https://pypi.org/project/openllm) [![CI](https://results.pre-commit.ci/badge/github/bentoml/OpenLLM/main.svg)](https://results.pre-commit.ci/latest/github/bentoml/OpenLLM/main) [![X](https://badgen.net/badge/icon/@bentomlai/000000?icon=twitter&label=Follow)](https://twitter.com/bentomlai) [![Community](https://badgen.net/badge/icon/Community/562f5d?icon=slack&label=Join)](https://l.bentoml.com/join-slack)

OpenLLM allows developers to run any open-source LLMs (Llama 3.3, Qwen2.5, Phi3 and more) or custom models as OpenAI-compatible APIs with a single command. It features a built-in chat UI, state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployments with Docker, Kubernetes, and BentoCloud.

Understand the design philosophy of OpenLLM.

Get Started

Run the following commands to install OpenLLM and explore it interactively.

```bash
pip install openllm  # or pip3 install openllm
openllm hello
```


Supported models

OpenLLM supports a wide range of state-of-the-art open-source LLMs. You can also add a model repository to run custom models with OpenLLM.

| Model | Parameters | Required GPU | Start a Server |
|-------|------------|--------------|----------------|
| deepseek | r1-671b | 80Gx16 | `openllm serve deepseek:r1-671b` |
| gemma2 | 2b | 12G | `openllm serve gemma2:2b` |
| gemma3 | 3b | 12G | `openllm serve gemma3:3b` |
| jamba1.5 | mini-ff0a | 80Gx2 | `openllm serve jamba1.5:mini-ff0a` |
| llama3.1 | 8b | 24G | `openllm serve llama3.1:8b` |
| llama3.2 | 1b | 24G | `openllm serve llama3.2:1b` |
| llama3.3 | 70b | 80Gx2 | `openllm serve llama3.3:70b` |
| llama4 | 17b16e | 80Gx8 | `openllm serve llama4:17b16e` |
| mistral | 8b-2410 | 24G | `openllm serve mistral:8b-2410` |
| mistral-large | 123b-2407 | 80Gx4 | `openllm serve mistral-large:123b-2407` |
| phi4 | 14b | 80G | `openllm serve phi4:14b` |
| pixtral | 12b-2409 | 80G | `openllm serve pixtral:12b-2409` |
| qwen2.5 | 7b | 24G | `openllm serve qwen2.5:7b` |
| qwen2.5-coder | 3b | 24G | `openllm serve qwen2.5-coder:3b` |
| qwq | 32b | 80G | `openllm serve qwq:32b` |

For the full model list, see the OpenLLM models repository.
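As a rough sanity check on the Required GPU column: weights in 16-bit precision take about 2 bytes per parameter, so a 70B-parameter model needs roughly 140 GB for weights alone, before KV-cache and activation overhead, which lines up with the 80Gx2 listed for llama3.3:70b.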

Start an LLM server

To start an LLM server locally, use the openllm serve command and specify the model version.

> [!NOTE]
> OpenLLM does not store model weights. A Hugging Face token (`HF_TOKEN`) is required for gated models.

  1. Create your Hugging Face token here.
  2. Request access to the gated model, such as meta-llama/Llama-3.2-1B-Instruct.
  3. Set your token as an environment variable:

     ```bash
     export HF_TOKEN=<your token>
     ```

```bash
openllm serve llama3.2:1b
```
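While the model loads, you can poll the server from a script before sending requests. Here is a minimal sketch using only the Python standard library, assuming the default address below and the standard `/v1/models` route of the OpenAI-compatible API (the same route the OpenAI client example further down relies on):

```python
import time
import urllib.request

# Poll the OpenAI-compatible /v1/models route until the server responds.
# http://localhost:3000 is the default address mentioned in this README.
for _ in range(60):
    try:
        with urllib.request.urlopen("http://localhost:3000/v1/models") as resp:
            print("Server is up:", resp.status)
            break
    except OSError:
        time.sleep(2)  # server still starting; retry
else:
    raise SystemExit("Server did not come up within two minutes")
```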

The server will be accessible at http://localhost:3000, providing OpenAI-compatible APIs for interaction. You can call the endpoints with different frameworks and tools that support OpenAI-compatible APIs. Typically, you may need to specify the following:

  • The API host address: By default, the LLM is hosted at http://localhost:3000.
  • The model name: The name may differ depending on the tool you use.
  • The API key: An optional key used for client authentication.

Here are some examples:

**OpenAI Python client**

```python
from openai import OpenAI

client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')

# Use the following to get the available models:
# model_list = client.models.list()
# print(model_list)

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain superconductors like I'm five years old"
        }
    ],
    stream=True,
)
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
```
**LlamaIndex**

```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    api_base="http://localhost:3000/v1",
    model="meta-llama/Llama-3.2-1B-Instruct",
    api_key="dummy",
)
...
```
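Because the endpoints follow the OpenAI API shape, any plain HTTP client works as well. Here is a minimal sketch using the `requests` package; the model name must match the model you are serving, and the prompt is a placeholder:

```python
import requests

# POST directly to the OpenAI-compatible chat completions route.
response = requests.post(
    "http://localhost:3000/v1/chat/completions",
    headers={"Authorization": "Bearer na"},  # the API key is optional here
    json={
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "messages": [{"role": "user", "content": "What is an LLM?"}],
        "stream": False,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```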

Chat UI

OpenLLM provides a chat UI at the /chat endpoint for the launched LLM server at http://localhost:3000/chat.

*(Screenshot: the OpenLLM chat UI)*

Chat with a model in the CLI

To start a chat conversation in the CLI, use the openllm run command and specify the model version.

```bash
openllm run llama3:8b
```

Model repository

A model repository in OpenLLM represents a catalog of available LLMs that you can run. OpenLLM provides a default model repository that includes the latest open-source LLMs like Llama 3, Mistral, and Qwen2, hosted at this GitHub repository. To see all available models from the default and any added repository, use:

```bash
openllm model list
```

To ensure your local list of models is synchronized with the latest updates from all connected repositories, run:

```bash
openllm repo update
```

To review a model’s information, run:

```bash
openllm model get llama3.2:1b
```

Add a model to the default model repository

You can contribute to the default model repository by adding new models that others can use. This involves creating and submitting a Bento of the LLM. For more information, check out this example pull request.

Set up a custom repository

You can add your own repository to OpenLLM with custom models. To do so, follow the format in the default OpenLLM model repository with a bentos directory to store custom LLMs. You need to build your Bentos with BentoML and submit them to your model repository.

First, prepare your custom models in a bentos directory following the guidelines provided by BentoML to build Bentos. Check out the default model repository for an example and read the Developer Guide for details.

Then, register your custom model repository with OpenLLM:

```bash
openllm repo add <repo-name> <repo-url>
```

Note: Currently, OpenLLM only supports adding public repositories.

Deploy to BentoCloud

OpenLLM supports LLM cloud deployment via BentoML, the unified model serving framework, and BentoCloud, an AI inference platform for enterprise AI teams. BentoCloud provides fully-managed infrastructure optimized for LLM inference with autoscaling, model orchestration, observability, and more, allowing you to run any AI model in the cloud.

Sign up for BentoCloud for free and log in. Then, run openllm deploy to deploy a model to BentoCloud:

```bash
openllm deploy llama3.2:1b --env HF_TOKEN
```

> [!NOTE]
> If you are deploying a gated model, make sure to set `HF_TOKEN` in your environment variables.

Once the deployment is complete, you can run model inference on the BentoCloud console:

*(Screenshot: model inference on the BentoCloud console)*

Community

OpenLLM is actively maintained by the BentoML team. Feel free to reach out and join us in our pursuit to make LLMs more accessible and easy to use 👉 Join our Slack community!

Contributing

As an open-source project, we welcome contributions of all kinds, such as new features, bug fixes, and documentation.

Acknowledgements

This project builds on a number of open-source projects, and we are grateful to their developers and contributors for their hard work and dedication.

Owner

  • Name: BentoML
  • Login: bentoml
  • Kind: organization
  • Location: San Francisco

The most flexible way to serve AI models in production

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'OpenLLM: Operating LLMs in production'
message: >-
  If you use this software, please cite it using these
  metadata.
type: software
authors:
  - given-names: Aaron
    family-names: Pham
    email: aarnphm@bentoml.com
    orcid: 'https://orcid.org/0009-0008-3180-5115'
  - given-names: Chaoyu
    family-names: Yang
    email: chaoyu@bentoml.com
  - given-names: Sean
    family-names: Sheng
    email: ssheng@bentoml.com
  - given-names: Shenyang
    family-names: Zhao
    email: larme@bentoml.com
  - given-names: Sauyon
    family-names: Lee
    email: sauyon@bentoml.com
  - given-names: Bo
    family-names: Jiang
    email: jiang@bentoml.com
  - given-names: Fog
    family-names: Dong
    email: fog@bentoml.com
  - given-names: Xipeng
    family-names: Guan
    email: xipeng@bentoml.com
  - given-names: Frost
    family-names: Ming
    email: frost@bentoml.com
repository-code: 'https://github.com/bentoml/OpenLLM'
url: 'https://bentoml.com/'
abstract: >-
  OpenLLM is an open platform for operating large language
  models (LLMs) in production. With OpenLLM, you can run
  inference with any open-source large-language models,
  deploy to the cloud or on-premises, and build powerful AI
  apps. It has built-in support for a wide range of
  open-source LLMs and model runtime, including StableLM,
  Falcon, Dolly, Flan-T5, ChatGLM, StarCoder and more.
  OpenLLM helps serve LLMs over RESTful API or gRPC with one
  command or query via WebUI, CLI, our Python/Javascript
  client, or any HTTP client. It provides first-class
  support for LangChain, BentoML and Hugging Face that
  allows you to easily create your own AI apps by composing
  LLMs with other models and services. Last but not least,
  it automatically generates LLM server OCI-compatible
  Container Images or easily deploys as a serverless
  endpoint via BentoCloud.
keywords:
  - MLOps
  - LLMOps
  - LLM
  - Infrastructure
  - Transformers
  - LLM Serving
  - Model Serving
  - Serverless Deployment
license: Apache-2.0
date-released: '2023-06-13'

GitHub Events

Total
  • Create event: 91
  • Issues event: 38
  • Release event: 17
  • Watch event: 1,640
  • Delete event: 79
  • Issue comment event: 40
  • Push event: 134
  • Pull request review event: 12
  • Pull request event: 159
  • Fork event: 131
Last Year
  • Create event: 91
  • Issues event: 38
  • Release event: 17
  • Watch event: 1,640
  • Delete event: 79
  • Issue comment event: 40
  • Push event: 134
  • Pull request review event: 12
  • Pull request event: 159
  • Fork event: 131

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 1,914
  • Total Committers: 32
  • Avg Commits per committer: 59.813
  • Development Distribution Score (DDS): 0.33
Past Year
  • Commits: 152
  • Committers: 6
  • Avg Commits per committer: 25.333
  • Development Distribution Score (DDS): 0.612
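The DDS figures appear consistent with defining DDS as one minus the top committer's share of commits: for all time, 1 − 1,282/1,914 ≈ 0.33.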
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Aaron Pham | 2****m | 1,282 |
| dependabot[bot] | 4****] | 312 |
| bojiang | b****_@o****m | 146 |
| pre-commit-ci[bot] | 6****] | 83 |
| Sherlock Xu | 6****3 | 14 |
| Zhao Shenyang | d****v@z****m | 13 |
| Rick Zhou | r****u@g****m | 8 |
| Jian Shen | j****2@g****m | 8 |
| Chaoyu | p****g@g****m | 7 |
| xianxian.zhang | 1****l | 7 |
| XunchaoZ | 6****Z | 5 |
| agent | a****t@S****l | 3 |
| github-actions[bot] | g****] | 2 |
| MingLiangDai | 9****i | 2 |
| HeTaoPKU | 4****d | 2 |
| GutZuFusss | l****r@g****m | 2 |
| Frost Ming | m****g@g****m | 2 |
| Fazli Sapuan | f****i@s****g | 2 |
| Abhishek | 5****2 | 1 |
| Alan Poulain | c****t@a****u | 1 |
| Dennis Rall | 5****l | 1 |
| Ikko Eltociear Ashimine | e****r@g****m | 1 |
| Kuan-Chun Wang | j****o@g****m | 1 |
| Matt Hoffner | m****r@g****m | 1 |
| Miorel-Lucian Palii | m****i@g****m | 1 |
| RichardScottOZ | 7****Z | 1 |
| Sauyon Lee | 2****n | 1 |
| Sean Sheng | s****g@g****m | 1 |
| yansheng | 6****5 | 1 |
| weibeu | d****4@g****m | 1 |
and 2 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 120
  • Total pull requests: 372
  • Average time to close issues: 3 months
  • Average time to close pull requests: 4 days
  • Total issue authors: 97
  • Total pull request authors: 24
  • Average comments per issue: 1.94
  • Average comments per pull request: 0.23
  • Merged pull requests: 280
  • Bot issues: 1
  • Bot pull requests: 263
Past Year
  • Issues: 16
  • Pull requests: 180
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 1 day
  • Issue authors: 15
  • Pull request authors: 7
  • Average comments per issue: 1.56
  • Average comments per pull request: 0.15
  • Merged pull requests: 148
  • Bot issues: 1
  • Bot pull requests: 154
Top Authors
Issue Authors
  • aarnphm (12)
  • hahmad2008 (4)
  • Said-Ikki (2)
  • andrewjwaggoner (2)
  • Nanguage (2)
  • zhangxinyang97 (2)
  • meanwo (2)
  • foxxxx001 (2)
  • yalattas (2)
  • mfournioux (2)
  • stoneLee81 (2)
  • dependabot[bot] (1)
  • espdev (1)
  • andysingal (1)
  • Lightwave234 (1)
Pull Request Authors
  • dependabot[bot] (242)
  • pre-commit-ci[bot] (99)
  • aarnphm (60)
  • Sherlock113 (16)
  • larme (8)
  • bojiang (6)
  • parano (4)
  • fuzzie360 (2)
  • dowithless (2)
  • Oscarjia (2)
  • charlod (2)
  • matthoffner (2)
  • rickzx (2)
  • GutZuFusss (1)
  • mantrakp04 (1)
Top Labels
Issue Labels: performance (2), enhancement (1), dependencies (1), python (1)
Pull Request Labels: dependencies (242), github_actions (127), python (112), documentation (8), javascript (3)

Packages

  • Total packages: 7
  • Total downloads:
    • pypi: 10,427 last month
    • npm: 10 last month
  • Total docker downloads: 141
  • Total dependent packages: 7
    (may contain duplicates)
  • Total dependent repositories: 311
    (may contain duplicates)
  • Total versions: 755
  • Total maintainers: 4
pypi.org: openllm

OpenLLM: Self-hosting LLMs Made Easy.

  • Versions: 195
  • Dependent Packages: 3
  • Dependent Repositories: 295
  • Downloads: 8,444 last month
  • Docker Downloads: 47
Rankings
Dependent repos count: 0.9%
Stargazers count: 1.0%
Downloads: 1.9%
Average: 2.2%
Forks count: 3.0%
Docker downloads count: 3.2%
Dependent packages count: 3.2%
Maintainers (3)
Last synced: 6 months ago
pypi.org: openllm-core

OpenLLM Core: Core components for OpenLLM.

  • Versions: 85
  • Dependent Packages: 2
  • Dependent Repositories: 8
  • Downloads: 1,538 last month
  • Docker Downloads: 47
Rankings
Downloads: 2.7%
Docker downloads count: 3.2%
Dependent packages count: 3.2%
Average: 3.5%
Dependent repos count: 5.2%
Maintainers (1)
Last synced: 6 months ago
pypi.org: openllm-client

OpenLLM Client: Interacting with OpenLLM HTTP/gRPC server, or any BentoML server.

  • Versions: 85
  • Dependent Packages: 2
  • Dependent Repositories: 8
  • Downloads: 445 last month
  • Docker Downloads: 47
Rankings
Downloads: 2.7%
Docker downloads count: 3.2%
Average: 3.9%
Dependent packages count: 4.8%
Dependent repos count: 5.2%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/bentoml/openllm
  • Versions: 194
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 7.0%
Last synced: 6 months ago
proxy.golang.org: github.com/bentoml/OpenLLM
  • Versions: 194
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 7.0%
Last synced: 6 months ago
npmjs.org: openllm

TS/JS binding for OpenLLM

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 10 last month
Rankings
Stargazers count: 1.3%
Forks count: 1.9%
Average: 23.9%
Dependent repos count: 37.5%
Dependent packages count: 54.7%
Maintainers (1)
Last synced: 6 months ago
npmjs.org: openllm_client

TS/JS binding for OpenLLM

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 0 last month
Rankings
Stargazers count: 1.3%
Forks count: 1.9%
Average: 23.9%
Dependent repos count: 37.5%
Dependent packages count: 54.7%
Maintainers (1)
Last synced: 6 months ago