https://github.com/airockchip/rknn-llm

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.2%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: airockchip
License: other
Language: Python
Default Branch: main
Size: 245 MB

Statistics

Stars: 844
Watchers: 26
Forks: 101
Open Issues: 176
Releases: 8

Created over 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog License

Description

The RKLLM software stack helps users quickly deploy AI models on Rockchip chips. The overall framework is as follows:

To use the RKNPU, users first run RKLLM-Toolkit on a computer to convert the trained model into an RKLLM-format model, then run inference on the development board using the RKLLM C API.

RKLLM-Toolkit is a software development kit for model conversion and quantization on a PC.
RKLLM Runtime provides C/C++ programming interfaces for the Rockchip NPU platform to help users deploy RKLLM models and accelerate LLM applications.
RKNPU kernel driver is responsible for interacting with the NPU hardware. It is open source and can be found in the Rockchip kernel code.
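
The PC-side conversion step described above might look roughly like the following sketch. The `RKLLM` class and its `load_huggingface`, `build`, and `export_rkllm` methods follow the pattern of the repository's example scripts, but the exact parameters, defaults, and platform strings are assumptions that should be checked against the installed RKLLM-Toolkit version. The `SUPPORTED_PLATFORMS` tuple, the helper names, and the paths in the usage comment are hypothetical.

```python
# Sketch of the PC-side conversion flow; not an official script.
# Lowercase platform strings are an assumption based on the platform list.
SUPPORTED_PLATFORMS = ("rk3588", "rk3576", "rk3562", "rv1126b")


def normalize_platform(name: str) -> str:
    """Map names like 'RK3588 Series' to the lowercase form 'rk3588'."""
    key = name.lower().replace("series", "").strip()
    if key not in SUPPORTED_PLATFORMS:
        raise ValueError(f"unsupported target platform: {name!r}")
    return key


def convert_model(model_dir: str, out_path: str, platform: str) -> None:
    """Convert a Hugging Face model directory to an .rkllm file."""
    # Imported here so the helper above works without the toolkit installed.
    from rkllm.api import RKLLM  # provided by RKLLM-Toolkit (PC side)

    target = normalize_platform(platform)
    llm = RKLLM()

    # Each call is assumed to return 0 on success, as in the repo examples.
    if llm.load_huggingface(model=model_dir) != 0:
        raise RuntimeError("loading the model failed")
    if llm.build(do_quantization=True,
                 quantized_dtype="w8a8",  # assumed dtype; platform dependent
                 target_platform=target) != 0:
        raise RuntimeError("building the model failed")
    if llm.export_rkllm(out_path) != 0:
        raise RuntimeError("exporting the model failed")


# Hypothetical usage:
# convert_model("./Qwen2.5-1.5B-Instruct", "./qwen.rkllm", "RK3588 Series")
```

The exported file would then be copied to the board and loaded through the RKLLM C API.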

Supported Platforms

RK3588 Series
RK3576 Series
RK3562 Series
RV1126B Series

Supported Models

[x] LLAMA models
[x] TinyLLAMA models
[x] Qwen2/Qwen2.5/Qwen3
[x] Phi2/Phi3
[x] ChatGLM3-6B
[x] Gemma2/Gemma3
[x] InternLM2 models
[x] MiniCPM3/MiniCPM4
[x] TeleChat2
[x] Qwen2-VL-2B-Instruct/Qwen2-VL-7B-Instruct/Qwen2.5-VL-3B-Instruct
[x] MiniCPM-V-2_6
[x] DeepSeek-R1-Distill
[x] Janus-Pro-1B
[x] InternVL2-1B
[x] SmolVLM
[x] RWKV7

Model Performance

Benchmark results of common LLMs.

Performance Testing Methods

Run the frequency-setting script from the scripts directory on the target platform.
Execute export RKLLM_LOG_LEVEL=1 on the device to log model inference performance and memory usage.
Use the eval_perf_watch_cpu.sh script to measure CPU utilization.
Use the eval_perf_watch_npu.sh script to measure NPU utilization.
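
The logging step above can also be applied from a small launcher script instead of an interactive shell. The sketch below sets `RKLLM_LOG_LEVEL=1` for the child process only. The function names are illustrative, and the demo binary and model path in the usage comment are hypothetical.

```python
import os
import subprocess


def perf_env(base=None):
    """Return a copy of the environment with RKLLM performance logging on."""
    env = dict(os.environ if base is None else base)
    env["RKLLM_LOG_LEVEL"] = "1"  # same effect as `export RKLLM_LOG_LEVEL=1`
    return env


def run_with_perf_logging(cmd):
    """Run cmd (a list of strings) with RKLLM_LOG_LEVEL=1; return its exit code."""
    return subprocess.run(cmd, env=perf_env()).returncode


# Hypothetical usage on the board:
# run_with_perf_logging(["./llm_demo", "./model.rkllm"])
```

The CPU and NPU utilization scripts would still be run separately while the model is executing.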

Download

You can download the latest package from RKLLM_SDK (fetch code: rkllm).
You can download converted rkllm models from rkllm_model_zoo (fetch code: rkllm).

Examples

Multimodal deployment demo: Qwen2-VL_Demo
API usage demo: DeepSeek-R1-Distill-Qwen-1.5B_Demo
API server demo: rkllm_server_demo
Multimodal interactive dialogue demo: Multimodal_Interactive_Dialogue_Demo

Note

The supported Python versions are:
- Python 3.8
- Python 3.9
- Python 3.10
- Python 3.11
- Python 3.12

Note: Before installing the package in a Python 3.12 environment, run the following command:

export BUILD_CUDA_EXT=0

- On some platforms, you may encounter an error indicating that libomp.so cannot be found. To resolve this, locate the library in the corresponding cross-compilation toolchain and place it in the board's lib directory, at the same level as librkllmrt.so.
- RWKV model conversion only supports Python 3.12. Please use requirements_rwkv7.txt to set up the pip environment.
- Latest version: v1.2.1
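
The version notes above can be checked before installation with a small helper. This is a minimal sketch: the function name and return shape are hypothetical, and only the version list and the `BUILD_CUDA_EXT=0` setting come from the notes.

```python
import sys

SUPPORTED_MINORS = (8, 9, 10, 11, 12)  # Python 3.8 through 3.12


def install_env_hints(version=None):
    """Return extra environment variables to set before installing.

    Raises RuntimeError for an unsupported interpreter version.
    """
    major, minor = (version or sys.version_info)[:2]
    if major != 3 or minor not in SUPPORTED_MINORS:
        raise RuntimeError(f"Python {major}.{minor} is not supported")
    # Python 3.12 needs BUILD_CUDA_EXT=0 exported before installation.
    return {"BUILD_CUDA_EXT": "0"} if minor == 12 else {}
```

Calling `install_env_hints()` with no argument checks the running interpreter; passing a tuple such as `(3, 12)` checks a specific version.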

RKNN Toolkit2

To deploy other AI models, use the RKNN-Toolkit2 SDK. For details, please refer to:

https://github.com/airockchip/rknn-toolkit2

CHANGELOG

v1.2.1

Added support for RWKV7, Qwen3, and MiniCPM4 models
Added support for the RV1126B platform
Enabled function calling capability
Enabled cross-attention inference
Optimized the callback function to support pausing inference
Supported multi-batch inference
Optimized KV cache clearing interface
Improved chat template parsing with support for thinking mode selection
Server demo updated to support OpenAI-compatible format
Added return of model inference performance statistics
Supported mrope multimodal position encoding
Added a new quantization optimization algorithm to improve quantization accuracy

For older versions, please refer to the CHANGELOG.

Owner

Login: airockchip
Kind: user

Repositories: 4
Profile: https://github.com/airockchip

GitHub Events

Total

Create event: 8
Issues event: 294
Release event: 7
Watch event: 512
Delete event: 2
Issue comment event: 742
Push event: 12
Pull request event: 4
Pull request review event: 1
Fork event: 62

Last Year

Create event: 8
Issues event: 294
Release event: 7
Watch event: 512
Delete event: 2
Issue comment event: 742
Push event: 12
Pull request event: 4
Pull request review event: 1
Fork event: 62

Issues and Pull Requests

All Time

Total issues: 292
Total pull requests: 14
Average time to close issues: 20 days
Average time to close pull requests: 9 days
Total issue authors: 195
Total pull request authors: 14
Average comments per issue: 1.01
Average comments per pull request: 0.93
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 216
Pull requests: 7
Average time to close issues: 10 days
Average time to close pull requests: 12 days
Issue authors: 147
Pull request authors: 7
Average comments per issue: 0.99
Average comments per pull request: 1.14
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

wohaiaini (8)
happyme531 (7)
fydeos-alex (7)
ysh329 (7)
openedev (5)
Tang-JingWei (5)
danwahe (4)
lzjie-tchip (4)
Gooddz1 (4)
17656178609 (4)
c0zaut (3)
vincenzodentamaro (3)
skiptomylou86 (3)
lzw12138 (3)
zhangnn520 (3)

Pull Request Authors

wishday (1)
80Builder80 (1)
yuguolong (1)
cryi (1)
shaqing (1)
tolidano (1)
keeper-jie (1)
AACengineer (1)
wingceltis-c (1)
PlanetesDDH (1)
Pelochus (1)
vincenzodentamaro (1)
mtwlz (1)
huaxin233 (1)
