https://github.com/airockchip/rknn-llm
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (11.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: airockchip
- License: other
- Language: Python
- Default Branch: main
- Size: 245 MB
Statistics
- Stars: 844
- Watchers: 26
- Forks: 101
- Open Issues: 176
- Releases: 8
Metadata Files
README.md
Description
RKLLM software stack can help users to quickly deploy AI models to Rockchip chips. The overall framework is as follows:
In order to use RKNPU, users need to first run the RKLLM-Toolkit tool on the computer, convert the trained model into an RKLLM format model, and then inference on the development board using the RKLLM C API.
RKLLM-Toolkit is a software development kit for users to perform model conversionand quantization on PC.
RKLLM Runtime provides C/C++ programming interfaces for Rockchip NPU platform to help users deploy RKLLM models and accelerate the implementation of LLM applications.
RKNPU kernel driver is responsible for interacting with NPU hardware. It has been open source and can be found in the Rockchip kernel code.
Support Platform
- RK3588 Series
- RK3576 Series
- RK3562 Series
- RV1126B Series
Support Models
- [x] LLAMA models
- [x] TinyLLAMA models
- [x] Qwen2/Qwen2.5/Qwen3
- [x] Phi2/Phi3
- [x] ChatGLM3-6B
- [x] Gemma2/Gemma3
- [x] InternLM2 models
- [x] MiniCPM3/MiniCPM4
- [x] TeleChat2
- [x] Qwen2-VL-2B-Instruct/Qwen2-VL-7B-Instruct/Qwen2.5-VL-3B-Instruct
- [x] MiniCPM-V-2_6
- [x] DeepSeek-R1-Distill
- [x] Janus-Pro-1B
- [x] InternVL2-1B
- [x] SmolVLM
- [x] RWKV7
Model Performance
- Benchmark results of common LLMs.
Performance Testing Methods
- Run the frequency-setting script from the
scriptsdirectory on the target platform. - Execute
export RKLLM_LOG_LEVEL=1on the device to log model inference performance and memory usage. - Use the
eval_perf_watch_cpu.shscript to measure CPU utilization. - Use the
eval_perf_watch_npu.shscript to measure NPU utilization.
Download
- You can download the latest package from RKLLM_SDK, fetch code: rkllm
- You can download the converted rkllm model from rkllmmodelzoo, fetch code: rkllm
Examples
- Multimodel deployment demo: Qwen2-VL_Demo
- API usage demo: DeepSeek-R1-Distill-Qwen-1.5B_Demo
- API server demo: rkllmserverdemo
- MultimodalInteractiveDialogue_Demo MultimodalInteractiveDialogue_Demo
Note
The supported Python versions are:
- Python 3.8
- Python 3.9
- Python 3.10
- Python 3.11
- Python 3.12
Note: Before installing package in a Python 3.12 environment, please run the command:
export BUILD_CUDA_EXT=0
- On some platforms, you may encounter an error indicating that libomp.so cannot be found. To resolve this, locate the library in the corresponding cross-compilation toolchain and place it in the board's lib directory, at the same level as librkllmrt.so.
- RWKV model conversion only supports Python 3.12. Please use requirements_rwkv7.txt to set up the pip environment.
- Latest version: v1.2.1
RKNN Toolkit2
If you want to deploy additional AI model, we have introduced a SDK called RKNN-Toolkit2. For details, please refer to:
https://github.com/airockchip/rknn-toolkit2
CHANGELOG
v1.2.1
- Added support for RWKV7, Qwen3, and MiniCPM4 models
- Added support for the RV1126B platform
- Enabled function calling capability
- Enabled cross-attention inference
- Optimize the callback function to support pausing inference
- Supported multi-batch inference
- Optimized KV cache clearing interface
- Improved chat template parsing with support for thinking mode selection
- Server demo updated to support OpenAI-compatible format
- Added return of model inference performance statistics
- Supported mrope multimodal position encoding
- A new quantization optimization algorithm has been added to improve quantization accuracy
for older version, please refer CHANGELOG
Owner
- Login: airockchip
- Kind: user
- Repositories: 4
- Profile: https://github.com/airockchip
GitHub Events
Total
- Create event: 8
- Issues event: 294
- Release event: 7
- Watch event: 512
- Delete event: 2
- Issue comment event: 742
- Push event: 12
- Pull request event: 4
- Pull request review event: 1
- Fork event: 62
Last Year
- Create event: 8
- Issues event: 294
- Release event: 7
- Watch event: 512
- Delete event: 2
- Issue comment event: 742
- Push event: 12
- Pull request event: 4
- Pull request review event: 1
- Fork event: 62
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 292
- Total pull requests: 14
- Average time to close issues: 20 days
- Average time to close pull requests: 9 days
- Total issue authors: 195
- Total pull request authors: 14
- Average comments per issue: 1.01
- Average comments per pull request: 0.93
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 216
- Pull requests: 7
- Average time to close issues: 10 days
- Average time to close pull requests: 12 days
- Issue authors: 147
- Pull request authors: 7
- Average comments per issue: 0.99
- Average comments per pull request: 1.14
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- wohaiaini (8)
- happyme531 (7)
- fydeos-alex (7)
- ysh329 (7)
- openedev (5)
- Tang-JingWei (5)
- danwahe (4)
- lzjie-tchip (4)
- Gooddz1 (4)
- 17656178609 (4)
- c0zaut (3)
- vincenzodentamaro (3)
- skiptomylou86 (3)
- lzw12138 (3)
- zhangnn520 (3)
Pull Request Authors
- wishday (1)
- 80Builder80 (1)
- yuguolong (1)
- cryi (1)
- shaqing (1)
- tolidano (1)
- keeper-jie (1)
- AACengineer (1)
- wingceltis-c (1)
- PlanetesDDH (1)
- Pelochus (1)
- vincenzodentamaro (1)
- mtwlz (1)
- huaxin233 (1)