https://github.com/airockchip/rknn-llm

https://github.com/airockchip/rknn-llm

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.2%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: airockchip
  • License: other
  • Language: Python
  • Default Branch: main
  • Size: 245 MB
Statistics
  • Stars: 844
  • Watchers: 26
  • Forks: 101
  • Open Issues: 176
  • Releases: 8
Created over 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog License

README.md

Description

RKLLM software stack can help users to quickly deploy AI models to Rockchip chips. The overall framework is as follows:

In order to use RKNPU, users need to first run the RKLLM-Toolkit tool on the computer, convert the trained model into an RKLLM format model, and then inference on the development board using the RKLLM C API.

  • RKLLM-Toolkit is a software development kit for users to perform model conversionand quantization on PC.

  • RKLLM Runtime provides C/C++ programming interfaces for Rockchip NPU platform to help users deploy RKLLM models and accelerate the implementation of LLM applications.

  • RKNPU kernel driver is responsible for interacting with NPU hardware. It has been open source and can be found in the Rockchip kernel code.

Support Platform

  • RK3588 Series
  • RK3576 Series
  • RK3562 Series
  • RV1126B Series

Support Models

Model Performance

  1. Benchmark results of common LLMs.

Performance Testing Methods

  1. Run the frequency-setting script from the scripts directory on the target platform.
  2. Execute export RKLLM_LOG_LEVEL=1 on the device to log model inference performance and memory usage.
  3. Use the eval_perf_watch_cpu.sh script to measure CPU utilization.
  4. Use the eval_perf_watch_npu.sh script to measure NPU utilization.

Download

  1. You can download the latest package from RKLLM_SDK, fetch code: rkllm
  2. You can download the converted rkllm model from rkllmmodelzoo, fetch code: rkllm

Examples

  1. Multimodel deployment demo: Qwen2-VL_Demo
  2. API usage demo: DeepSeek-R1-Distill-Qwen-1.5B_Demo
  3. API server demo: rkllmserverdemo
  4. MultimodalInteractiveDialogue_Demo MultimodalInteractiveDialogue_Demo

Note

  • The supported Python versions are:

    • Python 3.8
    • Python 3.9
    • Python 3.10
    • Python 3.11
    • Python 3.12

Note: Before installing package in a Python 3.12 environment, please run the command:

export BUILD_CUDA_EXT=0 - On some platforms, you may encounter an error indicating that libomp.so cannot be found. To resolve this, locate the library in the corresponding cross-compilation toolchain and place it in the board's lib directory, at the same level as librkllmrt.so. - RWKV model conversion only supports Python 3.12. Please use requirements_rwkv7.txt to set up the pip environment. - Latest version: v1.2.1

RKNN Toolkit2

If you want to deploy additional AI model, we have introduced a SDK called RKNN-Toolkit2. For details, please refer to:

https://github.com/airockchip/rknn-toolkit2

CHANGELOG

v1.2.1

  • Added support for RWKV7, Qwen3, and MiniCPM4 models
  • Added support for the RV1126B platform
  • Enabled function calling capability
  • Enabled cross-attention inference
  • Optimize the callback function to support pausing inference
  • Supported multi-batch inference
  • Optimized KV cache clearing interface
  • Improved chat template parsing with support for thinking mode selection
  • Server demo updated to support OpenAI-compatible format
  • Added return of model inference performance statistics
  • Supported mrope multimodal position encoding
  • A new quantization optimization algorithm has been added to improve quantization accuracy

for older version, please refer CHANGELOG

Owner

  • Login: airockchip
  • Kind: user

GitHub Events

Total
  • Create event: 8
  • Issues event: 294
  • Release event: 7
  • Watch event: 512
  • Delete event: 2
  • Issue comment event: 742
  • Push event: 12
  • Pull request event: 4
  • Pull request review event: 1
  • Fork event: 62
Last Year
  • Create event: 8
  • Issues event: 294
  • Release event: 7
  • Watch event: 512
  • Delete event: 2
  • Issue comment event: 742
  • Push event: 12
  • Pull request event: 4
  • Pull request review event: 1
  • Fork event: 62

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 292
  • Total pull requests: 14
  • Average time to close issues: 20 days
  • Average time to close pull requests: 9 days
  • Total issue authors: 195
  • Total pull request authors: 14
  • Average comments per issue: 1.01
  • Average comments per pull request: 0.93
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 216
  • Pull requests: 7
  • Average time to close issues: 10 days
  • Average time to close pull requests: 12 days
  • Issue authors: 147
  • Pull request authors: 7
  • Average comments per issue: 0.99
  • Average comments per pull request: 1.14
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • wohaiaini (8)
  • happyme531 (7)
  • fydeos-alex (7)
  • ysh329 (7)
  • openedev (5)
  • Tang-JingWei (5)
  • danwahe (4)
  • lzjie-tchip (4)
  • Gooddz1 (4)
  • 17656178609 (4)
  • c0zaut (3)
  • vincenzodentamaro (3)
  • skiptomylou86 (3)
  • lzw12138 (3)
  • zhangnn520 (3)
Pull Request Authors
  • wishday (1)
  • 80Builder80 (1)
  • yuguolong (1)
  • cryi (1)
  • shaqing (1)
  • tolidano (1)
  • keeper-jie (1)
  • AACengineer (1)
  • wingceltis-c (1)
  • PlanetesDDH (1)
  • Pelochus (1)
  • vincenzodentamaro (1)
  • mtwlz (1)
  • huaxin233 (1)
Top Labels
Issue Labels
Pull Request Labels