Recent Releases of https://github.com/airockchip/rknn-llm

https://github.com/airockchip/rknn-llm - release-v1.2.1

  • Added support for RWKV7, Qwen3, and MiniCPM4 models
  • Added support for the RV1126B platform
  • Enabled function calling capability
  • Enabled cross-attention inference
  • Optimize the callback function to support pausing inference
  • Supported multi-batch inference
  • Optimized KV cache clearing interface
  • Improved chat template parsing with support for thinking mode selection
  • Server demo updated to support OpenAI-compatible format
  • Added return of model inference performance statistics
  • Supported mrope multimodal position encoding
  • A new quantization optimization algorithm has been added to improve quantization accuracy

- Python
Published by yhcvb about 1 year ago

https://github.com/airockchip/rknn-llm - release-v1.2.1b1

  • Added support for Qwen3

- Python
Published by yhcvb about 1 year ago

https://github.com/airockchip/rknn-llm - release-v1.2.0

  • Supports custom model conversion.
  • Supports chat_template configuration.
  • Enables multi-turn dialogue interactions.
  • Implements automatic prompt cache reuse for improved inference efficiency.
  • Expands maximum context length to 16K.
  • Supports embedding flash storage to reduce memory usage.
  • Introduces the GRQ Int4 quantization algorithm.
  • Supports GPTQ-Int8 model conversion.
  • Compatible with the RK3562 platform.
  • Added support for visual multimodal models such as InternVL2, Janus, and Qwen2.5-VL.
  • Supports CPU core configuration.
  • Added support for Gemma3
  • Added support for Python 3.9/3.11/3.12

- Python
Published by yhcvb about 1 year ago

https://github.com/airockchip/rknn-llm - release-v1.1.4

  • Add support for converting HuggingFace GPTQ-int4 models (requires groupsize to be 32, 64, or 128, and desc_act set to false).
  • Add support for TeleChat/TeleChat2/MiniCPM-S models.
  • Support exporting llm model in Qwen2VL
  • Resolve issues with LoRA inference.
  • Fix an import error related to IPython.

- Python
Published by yhcvb over 1 year ago

https://github.com/airockchip/rknn-llm - release-v1.1.2

  • Fix inference error in chatglm3 model
  • Fix inference issue with embedding input
  • Support exporting llm model in MiniCPMV

- Python
Published by yhcvb over 1 year ago

https://github.com/airockchip/rknn-llm - release-v1.1.1

  • Fixed the inference error in the minicpm3 mode
  • Fixed the runtime error in rkllmserverdemo.
  • Added the rkllm-toolkit installation package for Python 3.10.
  • Supported gguf model conversion when tiewordembeddings is set to true.

- Python
Published by yhcvb over 1 year ago

https://github.com/airockchip/rknn-llm - release-v1.1.0

  • Added support for grouped quantization (w4a16 group sizes of 32/64/128, w8a8 group sizes of 128/256/512).
  • Added gdq algorithm to improve 4-bit quantization accuracy.
  • Added hybrid quantization algorithm, supporting a combination of grouped and non-grouped quantization based on specified ratios.
  • Added support for Llama3, Gemma2, and Minicpm3 models.
  • Added support for gguf model conversion (currently supports q4_0 and fp16 only).
  • Added support for LoRa models.
  • Added storage and loading of prompt cache
  • Added PC-side emulation accuracy testing and inference interface support for rkllm-toolkit.
  • Fixed catastrophic forgetting issue when the token count exceeds max_context.
  • Optimized prefill speed.
  • Optimized generate speed.
  • Optimized model initialization time
  • Added support for four input interfaces: prompt, embedding, token, and multimodal.

- Python
Published by yhcvb over 1 year ago

https://github.com/airockchip/rknn-llm - release-v1.0.1

  • Optimize model conversion memory occupation
  • Optimize inference memory occupation
  • Increase prefill speed
  • Reduce initialization time
  • Improve quantization accuracy
  • Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
  • Add Server invocation
  • Add inference interruption interface
  • Add logprob and token_id to the return value

- Python
Published by yhcvb about 2 years ago