https://github.com/acai66/qwen_numpy

Implementing the inference process of DeepSeek-R1-Distill-Qwen-1.5B using numpy, making it easy to learn LLM (Large Language Model) inference and to port to other programming languages for acceleration.


Science Score: 26.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file: found
  • .zenodo.json file: found
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity: low (6.0%)

Keywords

deepseek deepseek-r1 llama-cpp llm-inference numpy qwen qwen2
Last synced: 5 months ago

Repository

Implementing the inference process of DeepSeek-R1-Distill-Qwen-1.5B using numpy, making it easy to learn LLM (Large Language Model) inference and to port to other programming languages for acceleration.

Basic Info
  • Host: GitHub
  • Owner: acai66
  • Language: Python
  • Default Branch: main
  • Homepage: https://hyacm.com
  • Size: 45.9 KB
Statistics
  • Stars: 9
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
deepseek deepseek-r1 llama-cpp llm-inference numpy qwen qwen2
Created about 1 year ago · Last pushed 7 months ago
Metadata Files
  • Readme: README.md

Alibaba Tongyi Qianwen Qwen2.5 / Qwen3 inference in numpy (supports DeepSeek-R1-distilled Qwen models)

  • Implements Qwen inference using numpy only, without frameworks such as torch or transformers, making the LLM inference process easy to study and easy to port to other languages
  • Supports Alibaba Cloud's original Qwen2.5 and Qwen3 models as well as Qwen2.5 models distilled from DeepSeek-R1; other fine-tuned models are untested (but should work in theory)
  • Supports batch inference
  • Supports temperature, top_k, top_p, penalty, and other sampling parameters (see the sketch after this list)
  • Supports KV cache
  • Supports q8_0 quantization
  • Written for learning: the complete LLM inference process in about 400 lines of code, excluding tokenization
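
As a rough sketch of how those sampling parameters interact (the function below is illustrative only and not taken from this repository; names and default values are invented), a single numpy sampling step could look like:

```python
import numpy as np

def sample_next_token(logits, generated_ids, temperature=0.7,
                      top_k=50, top_p=0.9, penalty=1.1):
    """Illustrative sampling step; names and defaults are assumptions."""
    logits = logits.astype(np.float64)
    # Repetition penalty: push down logits of tokens already generated.
    for t in set(generated_ids):
        logits[t] = logits[t] / penalty if logits[t] > 0 else logits[t] * penalty
    # Temperature: sharpen (<1) or flatten (>1) the distribution.
    logits = logits / temperature
    # Top-k: mask everything below the k-th largest logit.
    if 0 < top_k < logits.size:
        kth = np.sort(logits)[-top_k]
        logits[logits < kth] = -np.inf
    # Softmax.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-p (nucleus): keep the smallest probability-sorted prefix of
    # tokens whose cumulative mass reaches top_p; always keep the top token.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    drop = order[1:][cum[:-1] >= top_p]
    probs[drop] = 0.0
    probs /= probs.sum()
    return int(np.random.choice(probs.size, p=probs))
```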

Testing

1. Install dependencies

```bash
pip install numpy tokenizers
```

2. Download a safetensors model

Download a complete model from a model-sharing platform; see the modelscope platform's download instructions:

  1. Qwen3-0.6B
  2. Qwen3-1.7B
  3. Qwen2.5-0.5B-Instruct
  4. Qwen2.5-1.5B-Instruct
  5. DeepSeek-R1-Distill-Qwen-1.5B

3. Convert the model

Convert the model with the parse_safetensors.py script, passing the downloaded model directory and the directory where the converted npy model should be saved, for example:

```bash
python parse_safetensors.py --model_dir <downloaded_model_dir> --npy_save_dir <npy_output_dir>
```
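
For reference, the safetensors container is simple enough to parse with the standard library plus numpy: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype, shape, and byte offsets, then the raw tensor buffer. A minimal reader in that spirit (a sketch of the file format, not necessarily how parse_safetensors.py is implemented) might look like:

```python
import json
import numpy as np

def bf16_to_f32(raw_u16):
    # bfloat16 is the top 16 bits of an IEEE float32
    return (raw_u16.astype(np.uint32) << 16).view(np.float32)

def read_safetensors(path):
    """Minimal pure-numpy safetensors reader (sketch)."""
    with open(path, 'rb') as f:
        header_len = int.from_bytes(f.read(8), 'little')  # u64 LE header size
        header = json.loads(f.read(header_len))  # name -> dtype/shape/offsets
        buf = f.read()                           # raw tensor bytes
    tensors = {}
    for name, meta in header.items():
        if name == '__metadata__':
            continue
        start, end = meta['data_offsets']  # byte offsets into buf
        if meta['dtype'] == 'BF16':
            arr = bf16_to_f32(np.frombuffer(buf[start:end], dtype=np.uint16))
        else:
            dtype = {'F32': np.float32, 'F16': np.float16}[meta['dtype']]
            arr = np.frombuffer(buf[start:end], dtype=dtype)
        tensors[name] = arr.reshape(meta['shape'])
    return tensors

# Each tensor could then be dumped for the numpy model to load, e.g.:
# for name, arr in read_safetensors('model.safetensors').items():
#     np.save(name.replace('/', '_') + '.npy', arr)
```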

4. Run inference

Edit the model path and prompt in model.py and run it, or import the Model class from model.py yourself; see the usage in the main function of model.py:

```python
import numpy as np

from model import Model  # needed only if this snippet is run outside model.py

if __name__ == '__main__':
    # chat_template = '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n'
    # model_weights_path = '/Users/acai/Downloads/models/Qwen2.5_0.5B_Instruct_npy_FP32'
    # chat_template = '<|begin▁of▁sentence|><|begin▁of▁sentence|>You are a helpful assistant.<|User|>{}<|Assistant|>\n'
    # model_weights_path = '/Users/acai/Downloads/models/DeepSeek_R1_Distill_Qwen_1.5B_npy_FP32'
    chat_template = '<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n'  # no thinking mode
    # chat_template = '<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n<think>\n'  # thinking mode
    model_weights_path = '/Users/acai/Downloads/models/Qwen3_0.6B_npy_FP32'

    model = Model(model_weights_path)

    prompt = [
        # "怎么用python numpy实现softmax?",  # "How do I implement softmax with python numpy?"
        "你是谁?",  # "Who are you?"
        # "计算456+826",  # "Compute 456+826"
    ]  # batch
    text = list(map(lambda x: chat_template.format(x), prompt))

    model_inputs = np.array([model.tokenizer.encode_batch_fast(text)[i].ids for i in range(len(text))], dtype=np.int32)

    generated_ids = model.generate(
        model_inputs,
        max_new_tokens=2048
    )

    response = model.tokenizer.decode_batch(generated_ids, skip_special_tokens=True)
    print('\n'.join(response))
```

Benchmark

Tokens-per-second comparison against llama.cpp (higher is better); test platform: Mac mini M4, 16 GB RAM.

| Model | Precision | numpy | llama.cpp |
|:---:|:---:|:---:|:---:|
| Qwen2.5-0.5B-Instruct | float32 | 29.77 | 45.6 |
| Qwen2.5-0.5B-Instruct | float16 | - | 86.44 |
| Qwen2.5-0.5B-Instruct | q8_0 | 1.94 | 140.53 |
| DeepSeek-R1-Distill-Qwen-1.5B | float32 | 10.31 | 15.55 |
| DeepSeek-R1-Distill-Qwen-1.5B | float16 | - | 31.55 |
| DeepSeek-R1-Distill-Qwen-1.5B | q8_0 | 0.68 | 54.47 |

At float32 precision a 7B model needs about 30 GB of RAM (7B parameters × 4 bytes ≈ 28 GB, before activations and the KV cache); the test machine did not have enough memory, so it was not benchmarked.

The surprising part is that numpy's accelerated matrix routines only support float32 and float64, not integers or half precision, which makes float32 the fastest option here. But float32 weights occupy a lot of memory, easily exhausting RAM, and place heavy demands on memory bandwidth; matching llama.cpp's speed likely requires porting to another language and optimizing the matrix kernels at a lower level.
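
That dtype effect is easy to check: numpy routes float32/float64 matmul through BLAS, while float16 falls back to a slow generic loop. A quick, machine-dependent probe (timings will vary with your BLAS build and hardware):

```python
import time
import numpy as np

def matmul_ms(dtype, n=512, reps=10):
    """Average wall-clock time of one n x n matmul in milliseconds."""
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    a @ b  # warm-up
    t0 = time.perf_counter()
    for _ in range(reps):
        a @ b
    return (time.perf_counter() - t0) / reps * 1e3

for dt in (np.float64, np.float32, np.float16):
    print(np.dtype(dt).name, f'{matmul_ms(dt):.1f} ms')
```

On typical builds the float16 row is orders of magnitude slower than float32, which is consistent with the benchmark table above.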


Owner

  • Name: acai
  • Login: acai66
  • Kind: user

"Realizing the past is beyond correction, I know the future may yet be pursued."


Committers

Last synced: 7 months ago

All Time
  • Total Commits: 10
  • Total Committers: 1
  • Avg Commits per committer: 10.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 10
  • Committers: 1
  • Avg Commits per committer: 10.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| acai66 | 1****6@q****m | 10 |
Committer Domains (Top 20 + Academic)
qq.com: 1

Issues and Pull Requests

Last synced: 7 months ago