https://github.com/baohaoliao/rsd

[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.5%) to scientific vocabulary

Keywords

decoding-algorithm efficiency large-language-models process-reward-model reasoning speculative-decoding
Last synced: 5 months ago

Repository

Basic Info
Statistics
  • Stars: 26
  • Watchers: 2
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Topics
decoding-algorithm efficiency large-language-models process-reward-model reasoning speculative-decoding
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

Reward-Guided Speculative Decoding (RSD) for Efficient LLM Reasoning

[![arXiv](https://img.shields.io/badge/arXiv-2501.19324-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.19324)

Introduction

We introduce Reward-Guided Speculative Decoding (RSD), a novel framework for improving the efficiency of inference in large language models (LLMs). RSD employs a process reward model to evaluate intermediate decoding steps from the draft model and dynamically decides whether to invoke the target model, optimizing the trade-off between computational cost and output quality. Extensive evaluations on challenging reasoning benchmarks, including Olympiad-level tasks, show that RSD delivers significant efficiency gains over decoding with the target model alone (up to 4.4x fewer FLOPs), while achieving significantly better average accuracy than parallel decoding methods (up to +3.5).
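
The per-step control flow described above can be sketched as follows. This is a toy illustration, not the repository's implementation: `draft_step`, `target_step`, and `reward` stand in for the actual models, and the fixed acceptance `threshold` is an assumption rather than the paper's exact criterion.

```python
def rsd_decode(prompt, draft_step, target_step, reward, threshold=0.7, max_steps=8):
    """Sketch of reward-guided speculative decoding.

    At each step the cheap draft model proposes a candidate continuation.
    A process reward model scores it; if the score clears the threshold,
    the draft step is kept, otherwise the expensive target model is
    invoked for that step.
    """
    output = list(prompt)
    target_calls = 0
    for _ in range(max_steps):
        candidate = draft_step(output)
        if reward(output, candidate) >= threshold:
            step = candidate            # accept the cheap draft step
        else:
            step = target_step(output)  # fall back to the target model
            target_calls += 1
        output.append(step)
    return output, target_calls
```

When the reward model consistently scores draft steps highly, the target model is never called, which is where the FLOP savings come from.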

Support

  • [x] vLLM online mode: Needs at least 3 GPUs to serve the draft, target, and process reward models, since vLLM doesn't support serving multiple models on a single GPU.
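
In this setup each of the three models sits behind its own vLLM server with an OpenAI-compatible API. A minimal sketch of how requests might be routed, assuming placeholder ports (8000-8002) and model names that are not taken from the repository's serve scripts:

```python
# Placeholder endpoints: one OpenAI-compatible vLLM server per model/GPU.
ENDPOINTS = {
    "draft": "http://localhost:8000/v1/completions",
    "target": "http://localhost:8001/v1/completions",
    "prm": "http://localhost:8002/v1/completions",
}

def build_request(role, model, prompt, max_tokens=256):
    """Return (url, payload) for the vLLM server handling the given role."""
    return ENDPOINTS[role], {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }
```

A decoding loop would then POST draft requests to port 8000, score them via the PRM server, and only hit the target server on rejection.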

News

  • [2025/05/01] 🎊 RSD is accepted to ICML 2025 as a poster.

Installation

```shell
# For math evaluation
pip install -r requirements.txt

# For using Skywork-PRM
git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
cd skywork-o1-prm-inference
pip install -e .
```

Efficient Decoding

1. Preparation

We mainly use the Qwen2.5-Math family and Skywork-o1-Open-PRM-Qwen-2.5-1.5B. You need to change max_position_embeddings in their config.json from 4096 to 16384 to avoid a max_tokens error in vLLM. We only use generations shorter than 4096 tokens, so this change won't affect performance.
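
The config edit can be applied with a short script rather than by hand. A sketch, where the model path in the usage comment is a placeholder:

```python
import json
from pathlib import Path

def patch_max_position_embeddings(config_path, new_value=16384):
    """Raise max_position_embeddings in a HF config.json (e.g. 4096 -> 16384)."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["max_position_embeddings"] = new_value
    path.write_text(json.dumps(config, indent=2))
    return config

# Example (placeholder path):
# patch_max_position_embeddings("Qwen2.5-Math-1.5B-Instruct/config.json")
```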

2. Model serving

```shell
bash scripts/serve_draft_model.sh
bash scripts/serve_target_model.sh
bash scripts/serve_prm.sh
```

3. Evaluation

```shell
bash scripts/math_eval.sh
```

Acknowledgement

Our code base mainly builds on Qwen2.5-Math and skywork-o1-prm-inference.

Citation

```bibtex
@misc{liao2025reward,
  title={Reward-Guided Speculative Decoding for Efficient LLM Reasoning},
  author={Baohao Liao and Yuhui Xu and Hanze Dong and Junnan Li and Christof Monz and Silvio Savarese and Doyen Sahoo and Caiming Xiong},
  year={2025},
  eprint={2501.19324},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.19324},
}
```

Owner

  • Name: baohao
  • Login: BaohaoLiao
  • Kind: user
  • Location: Netherlands
  • Company: University of Amsterdam

PhD candidate @ltl-uva for NLP

GitHub Events

Total
  • Issues event: 2
  • Watch event: 36
  • Issue comment event: 1
  • Push event: 7
  • Fork event: 3
Last Year
  • Issues event: 2
  • Watch event: 36
  • Issue comment event: 1
  • Push event: 7
  • Fork event: 3