O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on several indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (13.5%) to scientific vocabulary
Repository
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Basic Info
- Host: GitHub
- Owner: StarDewXXX
- License: MIT
- Language: Python
- Default Branch: main
- Size: 18.2 MB
Statistics
- Stars: 66
- Watchers: 2
- Forks: 2
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
O1-Pruner
Official repository for the paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning.
O1-Pruner is a post-training technique that accelerates inference for O1-like long-thought reasoning models. Experiments show that inference-time overhead can be reduced by up to 50%. For more details, see our paper on arXiv: O1-Pruner.
Pruned O1 Models
Abstract
Recently, long-thought reasoning LLMs, such as OpenAI's O1, have adopted extended reasoning processes similar to how humans ponder over complex problems. This reasoning paradigm significantly enhances the model's problem-solving abilities and has achieved promising results. However, the long-thought reasoning process leads to a substantial increase in inference time. A pressing challenge is reducing the inference overhead of long-thought LLMs while ensuring accuracy. In this paper, we experimentally demonstrate that long-thought reasoning models struggle to allocate token budgets effectively based on problem difficulty and reasoning redundancies. To address this, we propose Length-Harmonizing Fine-Tuning (O1-Pruner), which aims to minimize reasoning overhead while maintaining accuracy. This fine-tuning method first estimates the LLM's baseline performance through pre-sampling and then uses RL-style fine-tuning to encourage the model to generate shorter reasoning processes under accuracy constraints, allowing the model to reason efficiently with lower redundancy while maintaining accuracy. Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge.
Time Cost of Different Methods

Method Overview

Usage
We use A800-80G GPUs for inference and training. For the 7B model, 4 GPUs are required; for the 32B model, 8 GPUs are required.
Installation
First, create a conda environment:

```bash
conda create -n o1-pruner python==3.11.9
conda activate o1-pruner
```
Then clone and install our project:

```bash
git clone https://github.com/StarDewXXX/O1-Pruner
cd O1-Pruner
pip install -e .
```
Our project uses llamafactory (a version modified for our algorithm) for training and vllm for generation. During our experiments we ran into version conflicts between the two, so we recommend installing vllm in a separate environment:
```bash
conda create -n vllm python==3.11.9
conda activate vllm
pip install vllm==0.6.3
```
Generate Your Training Data (Taking QwQ-32B-Preview as an example)
Parameter meanings:
- `K`: the number of solutions generated for each problem.
- `alpha`: the accuracy-penalty weight; the higher the value, the more the model focuses on accuracy rather than output length.
Generating samples is relatively time-consuming because the solutions produced by O1-like models are quite long. For efficiency you can reduce the value of K, but this increases the estimation error of the reward, which may affect the final performance. Note that **K** must be set to the same value during both the inference and dataset-construction stages.

```bash
# By default we use 4 GPUs for inference; change the GPU count with --n_gpus
# For training-data generation, use the commands in the
# "Generate Training Data for O1-Pruner" area of the script
bash o1_scripts/inference.sh
```
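The pre-sampling stage above is what lets O1-Pruner estimate a per-problem baseline: with K solutions per problem you can form a Monte Carlo estimate of the model's accuracy and mean solution length. A minimal sketch of that bookkeeping (function and field names here are illustrative, not taken from the repo's scripts):

```python
from statistics import mean

def estimate_baseline(samples):
    """Estimate per-problem baseline accuracy and mean solution length.

    `samples` is a flat list of dicts, K entries per problem, e.g.
    {"problem_id": 0, "num_tokens": 812, "correct": True}.
    """
    # Group the K pre-sampled solutions by problem.
    by_problem = {}
    for s in samples:
        by_problem.setdefault(s["problem_id"], []).append(s)
    # Monte Carlo estimates: fraction correct and average length.
    return {
        pid: {
            "acc": mean(1.0 if s["correct"] else 0.0 for s in group),
            "mean_len": mean(s["num_tokens"] for s in group),
        }
        for pid, group in by_problem.items()
    }
```

A smaller K makes these estimates noisier, which is why reducing K trades sampling time against reward-estimation error.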
```bash
python o1_scripts/construct_dataset.py --file_name QwQ_math_train_8192_normal_K-12 --K 12 --model_name QwQ --model_path Qwen/QwQ-32B-Preview --alpha 5
```
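Conceptually, the dataset-construction step turns the pre-sampled solutions into per-sample rewards that favor shorter solutions while penalizing accuracy loss, with `alpha` weighting the accuracy side. The exact formula is given in the paper; the sketch below only illustrates the idea and its names are hypothetical:

```python
def length_harmonizing_reward(sample_len, is_correct,
                              baseline_len, baseline_acc, alpha):
    """Illustrative reward: reward shortening relative to the baseline
    length, and weight accuracy deviation from the baseline by alpha."""
    length_term = baseline_len / sample_len - 1.0  # > 0 if shorter than baseline
    acc_term = (1.0 if is_correct else 0.0) - baseline_acc
    return length_term + alpha * acc_term
```

With a large `alpha`, an incorrect short solution scores worse than a correct one of the same length, which is why raising `alpha` shifts the model's focus from brevity toward accuracy.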
Prune Your Model
```bash
llamafactory-cli train examples/math/QwQ-32B.yaml
```
Test Your Model
```bash
# For testing models, use the commands in the
# "Test Pruned Model or Original Model" area of the script
bash o1_scripts/inference.sh
```
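Since the headline claim is up to 50% less inference-time overhead, it is natural to quantify the effect after testing by comparing average output token counts between the original and pruned models on the same problems. A simple helper for that comparison (illustrative, not part of the repo's scripts):

```python
def length_reduction(original_lens, pruned_lens):
    """Percent reduction in mean output token count after pruning."""
    orig_mean = sum(original_lens) / len(original_lens)
    pruned_mean = sum(pruned_lens) / len(pruned_lens)
    return 100.0 * (1.0 - pruned_mean / orig_mean)
```

For example, if the pruned model averages 500 tokens where the original averaged 1000, the reduction is 50%.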
Owner
- Name: Haotian Luo
- Login: StarDewXXX
- Kind: user
- Repositories: 1
- Profile: https://github.com/StarDewXXX
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
date-released: 2024-03
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Zheng"
    given-names: "Yaowei"
  - family-names: "Zhang"
    given-names: "Richong"
  - family-names: "Zhang"
    given-names: "Junhao"
  - family-names: "Ye"
    given-names: "Yanhan"
  - family-names: "Luo"
    given-names: "Zheyan"
  - family-names: "Feng"
    given-names: "Zhangchi"
  - family-names: "Ma"
    given-names: "Yongqiang"
title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
url: "https://arxiv.org/abs/2403.13372"
preferred-citation:
  type: conference-paper
  conference:
    name: "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)"
  authors:
    - family-names: "Zheng"
      given-names: "Yaowei"
    - family-names: "Zhang"
      given-names: "Richong"
    - family-names: "Zhang"
      given-names: "Junhao"
    - family-names: "Ye"
      given-names: "Yanhan"
    - family-names: "Luo"
      given-names: "Zheyan"
    - family-names: "Feng"
      given-names: "Zhangchi"
    - family-names: "Ma"
      given-names: "Yongqiang"
  title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
  url: "https://arxiv.org/abs/2403.13372"
  year: 2024
  publisher: "Association for Computational Linguistics"
  address: "Bangkok, Thailand"
```
GitHub Events
Total
- Issues event: 15
- Watch event: 80
- Issue comment event: 12
- Member event: 2
- Push event: 21
- Fork event: 3
- Create event: 2
Last Year
- Issues event: 15
- Watch event: 80
- Issue comment event: 12
- Member event: 2
- Push event: 21
- Fork event: 3
- Create event: 2
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 9
- Total pull requests: 0
- Average time to close issues: 11 days
- Average time to close pull requests: N/A
- Total issue authors: 9
- Total pull request authors: 0
- Average comments per issue: 0.78
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 9
- Pull requests: 0
- Average time to close issues: 11 days
- Average time to close pull requests: N/A
- Issue authors: 9
- Pull request authors: 0
- Average comments per issue: 0.78
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- sunday-hao (1)
- NielsRogge (1)
- CownowAn (1)
- dingyue772 (1)
- wanghaoyu0408 (1)
- NieSYsc20 (1)
- jbjeong91 (1)
- Hanpx20 (1)
- jinzhensheng (1)