o1-pruner

Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

https://github.com/stardewxxx/o1-pruner

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

Basic Info
  • Host: GitHub
  • Owner: StarDewXXX
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 18.2 MB
Statistics
  • Stars: 66
  • Watchers: 2
  • Forks: 2
  • Open Issues: 3
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme Contributing License Code of conduct Citation Security

README.md

O1-Pruner

Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning.

O1-Pruner is a post-training technique that accelerates the inference of O1-like long-thought reasoning models. Experiments show that inference-time overhead can be reduced by up to 50%. For more details, see our paper on arXiv: O1-Pruner

Pruned O1 Models

Models

Marco-o1-7B-Pruned

QwQ-32B-Preview-Pruned

Abstract

Recently, long-thought reasoning LLMs, such as OpenAI's O1, have adopted extended reasoning processes similar to how humans ponder over complex problems. This reasoning paradigm significantly enhances the model's problem-solving abilities and has achieved promising results. However, the long-thought reasoning process leads to a substantial increase in inference time. A pressing challenge is reducing the inference overhead of long-thought LLMs while ensuring accuracy. In this paper, we experimentally demonstrate that long-thought reasoning models struggle to effectively allocate token budgets based on problem difficulty and reasoning redundancies. To address this, we propose Length-Harmonizing Fine-Tuning (O1-Pruner), aiming at minimizing reasoning overhead while maintaining accuracy. This effective fine-tuning method first estimates the LLM's baseline performance through pre-sampling and then uses RL-style fine-tuning to encourage the model to generate shorter reasoning processes under accuracy constraints. This allows the model to achieve efficient reasoning with lower redundancy while maintaining accuracy. Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge.
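The pre-sampling and reward idea from the abstract can be sketched in a few lines of Python. This is our illustrative reading, not the repository's code or the paper's exact objective: estimate a baseline (mean length and accuracy) from reference samples, then score a new solution by its relative length saving plus an accuracy term weighted by a penalty coefficient (playing the role of alpha). All names below are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Baseline:
    mean_length: float  # mean token length of the pre-sampled solutions
    accuracy: float     # fraction of the pre-sampled solutions that were correct

def estimate_baseline(lengths: list[int], correct: list[bool]) -> Baseline:
    """Pre-sampling step: summarize K reference solutions for one problem."""
    return Baseline(mean(lengths), sum(correct) / len(correct))

def reward(length: int, is_correct: bool, base: Baseline, alpha: float) -> float:
    """Illustrative length-harmonizing reward (hypothetical form):
    reward solutions shorter than the baseline, penalize accuracy drops."""
    length_term = base.mean_length / length - 1.0        # > 0 when shorter than baseline
    acc_term = alpha * (float(is_correct) - base.accuracy)
    return length_term + acc_term

base = estimate_baseline([800, 1200, 1000, 1000], [True, True, False, True])
print(round(reward(500, True, base, alpha=5.0), 2))  # → 2.25
```

With a larger alpha, the accuracy term dominates, matching the README's description of alpha as an accuracy penalty.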

Time Cost of Different Methods

Method Overview

Usage

We use A800-80G GPUs for inference and training. For the 7B model, 4 GPUs are required; for the 32B model, 8 GPUs are required.

Installation

First, create a conda environment:

```bash
conda create -n o1-pruner python==3.11.9
conda activate o1-pruner
```

Then clone and install our project:

```bash
git clone https://github.com/StarDewXXX/O1-Pruner
cd O1-Pruner
pip install -e .
```

Our project uses llamafactory (a version modified for our algorithm) for training and vllm for generation. During our experiments we encountered version conflicts, so to avoid them we recommend installing vllm in a separate environment:

```bash
conda create -n vllm python==3.11.9
conda activate vllm
pip install vllm==0.6.3
```

Generate Your Training Data (Taking QwQ-32B-Preview as an example)

Parameter meanings:

K: The number of solutions generated for each problem.

alpha: Accuracy penalty term. The higher the value, the more the model will focus on accuracy rather than the length of the output.

Generating samples is relatively time-consuming because the samples produced by O1 models are quite long. For efficiency, you can reduce the value of K, but this will increase the calculation error of the reward, which may affect the final performance. **Note that K should be set to the same value during both the inference and dataset construction stages.**

```bash
# By default we use 4 GPUs for inference; you can change the GPU count by setting --n_gpus in the command.
# Select the appropriate commands: for training data generation, use the commands in the "Generate Training Data for O1-Pruner" area.
bash o1_scripts/inference.sh
```

```bash
python o1_scripts/construct_dataset.py --file_name QwQ_math_train_8192_normal_K-12 --K 12 --model_name QwQ --model_path Qwen/QwQ-32B-Preview --alpha 5
```
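The warning above about reducing K can be made concrete with a toy model: if each of the K samples is independently correct with probability p, the K-sample accuracy estimate (the basis of the reward baseline) has standard deviation roughly sqrt(p·(1−p)/K), so halving K inflates the reward's noise by about √2. The sketch below is hypothetical and not code from this repository.

```python
import random
from statistics import stdev

def baseline_accuracy(samples: list[bool]) -> float:
    """Monte-Carlo estimate of the model's accuracy from K samples."""
    return sum(samples) / len(samples)

def estimate_error(p: float, K: int, trials: int = 2000) -> float:
    """Std. dev. of the K-sample accuracy estimate: smaller K -> noisier reward."""
    rng = random.Random(0)
    estimates = [
        baseline_accuracy([rng.random() < p for _ in range(K)])
        for _ in range(trials)
    ]
    return stdev(estimates)

# Noise of the accuracy baseline shrinks as K grows (theory: sqrt(0.16 / K) for p = 0.8).
for K in (4, 12, 48):
    print(K, round(estimate_error(0.8, K), 3))
```

This is why the README pins K = 12 in the example command and insists the same K be used at inference and dataset-construction time: the reward baseline is only as stable as the sample size behind it.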

Prune Your Model

```bash
llamafactory-cli train examples/math/QwQ-32B.yaml
```

Test Your Model

```bash
# Select the appropriate commands: for testing models, use the commands in the "Test Pruned Model or Original Model" area.
bash o1_scripts/inference.sh
```

Owner

  • Name: Haotian Luo
  • Login: StarDewXXX
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
date-released: 2024-03
message: "If you use this software, please cite it as below."
authors:
- family-names: "Zheng"
  given-names: "Yaowei"
- family-names: "Zhang"
  given-names: "Richong"
- family-names: "Zhang"
  given-names: "Junhao"
- family-names: "Ye"
  given-names: "Yanhan"
- family-names: "Luo"
  given-names: "Zheyan"
- family-names: "Feng"
  given-names: "Zhangchi"
- family-names: "Ma"
  given-names: "Yongqiang"
title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
url: "https://arxiv.org/abs/2403.13372"
preferred-citation:
  type: conference-paper
  conference:
    name: "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)"
  authors:
    - family-names: "Zheng"
      given-names: "Yaowei"
    - family-names: "Zhang"
      given-names: "Richong"
    - family-names: "Zhang"
      given-names: "Junhao"
    - family-names: "Ye"
      given-names: "Yanhan"
    - family-names: "Luo"
      given-names: "Zheyan"
    - family-names: "Feng"
      given-names: "Zhangchi"
    - family-names: "Ma"
      given-names: "Yongqiang"
  title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
  url: "https://arxiv.org/abs/2403.13372"
  year: 2024
  publisher: "Association for Computational Linguistics"
  address: "Bangkok, Thailand"

GitHub Events

Total
  • Issues event: 15
  • Watch event: 80
  • Issue comment event: 12
  • Member event: 2
  • Push event: 21
  • Fork event: 3
  • Create event: 2
Last Year
  • Issues event: 15
  • Watch event: 80
  • Issue comment event: 12
  • Member event: 2
  • Push event: 21
  • Fork event: 3
  • Create event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 9
  • Total pull requests: 0
  • Average time to close issues: 11 days
  • Average time to close pull requests: N/A
  • Total issue authors: 9
  • Total pull request authors: 0
  • Average comments per issue: 0.78
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 9
  • Pull requests: 0
  • Average time to close issues: 11 days
  • Average time to close pull requests: N/A
  • Issue authors: 9
  • Pull request authors: 0
  • Average comments per issue: 0.78
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • sunday-hao (1)
  • NielsRogge (1)
  • CownowAn (1)
  • dingyue772 (1)
  • wanghaoyu0408 (1)
  • NieSYsc20 (1)
  • jbjeong91 (1)
  • Hanpx20 (1)
  • jinzhensheng (1)