baldeagle

3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding

https://github.com/nickl77/baldeagle

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding

Basic Info
  • Host: GitHub
  • Owner: NickL77
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 1.98 MB
Statistics
  • Stars: 75
  • Watchers: 5
  • Forks: 13
  • Open Issues: 14
  • Releases: 0
Created 11 months ago · Last pushed 8 months ago
Metadata Files
Readme Citation

README.md

BaldEagle

Unofficial Implementation of EAGLE Speculative Decoding.

Read our launch announcement: https://frugalgpu.substack.com/p/introducing-baldeagle

Read our guide on how to train your own EAGLE model: https://frugalgpu.substack.com/p/how-to-train-your-own-eagle-speculative
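EAGLE accelerates generation via speculative decoding: a small draft model proposes several tokens per step and the target model verifies them, so a run of accepted tokens costs only one target forward pass. As a hedged illustration of the general propose/verify loop under greedy decoding (toy callables stand in for real models; this is not BaldEagle's actual code):

```python
def speculative_decode_greedy(draft, target, prompt, k, n_new):
    """Toy greedy speculative decoding: the draft proposes k tokens,
    the target verifies them; accept the longest agreeing prefix, then
    take the target's own token at the first disagreement."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n_new:
        # Draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies each proposed position given the true prefix.
        ctx = list(seq)
        for t in proposal:
            if target(ctx) == t:
                seq.append(t)
                ctx.append(t)
            else:
                break
        # On rejection (or full acceptance) the target still emits one token.
        seq.append(target(ctx))
    return seq[len(prompt):][:n_new]
```

With greedy verification the output is identical to decoding with the target model alone; the draft only changes how many target forward passes are needed, which is where the speedup comes from.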

Features

Training

  • Clean model implementation on top of HuggingFace Transformers that can be replicated for all models
    • Abstracts away attention, causal mask, etc.
  • Training loop is implemented using HuggingFace Trainer for more readable and modular code
    • Easily modify learning rate scheduler
    • Abstracts away gradient accumulation, autocasting, checkpointing, logging, resuming, etc.

Data Generation

  • Improved data generation scripts that modularizes data formatting, tokenization, and loss mask generation
    • Easy to switch to other datasets and tokenizers
    • Ultrachat and ShareGPT implementations already included
  • view_data.py script that shows loss mask on original text for validation purposes (see here for more details)
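The idea behind the loss mask (and what a script like view_data.py lets you validate) can be sketched in plain Python: only assistant tokens should contribute to the loss. The helper names and whitespace tokenizer below are hypothetical stand-ins, not BaldEagle's API:

```python
def build_loss_mask(messages, tokenize):
    """Tokenize a chat and mark only assistant tokens for the loss.
    `tokenize` is a stand-in for a real tokenizer: str -> list of tokens."""
    tokens, mask = [], []
    for msg in messages:
        toks = tokenize(msg["content"])
        tokens += toks
        mask += [1 if msg["role"] == "assistant" else 0] * len(toks)
    return tokens, mask

def view_mask(tokens, mask):
    """Render the text with loss-contributing tokens bracketed,
    so the mask can be eyeballed against the original conversation."""
    return " ".join(f"[{t}]" if m else t for t, m in zip(tokens, mask))
```

In the real pipeline the mask is derived from the tokenizer's chat template rather than per-message tokenization, but the validation idea is the same: render the mask on top of the original text and check that only assistant spans are marked.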

Benchmarking

  • Benchmarking scripts using sglang for production quality inference (see here for more details)

Models Trained with BaldEagle

| Target Model | BaldEagle Model |
|-------------------------|-----------------------------------|
| Llama-3.1-8B-Instruct | BaldEagle-Llama-3.1-8B-Instruct |
| Qwen-2.5-7B-Instruct | BaldEagle-Qwen-2.5-7B-Instruct |

Getting Started with Training

1. Data Generation

Note: Data generation requires a significant amount of disk space, since we save a sequence_length × hidden_dim hidden-state tensor for each sample. ShareGPT (68k rows) requires ~650 GB and Ultrachat (200k rows) requires ~2 TB.
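Those sizes are consistent with a back-of-envelope estimate. The average token count, hidden size (4096 for Llama-3.1-8B), and bf16 storage below are illustrative assumptions, not measured values:

```python
# Rough estimate of the hidden-state cache size for data generation.
AVG_TOKENS = 1200        # assumed average tokens per conversation
HIDDEN_DIM = 4096        # Llama-3.1-8B hidden size
BYTES_PER_VALUE = 2      # bf16

def dataset_bytes(num_rows):
    """Bytes needed to store one hidden-state tensor per sample."""
    return num_rows * AVG_TOKENS * HIDDEN_DIM * BYTES_PER_VALUE

sharegpt = dataset_bytes(68_000)     # ~0.67 TB, in line with ~650 GB above
ultrachat = dataset_bytes(200_000)   # ~2.0 TB
```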

  1. Edit generate_data.py for the dataset and model you are using.
    • Section 1 handles loading the dataset and reformatting it if necessary; by default we use Ultrachat, and ShareGPT is available in the commented blocks.
    • Section 2 tokenizes and generates the loss mask based on the tokenizer's chat template.
  2. In allocation.py, set the GPUs you want to use for data generation.
    • This will split the data and call generate_data.py on separate slices on different GPUs.
    • Modify the outdir variable.
  3. Call allocation.py, specifying the output directory with --outdir.
    • e.g. python allocation.py --outdir {output_directory}
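The splitting that allocation.py performs can be pictured as contiguous index slices, one per GPU. The function below is a hypothetical sketch of that idea, not the script's actual logic:

```python
def allocate(num_rows, gpu_ids):
    """Split dataset row indices into contiguous half-open slices,
    one per GPU, so each GPU processes a roughly equal share."""
    per, rem = divmod(num_rows, len(gpu_ids))
    slices, start = {}, 0
    for i, gpu in enumerate(gpu_ids):
        size = per + (1 if i < rem else 0)  # spread the remainder
        slices[gpu] = (start, start + size)
        start += size
    return slices
```

Each worker would then run generate_data.py over its own (start, end) slice, writing results under the shared output directory.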

2. Training

  1. In train.py, modify the necessary variables:
    • Specify the local path of the main (target) model you're training a draft for.
    • Modify the data paths in the Load data section to match the output paths from the previous section.
    • Modify any trainer parameters.
  2. Launch the training script on 1 GPU with python3 train.py.

Eagle 3 Status

Training Time Test

The training-time test from the EAGLE 3 paper is currently being implemented in the train/train_eagle_ttt.py and train/modules/trainer/trainer_eagle_ttt.py files.

EAGLE 2 + Training-Time Test model: https://huggingface.co/NickL77/BaldEagle-TTT-Llama-3.1-8B-Instruct-alpha - 11.7% faster, with an 8.4% higher acceptance rate than the EAGLE 2 baseline.

Fused Features

Fused features require new data generation, and EAGLE 3 trains on target-model generations rather than on a fixed dataset as EAGLE 1 does. Fused features will require:

  • [Experimental] New data generation to extract high-, medium-, and low-layer features
    • This will require 3x more storage
    • Currently, generate_data_fused_features.py can generate low, mid, and high features; this is based on the EAGLE repo's layer selection here
  • Faster data generation, since target model generation will be required
    • Ideally we can use a faster inference server like vLLM or sglang rather than HuggingFace
  • Modifications to the model and trainer code for feature fusion
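For intuition, the fusion step could amount to concatenating hidden states from three target-model layers and projecting back to the model's hidden size. This is a minimal pure-Python sketch under that assumption; real code would use torch, and the projection matrix would be a learned weight:

```python
def fuse_features(low, mid, high, proj):
    """Concatenate low/mid/high hidden states (each length d) and
    project the length-3d result back to length d. `proj` is a
    (3d x d) weight matrix given as nested lists."""
    fused = low + mid + high  # list concatenation: length 3d
    d = len(low)
    return [
        sum(fused[i] * proj[i][j] for i in range(3 * d))
        for j in range(d)
    ]
```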

Feel free to open an issue to discuss implementation and results!

Citation

If you found this project useful, please cite it as:

Liu, N. (2025). BaldEagle (Version 1.0.0) [Computer software]. https://github.com/NickL77/BaldEagle/

or

@software{Liu_BaldEagle_2025,
  title = {BaldEagle},
  author = {Liu, Nicholas},
  year = {2025},
  month = {May},
  url = {https://github.com/NickL77/BaldEagle/},
  license = {MIT},
  version = {1.0.0}
}

Owner

  • Name: Nicholas Liu
  • Login: NickL77
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Liu
    given-names: Nicholas
    orcid: https://orcid.org/0009-0009-3185-3333
title: BaldEagle
abstract: BaldEagle is a simple training framework for speculative decoding models.
type: software
version: "1.0.0"
date-released: 2025-05-13
url: "https://github.com/username/BaldEagle"
repository: "https://github.com/NickL77/BaldEagle/"
license: "MIT" 

GitHub Events

Total
  • Issues event: 17
  • Watch event: 63
  • Delete event: 1
  • Issue comment event: 58
  • Push event: 18
  • Fork event: 12
  • Create event: 4
Last Year
  • Issues event: 17
  • Watch event: 63
  • Delete event: 1
  • Issue comment event: 58
  • Push event: 18
  • Fork event: 12
  • Create event: 4

Issues and Pull Requests

All Time
  • Total issues: 20
  • Total pull requests: 0
  • Average time to close issues: 7 days
  • Average time to close pull requests: N/A
  • Total issue authors: 16
  • Total pull request authors: 0
  • Average comments per issue: 3.95
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 20
  • Pull requests: 0
  • Average time to close issues: 7 days
  • Average time to close pull requests: N/A
  • Issue authors: 16
  • Pull request authors: 0
  • Average comments per issue: 3.95
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dongyibo (3)
  • KerwinKai (2)
  • Siegfried-qgf (2)
  • slZheng077 (1)
  • xiaotinghe (1)
  • RefalMachine (1)
  • Mahaotian1 (1)
  • Linking-ai (1)
  • SiqiLi-Fighting (1)
  • zyksir (1)
  • Dawson-Ren (1)
  • piDack (1)
  • Ximingwang-09 (1)
  • Lzhang-hub (1)
  • Arcmoon-Hu (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels