baldeagle

3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding

https://github.com/nickl77/baldeagle

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding

Basic Info
  • Host: GitHub
  • Owner: NickL77
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 1.98 MB
Statistics
  • Stars: 75
  • Watchers: 5
  • Forks: 13
  • Open Issues: 14
  • Releases: 0
Created 11 months ago · Last pushed 8 months ago
Metadata Files
Readme Citation

README.md

BaldEagle

Unofficial Implementation of EAGLE Speculative Decoding.

Read our launch announcement: https://frugalgpu.substack.com/p/introducing-baldeagle

Read our guide on how to train your own EAGLE model: https://frugalgpu.substack.com/p/how-to-train-your-own-eagle-speculative
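EAGLE accelerates generation via speculative decoding: a small draft model proposes several tokens per step and the target model verifies them, so a run of accepted tokens costs only one target forward pass. As a hedged illustration of the general propose/verify loop under greedy decoding (toy callables stand in for real models; this is not BaldEagle's actual code):

```python
def speculative_decode_greedy(draft, target, prompt, k, n_new):
    """Toy greedy speculative decoding: the draft proposes k tokens,
    the target verifies them; accept the longest agreeing prefix, then
    take the target's own token at the first disagreement."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n_new:
        # Draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies each proposed position given the true prefix.
        ctx = list(seq)
        for t in proposal:
            if target(ctx) == t:
                seq.append(t)
                ctx.append(t)
            else:
                break
        # On rejection (or full acceptance) the target still emits one token.
        seq.append(target(ctx))
    return seq[len(prompt):][:n_new]
```

With greedy verification the output is identical to decoding with the target model alone; the draft only changes how many target forward passes are needed, which is where the speedup comes from.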

Features

Training

  • Clean model implementation on top of HuggingFace Transformers that can be replicated for all models
    • Abstracts away attention, causal mask, etc.
  • Training loop is implemented using HuggingFace Trainer for more readable and modular code
    • Easily modify learning rate scheduler
    • Abstracts away gradient accumulation, autocasting, checkpointing, logging, resuming, etc.

Data Generation

  • Improved data generation scripts that modularizes data formatting, tokenization, and loss mask generation
    • Easy to switch to other datasets and tokenizers
    • Ultrachat and ShareGPT implementations already included
  • view_data.py script that shows loss mask on original text for validation purposes (see here for more details)
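The idea behind the loss mask (and what a script like view_data.py lets you validate) can be sketched in plain Python: only assistant tokens should contribute to the loss. The helper names and whitespace tokenizer below are hypothetical stand-ins, not BaldEagle's API:

```python
def build_loss_mask(messages, tokenize):
    """Tokenize a chat and mark only assistant tokens for the loss.
    `tokenize` is a stand-in for a real tokenizer: str -> list of tokens."""
    tokens, mask = [], []
    for msg in messages:
        toks = tokenize(msg["content"])
        tokens += toks
        mask += [1 if msg["role"] == "assistant" else 0] * len(toks)
    return tokens, mask

def view_mask(tokens, mask):
    """Render the text with loss-contributing tokens bracketed,
    so the mask can be eyeballed against the original conversation."""
    return " ".join(f"[{t}]" if m else t for t, m in zip(tokens, mask))
```

In the real pipeline the mask is derived from the tokenizer's chat template rather than per-message tokenization, but the validation idea is the same: render the mask on top of the original text and check that only assistant spans are marked.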

Benchmarking

  • Benchmarking scripts using sglang for production quality inference (see here for more details)

Models Trained with BaldEagle

| Target Model | BaldEagle Model |
|-------------------------|-----------------------------------|
| Llama-3.1-8B-Instruct | BaldEagle-Llama-3.1-8B-Instruct |
| Qwen-2.5-7B-Instruct | BaldEagle-Qwen-2.5-7B-Instruct |

Getting Started with Training

1. Data Generation

Note: Data generation requires a significant amount of disk space, since we save a sequence_length × hidden_dim hidden-state tensor for each sample. ShareGPT (68k rows) requires ~650 GB and Ultrachat (200k rows) requires ~2 TB.
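Those sizes are consistent with a back-of-envelope estimate. The average token count, hidden size (4096 for Llama-3.1-8B), and bf16 storage below are illustrative assumptions, not measured values:

```python
# Rough estimate of the hidden-state cache size for data generation.
AVG_TOKENS = 1200        # assumed average tokens per conversation
HIDDEN_DIM = 4096        # Llama-3.1-8B hidden size
BYTES_PER_VALUE = 2      # bf16

def dataset_bytes(num_rows):
    """Bytes needed to store one hidden-state tensor per sample."""
    return num_rows * AVG_TOKENS * HIDDEN_DIM * BYTES_PER_VALUE

sharegpt = dataset_bytes(68_000)     # ~0.67 TB, in line with ~650 GB above
ultrachat = dataset_bytes(200_000)   # ~2.0 TB
```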

  1. Edit generate_data.py for the dataset and model you are using.
    • Section 1 handles loading the dataset and reformatting it if necessary; by default we use Ultrachat, and ShareGPT is available in the commented blocks.
    • Section 2 tokenizes and generates the loss mask based on the tokenizer's chat template.
  2. In allocation.py, set the GPUs you want to use for data generation.
    • This will split the data and call generate_data.py on separate slices on different GPUs.
    • Modify the outdir variable.
  3. Call allocation.py, specifying the output directory with --outdir.
    • e.g. python allocation.py --outdir {output_directory}
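The splitting that allocation.py performs can be pictured as contiguous index slices, one per GPU. The function below is a hypothetical sketch of that idea, not the script's actual logic:

```python
def allocate(num_rows, gpu_ids):
    """Split dataset row indices into contiguous half-open slices,
    one per GPU, so each GPU processes a roughly equal share."""
    per, rem = divmod(num_rows, len(gpu_ids))
    slices, start = {}, 0
    for i, gpu in enumerate(gpu_ids):
        size = per + (1 if i < rem else 0)  # spread the remainder
        slices[gpu] = (start, start + size)
        start += size
    return slices
```

Each worker would then run generate_data.py over its own (start, end) slice, writing results under the shared output directory.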

2. Training

  1. In train.py, modify the necessary variables:
    • Specify the local path of the main (target) model you're training a draft for.
    • Modify the data paths in the Load data section to match the output paths from the previous section.
    • Modify any trainer parameters.
  2. Launch the training script on 1 GPU with python3 train.py.

Eagle 3 Status

Training Time Test

The training-time test from the EAGLE 3 paper is currently being implemented in the train/train_eagle_ttt.py and train/modules/trainer/trainer_eagle_ttt.py files.

EAGLE 2 + Training-Time Test model: https://huggingface.co/NickL77/BaldEagle-TTT-Llama-3.1-8B-Instruct-alpha - 11.7% faster, with an 8.4% higher acceptance rate than the EAGLE 2 baseline.

Fused Features

Fused features require new data generation, and EAGLE 3 trains on target-model generations rather than on a fixed dataset as EAGLE 1 does. Fused features will require:

  • [Experimental] New data generation to extract high-, medium-, and low-layer features
    • This will require 3x more storage
    • Currently, generate_data_fused_features.py can generate low, mid, and high features; this is based on the EAGLE repo's layer selection here
  • Faster data generation, since target model generation will be required
    • Ideally we can use a faster inference server like vLLM or sglang rather than HuggingFace
  • Modifications to the model and trainer code for feature fusion
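For intuition, the fusion step could amount to concatenating hidden states from three target-model layers and projecting back to the model's hidden size. This is a minimal pure-Python sketch under that assumption; real code would use torch, and the projection matrix would be a learned weight:

```python
def fuse_features(low, mid, high, proj):
    """Concatenate low/mid/high hidden states (each length d) and
    project the length-3d result back to length d. `proj` is a
    (3d x d) weight matrix given as nested lists."""
    fused = low + mid + high  # list concatenation: length 3d
    d = len(low)
    return [
        sum(fused[i] * proj[i][j] for i in range(3 * d))
        for j in range(d)
    ]
```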

Feel free to open an issue to discuss implementation and results!

Citation

If you found this project useful, please cite it as:

Liu, N. (2025). BaldEagle (Version 1.0.0) [Computer software]. https://github.com/NickL77/BaldEagle/

or

@software{Liu_BaldEagle_2025,
  title = {BaldEagle},
  author = {Liu, Nicholas},
  year = {2025},
  month = {May},
  url = {https://github.com/NickL77/BaldEagle/},
  license = {MIT},
  version = {1.0.0}
}

Owner

  • Name: Nicholas Liu
  • Login: NickL77
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Liu
    given-names: Nicholas
    orcid: https://orcid.org/0009-0009-3185-3333
title: BaldEagle
abstract: BaldEagle is a simple training framework for speculative decoding models.
type: software
version: "1.0.0"
date-released: 2025-05-13
url: "https://github.com/username/BaldEagle"
repository: "https://github.com/NickL77/BaldEagle/"
license: "MIT" 

GitHub Events

Total
  • Issues event: 17
  • Watch event: 63
  • Delete event: 1
  • Issue comment event: 58
  • Push event: 18
  • Fork event: 12
  • Create event: 4
Last Year
  • Issues event: 17
  • Watch event: 63
  • Delete event: 1
  • Issue comment event: 58
  • Push event: 18
  • Fork event: 12
  • Create event: 4

Issues and Pull Requests

All Time
  • Total issues: 20
  • Total pull requests: 0
  • Average time to close issues: 7 days
  • Average time to close pull requests: N/A
  • Total issue authors: 16
  • Total pull request authors: 0
  • Average comments per issue: 3.95
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 20
  • Pull requests: 0
  • Average time to close issues: 7 days
  • Average time to close pull requests: N/A
  • Issue authors: 16
  • Pull request authors: 0
  • Average comments per issue: 3.95
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dongyibo (3)
  • KerwinKai (2)
  • Siegfried-qgf (2)
  • slZheng077 (1)
  • xiaotinghe (1)
  • RefalMachine (1)
  • Mahaotian1 (1)
  • Linking-ai (1)
  • SiqiLi-Fighting (1)
  • zyksir (1)
  • Dawson-Ren (1)
  • piDack (1)
  • Ximingwang-09 (1)
  • Lzhang-hub (1)
  • Arcmoon-Hu (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels