squeezellm-gradients

https://github.com/kssteven418/squeezellm-gradients


Repository

Basic Info

Host: GitHub
Owner: kssteven418
License: apache-2.0
Language: Python
Default Branch: main
Size: 154 MB

Statistics

Stars: 18
Watchers: 1
Forks: 7
Open Issues: 5
Releases: 0

Created over 2 years ago · Last pushed over 2 years ago

Gradient Computation for SqueezeLLM

SqueezeLLM uses the Fisher information matrix as a sensitivity metric. This repository, which builds on Hugging Face's Transformers library, computes the Fisher sensitivity score (the squared gradient) for each weight. The resulting scores can be fed into the quantization pipeline of our official SqueezeLLM library.
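To make the "gradient square" score concrete, here is a minimal sketch on a toy linear model rather than an LLM. The function name `squared_gradient_scores`, the squared-error loss, and the averaging over examples are all assumptions made for illustration; the repository itself computes gradients of a real model on calibration data and may accumulate them differently.

```python
def squared_gradient_scores(weights, examples):
    """Average the per-example squared gradients of a toy linear model.

    Assumed loss (illustrative only): 0.5 * (w . x - y)^2,
    so d(loss)/d(w_i) = (w . x - y) * x_i.
    """
    scores = [0.0] * len(weights)
    for x, y in examples:
        residual = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad_i = residual * xi
            scores[i] += grad_i ** 2  # accumulate the squared gradient
    return [s / len(examples) for s in scores]


# Hypothetical weights and a two-example "calibration set".
weights = [1.0, -2.0, 0.5]
calibration = [([1.0, 0.0, 2.0], 1.0), ([0.0, 1.0, 1.0], 0.0)]
scores = squared_gradient_scores(weights, calibration)
print(scores)
```

A weight with a larger score changes the loss more when perturbed, which is why such a score can serve as a sensitivity measure during quantization.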

Prerequisite

You will need your own Hugging Face-compatible LLaMA checkpoint saved at [MODEL_PATH].

Run the following commands for setup:

    conda create -n sqllm-grad python=3.9 -y
    conda activate sqllm-grad
    pip install -e .
    pip install -r requirements.txt

Command

Run one of the following commands:

    CUDA_VISIBLE_DEVICES=0 python run.py --output_dir [OUTPUT_PATH] --model_name [MODEL_PATH] # single GPU
    CUDA_VISIBLE_DEVICES=0,1 python run.py --output_dir [OUTPUT_PATH] --model_name [MODEL_PATH] # multi GPU

This command performs the following steps:

1. Loads the model from [MODEL_PATH]. Currently, we support LLaMA and Mistral models.
2. Computes the squared gradient using a subset of the C4 training dataset as a calibration set. You can define and use your own calibration dataset.
3. Writes the squared gradient to [OUTPUT_PATH]. The output format is identical to the loaded Hugging Face model checkpoint, except that the weight values are replaced by the squared gradients.

If the model does not fit on a single GPU, our framework can distribute it across multiple GPUs. This happens automatically when you make multiple devices visible through CUDA_VISIBLE_DEVICES. Specifically, the model is partitioned into chunks of consecutive layers, and each chunk is assigned to its own GPU.
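The partitioning described above can be sketched as follows. This is an illustration only: the helper name `partition_layers` and the near-equal split are assumptions, and the repository may balance its chunks differently.

```python
def partition_layers(num_layers, num_devices):
    """Map each device index to a chunk of consecutive layer indices.

    Assumption: chunks are as equal in size as possible, with any
    leftover layers going to the lowest-numbered devices.
    """
    base, extra = divmod(num_layers, num_devices)
    assignment = {}
    start = 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        assignment[device] = list(range(start, start + size))
        start += size
    return assignment


# Hypothetical example: a 32-layer model over two visible GPUs.
plan = partition_layers(32, 2)
print({device: (layers[0], layers[-1]) for device, layers in plan.items()})
```

Keeping each chunk contiguous means activations cross a device boundary only once per chunk during the forward and backward passes.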

You can also use the --num_examples argument to change the number of calibration examples. This defaults to 100.

Owner

Name: Sehoon Kim
Login: kssteven418
Kind: user
Location: Berkeley, United States
Company: BAIR, UC Berkeley

Website: sehoonkim.org
Repositories: 29
Profile: https://github.com/kssteven418

GitHub Events

Total

Watch event: 6
Fork event: 2

Last Year

Watch event: 6
Fork event: 2

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 4
Total pull requests: 14
Average time to close issues: N/A
Average time to close pull requests: about 12 hours
Total issue authors: 4
Total pull request authors: 3
Average comments per issue: 0.5
Average comments per pull request: 0.0
Merged pull requests: 13
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 6
Average time to close issues: N/A
Average time to close pull requests: 1 day
Issue authors: 2
Pull request authors: 3
Average comments per issue: 1.0
Average comments per pull request: 0.0
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 0
