Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.4%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: kssteven418
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 154 MB
Statistics
  • Stars: 18
  • Watchers: 1
  • Forks: 7
  • Open Issues: 5
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme Contributing License Code of conduct Citation Security

README.md

Gradient Computation for SqueezeLLM

SqueezeLLM uses the Fisher information matrix as a sensitivity metric. This repository, built on top of Hugging Face's Transformers library, computes the Fisher sensitivity score (gradient square) for each model weight. The resulting scores can then be fed into the quantization pipeline of our official SqueezeLLM library.
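To make the "gradient square" score concrete, here is a minimal sketch that uses only the Python standard library. The toy linear model, the squared-error loss, and the choice to average (rather than sum) over examples are all illustrative assumptions, not the repository's actual implementation, which operates on full language models.

```python
# Minimal sketch: per-weight sensitivity as the mean element-wise squared gradient.
# The toy model, loss, and data are illustrative assumptions.

def loss_gradient(weights, x, y):
    """Gradient of 0.5 * (w . x - y)^2 with respect to each weight."""
    pred = sum(w * xi for w, xi in zip(weights, x))
    err = pred - y
    return [err * xi for xi in x]

def gradient_square(weights, calibration_set):
    """Average the element-wise squared gradient over the calibration set."""
    acc = [0.0] * len(weights)
    for x, y in calibration_set:
        for i, g in enumerate(loss_gradient(weights, x, y)):
            acc[i] += g * g
    return [a / len(calibration_set) for a in acc]

weights = [1.0, -2.0, 0.5]
calibration_set = [
    ([1.0, 0.0, 2.0], 1.0),
    ([0.0, 1.0, 1.0], -1.0),
]
scores = gradient_square(weights, calibration_set)
print(scores)  # one non-negative score per weight
```

A larger score marks a weight whose perturbation is expected to change the loss more, which is why such a score can serve as a sensitivity metric during quantization.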

Prerequisite

You will need your own Hugging Face-compatible LLaMA or Mistral checkpoint saved at [MODEL_PATH].

Run the following commands for setup:

    conda create -n sqllm-grad python=3.9 -y
    conda activate sqllm-grad
    pip install -e .
    pip install -r requirements.txt

Command

Run the following command:

    CUDA_VISIBLE_DEVICES=0 python run.py --output_dir [OUTPUT_PATH] --model_name [MODEL_PATH]    # single GPU
    CUDA_VISIBLE_DEVICES=0,1 python run.py --output_dir [OUTPUT_PATH] --model_name [MODEL_PATH]  # multi GPU

This command performs the following steps:

  1. Loads the model from [MODEL_PATH]. Currently, we support LLaMA and Mistral models.
  2. Computes the gradient square using a subset of the C4 training dataset as a calibration set. You can define and use your own calibration dataset.
  3. Writes the gradient square to [OUTPUT_PATH]. The output has the same format as the loaded Hugging Face model checkpoint; the only difference is that each weight value is replaced by its gradient square.
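The output layout described in step 3 can be sketched as follows. This is a toy illustration only: the parameter names, the flat-list "checkpoint", and the hard-coded per-example gradients are hypothetical stand-ins, and the accumulation here is a plain sum, which may differ from what the repository does.

```python
# Hypothetical toy "checkpoint": parameter name -> flat list of weights.
checkpoint = {
    "layers.0.weight": [0.1, -0.2, 0.3],
    "layers.1.weight": [0.5, 0.4],
}

# Hypothetical per-example gradients with the same names and shapes as the weights.
per_example_grads = [
    {"layers.0.weight": [1.0, 0.0, -2.0], "layers.1.weight": [0.5, 1.0]},
    {"layers.0.weight": [0.0, 3.0, 1.0], "layers.1.weight": [-0.5, 2.0]},
]

def accumulate_grad_squares(checkpoint, per_example_grads):
    """Return a dict shaped like the checkpoint, holding summed squared gradients."""
    out = {name: [0.0] * len(values) for name, values in checkpoint.items()}
    for grads in per_example_grads:
        for name, g in grads.items():
            out[name] = [a + gi * gi for a, gi in zip(out[name], g)]
    return out

grad_squares = accumulate_grad_squares(checkpoint, per_example_grads)
for name, values in grad_squares.items():
    print(name, values)
```

Because the result keeps the checkpoint's parameter names and shapes, downstream code can look up the sensitivity of a weight with the same key it uses for the weight itself.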

If the model does not fit on a single GPU, our framework can distribute it across multiple GPUs. This happens automatically when more than one device is listed in CUDA_VISIBLE_DEVICES. Specifically, the model is partitioned into chunks of consecutive layers, and each chunk is assigned to its own GPU.
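A rough sketch of splitting layers into consecutive chunks, one per GPU, is shown below. The function name `partition_layers` and the balanced split rule are assumptions made for illustration; the text above does not state how the repository sizes each chunk.

```python
def partition_layers(num_layers, num_devices):
    """Split layer indices 0..num_layers-1 into contiguous chunks, one per device.

    Assumption: chunks are as balanced as possible, with earlier devices
    receiving one extra layer when the split is uneven.
    """
    base, extra = divmod(num_layers, num_devices)
    chunks, start = [], 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks

# Example: a 32-layer model with two visible devices.
device_map = partition_layers(num_layers=32, num_devices=2)
for device, layers in enumerate(device_map):
    print(f"cuda:{device} -> layers {layers[0]}..{layers[-1]}")
```

Keeping each chunk contiguous means activations cross a device boundary only once per chunk during the forward and backward passes.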

You can also use the --num_examples argument to change the number of calibration examples. This defaults to 100.

Owner

  • Name: Sehoon Kim
  • Login: kssteven418
  • Kind: user
  • Location: Berkeley, United States
  • Company: BAIR, UC Berkeley

GitHub Events

Total
  • Watch event: 6
  • Fork event: 2
Last Year
  • Watch event: 6
  • Fork event: 2

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 4
  • Total pull requests: 14
  • Average time to close issues: N/A
  • Average time to close pull requests: about 12 hours
  • Total issue authors: 4
  • Total pull request authors: 3
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 6
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 day
  • Issue authors: 2
  • Pull request authors: 3
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tjtanaa (1)
  • Quang-elec44 (1)
  • Chen-1031 (1)
  • georgelund (1)
Pull Request Authors
  • kssteven418 (12)
  • sidjha1 (2)
  • SyphonArch (2)