q-galore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file (found CITATION.cff file)
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 11.5%, to scientific vocabulary)
Keywords
Repository
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
Basic Info
Statistics
- Stars: 200
- Watchers: 9
- Forks: 17
- Open Issues: 10
- Releases: 0
Topics
Metadata Files
README.md
Q-GaLore
This repo contains the pre-release version of the Q-GaLore algorithm, proposed in Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
Q-GaLore is a memory-efficient training methodology effective in both pre-training and fine-tuning scenarios. It incorporates two main components: (i) low-precision training utilizing low-rank gradients, and (ii) lazy layer-wise subspace exploration. It enables full-parameter learning while requiring far less memory, e.g., training a LLaMA-7B model from scratch on a single NVIDIA RTX 4060 Ti with only 16GB of memory.
Read this blog for more details!
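To build intuition for component (i), the toy PyTorch sketch below projects a full gradient onto a low-rank basis, keeps the projection matrix in a low-bit integer format, and writes the update back onto INT8 weights with stochastic rounding. This is a minimal sketch under simplified assumptions (symmetric per-tensor INT8 quantization everywhere, an SVD-based basis), not the released implementation; all function and variable names here are made up for illustration.

```python
# Illustrative sketch only, not the q-galore-torch implementation.
import torch

def low_rank_projection(grad: torch.Tensor, rank: int) -> torch.Tensor:
    """Rank-r projection basis from the gradient's top left singular vectors."""
    U, S, Vh = torch.linalg.svd(grad.float(), full_matrices=False)
    return U[:, :rank]  # (m, r)

def quantize_int8(x: torch.Tensor):
    """Simplified symmetric per-tensor INT8 quantization (Q-GaLore uses INT4 for projections)."""
    scale = x.abs().max() / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def stochastic_round_to_int8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Stochastically round a high-precision update onto the INT8 weight grid."""
    y = x / scale
    low = torch.floor(y)
    prob_up = y - low                       # fractional part = probability of rounding up
    rounded = low + torch.bernoulli(prob_up)
    return torch.clamp(rounded, -127, 127).to(torch.int8)

# Toy usage: one projected gradient step on an INT8 weight matrix.
torch.manual_seed(0)
w_fp = torch.randn(512, 512)
w_q, w_scale = quantize_int8(w_fp)                  # INT8 "weights"
grad = torch.randn(512, 512)

P = low_rank_projection(grad, rank=64)              # projection basis
P_q, p_scale = quantize_int8(P)                     # low-bit projection matrix
P_deq = P_q.float() * p_scale

lowrank_grad = P_deq.T @ grad                       # (r, n) compressed gradient
update = P_deq @ lowrank_grad                       # project back to full shape
new_w_fp = w_q.float() * w_scale - 0.01 * update    # high-precision update
w_q = stochastic_round_to_int8(new_w_fp, w_scale)   # store back in INT8
```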
Install Q-GaLore optimizer
Install via conda
```bash
conda env create -f environment.yml
```
or install the Q-GaLore optimizer and experiment dependencies
```bash
# install from pip
pip install q-galore-torch

# or install from source:
git clone https://github.com/VITA-Group/Q-GaLore.git
cd Q-GaLore
pip install -e .

pip install -r exp_requirements.txt
```
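If you only need the optimizer after `pip install q-galore-torch`, a plausible drop-in usage pattern is sketched below, mirroring the parameter-group convention of the original GaLore optimizers. The class name `QGaLoreAdamW8bit`, the import path, and the exact group keys (`rank`, `update_proj_gap`, `scale`, `proj_type`) are assumptions; check the package source for the released API.

```python
# Hypothetical usage sketch; class name, import path, and group keys are assumptions.
import torch.nn as nn
from q_galore_torch import QGaLoreAdamW8bit  # assumed export

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

# Route only 2D weight matrices through the low-rank (Q-GaLore) path.
galore_params = [p for p in model.parameters() if p.dim() == 2]
other_params = [p for p in model.parameters() if p.dim() != 2]

param_groups = [
    {"params": other_params},
    {"params": galore_params,
     "rank": 256,             # low-rank subspace size
     "update_proj_gap": 200,  # steps between subspace refreshes
     "scale": 0.25,           # gradient scaling, cf. --galore_scale
     "proj_type": "std"},
]
optimizer = QGaLoreAdamW8bit(param_groups, lr=0.015)
```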
Usage
Pretraining LLaMA model on C4 dataset
We provide commands in scripts/pretrain_c4 for pretraining LLaMA models with sizes from 60M to 7B on the C4 dataset. We also provide a simulation-mode implementation of quantization with scripts in scripts/pretrain_c4/simulation. For example, the following script trains a LLaMA-130M model with Q-GaLore-Adam8bit.
```bash
torchrun --standalone --nproc_per_node 1 run_pretrain.py \
    --model_config configs/llama_130m.json \
    --lr 0.015 \
    --galore_scale 0.25 \
    --rank 256 \
    --update_proj_gap 200 \
    --batch_size 256 \
    --total_batch_size 512 \
    --num_training_steps 20000 \
    --warmup_steps 2000 \
    --weight_decay 0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --optimizer q_galore_adamw8bit \
    --project 'g-galore-c4' \
    --weight_quant \
    --stochastic_round \
    --proj_quant \
    --name Q-Galore-Adam8bit-LLaMA-130M
```
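A note on the batch-size flags, assuming the common convention that `--total_batch_size` is the effective batch per optimizer step and `--batch_size` is the per-device micro-batch (this reading is an assumption, not verified against run_pretrain.py): the command above then implies gradient accumulation over two micro-steps.

```python
# Assumed flag semantics (convention only): accumulation = total / (micro * n_gpus).
total_batch_size, batch_size, n_gpus = 512, 256, 1
grad_accumulation_steps = total_batch_size // (batch_size * n_gpus)
print(grad_accumulation_steps)  # 2
```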
Pretraining LLaMA-7B model within 16GB memory
The command for training a LLaMA-7B model on a single GPU is provided in scripts/pretrain_c4/single_gpu. With a batch size of 16 and activation checkpointing, the following script can pre-train a LLaMA-7B model within 15.26GB of memory (tested on a single A6000 GPU).
```bash
# LLaMA-7B, 8-bit Q-GaLore-Adam, single GPU
# Memory cost: 15.26G, BSZ=16
torchrun --standalone --nproc_per_node 1 run_pretrain.py \
    --model_config configs/llama_7b.json \
    --lr 0.004 \
    --galore_scale 0.25 \
    --rank 1024 \
    --update_proj_gap 500 \
    --batch_size 16 \
    --total_batch_size 512 \
    --activation_checkpointing \
    --num_training_steps 150000 \
    --warmup_steps 15000 \
    --weight_decay 0 \
    --grad_clipping 1.0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --single_gpu \
    --proj_quant \
    --weight_quant \
    --stochastic_round \
    --optimizer q_galore_adamw8bit_per_layer
```
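For intuition on why this fits under 16GB, a very rough back-of-envelope estimate: INT8 weights plus 8-bit Adam states kept only in the rank-1024 subspace already account for roughly 10GB, leaving headroom for per-layer gradients and checkpointed activations. The arithmetic below rests on stated assumptions and is illustrative only; the repository reports 15.26GB as the measured figure.

```python
# Rough, illustrative arithmetic only (assumptions: 7B parameters stored in INT8,
# two 8-bit Adam states kept in a rank-1024 subspace of hidden size 4096,
# per-layer updates so full-model gradients are never materialized at once).
n_params = 7e9
weights_bytes = n_params * 1                      # INT8 weights
rank, hidden = 1024, 4096
opt_state_bytes = n_params * (rank / hidden) * 2  # two 8-bit low-rank states
print(f"~{(weights_bytes + opt_state_bytes) / 1e9:.1f} GB before activations")  # ~10.5 GB
```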
Citation
```bibtex
@misc{zhang2024qgalore,
      title={Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients},
      author={Zhenyu Zhang and Ajay Jaiswal and Lu Yin and Shiwei Liu and Jiawei Zhao and Yuandong Tian and Zhangyang Wang},
      year={2024},
      eprint={2407.08296},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2407.08296},
}
```
Owner
- Name: VITA
- Login: VITA-Group
- Kind: organization
- Website: https://vita-group.github.io
- Repositories: 75
- Profile: https://github.com/VITA-Group
Visual Informatics Group @ University of Texas at Austin
Citation (CITATION.cff)
cff-version: 1.2.0
title: "Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients"
version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Zhang"
  given-names: "Zhenyu"
year: 2024
repository-code: "TBD"
GitHub Events
Total
- Issues event: 3
- Watch event: 35
- Fork event: 4
Last Year
- Issues event: 3
- Watch event: 35
- Fork event: 4
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 9
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 9
- Total pull request authors: 1
- Average comments per issue: 0.22
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 5
- Pull request authors: 0
- Average comments per issue: 0.2
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Khaledbouza (1)
- LLMresearcher (1)
- philschmid (1)
- 0wwafa (1)
- kostum123 (1)
- radhacr (1)
- GeraudBourdin (1)
- huu4ontocord (1)
- lucasmgomez (1)
Pull Request Authors
- Khaledbouza (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- bitsandbytes *
- datasets *
- evaluate *
- galore-torch *
- lion-pytorch *
- loguru *
- matplotlib *
- nvitop *
- peft *
- scikit-learn *
- scipy *
- tokenizers *
- torch *
- transformers ==4.31.0
- wandb *
- bitsandbytes *
- torch *
- transformers *
- _libgcc_mutex 0.1
- _openmp_mutex 5.1
- blas 1.0
- brotli-python 1.0.9
- bzip2 1.0.8
- ca-certificates 2024.3.11
- certifi 2024.2.2
- charset-normalizer 2.0.4
- cuda-cudart 12.1.105
- cuda-cupti 12.1.105
- cuda-libraries 12.1.0
- cuda-nvrtc 12.1.105
- cuda-nvtx 12.1.105
- cuda-opencl 12.4.127
- cuda-runtime 12.1.0
- ffmpeg 4.3
- freetype 2.12.1
- gmp 6.2.1
- gmpy2 2.1.2
- gnutls 3.6.15
- idna 3.7
- intel-openmp 2023.1.0
- jinja2 3.1.3
- jpeg 9e
- lame 3.100
- lcms2 2.12
- ld_impl_linux-64 2.38
- lerc 3.0
- libcublas 12.1.0.26
- libcufft 11.0.2.4
- libcufile 1.9.1.3
- libcurand 10.3.5.147
- libcusolver 11.4.4.55
- libcusparse 12.0.2.55
- libdeflate 1.17
- libffi 3.4.4
- libgcc-ng 11.2.0
- libgomp 11.2.0
- libiconv 1.16
- libidn2 2.3.4
- libjpeg-turbo 2.0.0
- libnpp 12.0.2.50
- libnvjitlink 12.1.105
- libnvjpeg 12.1.1.14
- libpng 1.6.39
- libstdcxx-ng 11.2.0
- libtasn1 4.19.0
- libtiff 4.5.1
- libunistring 0.9.10
- libwebp-base 1.3.2
- llvm-openmp 14.0.6
- lz4-c 1.9.4
- markupsafe 2.1.3
- mkl 2023.1.0
- mkl-service 2.4.0
- mkl_fft 1.3.8
- mkl_random 1.2.4
- mpc 1.1.0
- mpfr 4.0.2
- mpmath 1.3.0
- ncurses 6.4
- nettle 3.7.3
- networkx 3.1
- numpy 1.24.3
- numpy-base 1.24.3
- openh264 2.1.1
- openjpeg 2.4.0
- openssl 3.0.13
- pillow 10.3.0
- pip 24.0
- pysocks 1.7.1
- python 3.8.19
- pytorch 2.3.0
- pytorch-cuda 12.1
- pytorch-mutex 1.0
- pyyaml 6.0.1
- readline 8.2
- requests 2.31.0
- setuptools 69.5.1
- sqlite 3.45.3
- sympy 1.12
- tbb 2021.8.0
- tk 8.6.14
- torchaudio 2.3.0
- torchtriton 2.3.0
- torchvision 0.18.0
- typing_extensions 4.9.0
- urllib3 2.1.0
- wheel 0.43.0
- xz 5.4.6
- yaml 0.2.5
- zlib 1.2.13
- zstd 1.5.5