q-galore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file (found CITATION.cff file)
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 11.5%, to scientific vocabulary)
Keywords
Repository
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
Basic Info
Statistics
- Stars: 200
- Watchers: 9
- Forks: 17
- Open Issues: 10
- Releases: 0
Topics
Metadata Files
README.md
Q-GaLore
This repo contains the pre-release version of the Q-GaLore algorithm, proposed in Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
Q-GaLore is a memory-efficient training methodology effective in both pre-training and fine-tuning scenarios. It incorporates two main components: (i) low-precision training utilizing low-rank gradients, and (ii) lazy layer-wise subspace exploration. It enables full-parameter learning while requiring far less memory, e.g., training a LLaMA-7B model from scratch on a single NVIDIA RTX 4060 Ti with only 16GB of memory.
Read this blog for more details!
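To build intuition for component (i), the toy PyTorch sketch below projects a full gradient onto a low-rank basis, keeps the projection matrix in a low-bit integer format, and writes the update back onto INT8 weights with stochastic rounding. This is a minimal sketch under simplified assumptions (symmetric per-tensor INT8 quantization everywhere, an SVD-based basis), not the released implementation; all function and variable names here are made up for illustration.

```python
# Illustrative sketch only, not the q-galore-torch implementation.
import torch

def low_rank_projection(grad: torch.Tensor, rank: int) -> torch.Tensor:
    """Rank-r projection basis from the gradient's top left singular vectors."""
    U, S, Vh = torch.linalg.svd(grad.float(), full_matrices=False)
    return U[:, :rank]  # (m, r)

def quantize_int8(x: torch.Tensor):
    """Simplified symmetric per-tensor INT8 quantization (Q-GaLore uses INT4 for projections)."""
    scale = x.abs().max() / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def stochastic_round_to_int8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Stochastically round a high-precision update onto the INT8 weight grid."""
    y = x / scale
    low = torch.floor(y)
    prob_up = y - low                       # fractional part = probability of rounding up
    rounded = low + torch.bernoulli(prob_up)
    return torch.clamp(rounded, -127, 127).to(torch.int8)

# Toy usage: one projected gradient step on an INT8 weight matrix.
torch.manual_seed(0)
w_fp = torch.randn(512, 512)
w_q, w_scale = quantize_int8(w_fp)                  # INT8 "weights"
grad = torch.randn(512, 512)

P = low_rank_projection(grad, rank=64)              # projection basis
P_q, p_scale = quantize_int8(P)                     # low-bit projection matrix
P_deq = P_q.float() * p_scale

lowrank_grad = P_deq.T @ grad                       # (r, n) compressed gradient
update = P_deq @ lowrank_grad                       # project back to full shape
new_w_fp = w_q.float() * w_scale - 0.01 * update    # high-precision update
w_q = stochastic_round_to_int8(new_w_fp, w_scale)   # store back in INT8
```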
Install Q-GaLore optimizer
Install via conda
```bash
conda env create -f environment.yml
```
or install the Q-GaLore optimizer and experiment dependencies
```bash
# install from pip
pip install q-galore-torch

# or install from source:
git clone https://github.com/VITA-Group/Q-GaLore.git
cd Q-GaLore
pip install -e .

pip install -r exp_requirements.txt
```
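If you only need the optimizer after `pip install q-galore-torch`, a plausible drop-in usage pattern is sketched below, mirroring the parameter-group convention of the original GaLore optimizers. The class name `QGaLoreAdamW8bit`, the import path, and the exact group keys (`rank`, `update_proj_gap`, `scale`, `proj_type`) are assumptions; check the package source for the released API.

```python
# Hypothetical usage sketch; class name, import path, and group keys are assumptions.
import torch.nn as nn
from q_galore_torch import QGaLoreAdamW8bit  # assumed export

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

# Route only 2D weight matrices through the low-rank (Q-GaLore) path.
galore_params = [p for p in model.parameters() if p.dim() == 2]
other_params = [p for p in model.parameters() if p.dim() != 2]

param_groups = [
    {"params": other_params},
    {"params": galore_params,
     "rank": 256,             # low-rank subspace size
     "update_proj_gap": 200,  # steps between subspace refreshes
     "scale": 0.25,           # gradient scaling, cf. --galore_scale
     "proj_type": "std"},
]
optimizer = QGaLoreAdamW8bit(param_groups, lr=0.015)
```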
Usage
Pretraining LLaMA model on C4 dataset
We provide commands in scripts/pretrain_c4 for pretraining LLaMA models with sizes from 60M to 7B on the C4 dataset. We also provide a simulation-mode implementation of quantization with scripts in scripts/pretrain_c4/simulation. For example, the following script trains a LLaMA-130M model with Q-GaLore-Adam8bit.
```bash
torchrun --standalone --nproc_per_node 1 run_pretrain.py \
    --model_config configs/llama_130m.json \
    --lr 0.015 \
    --galore_scale 0.25 \
    --rank 256 \
    --update_proj_gap 200 \
    --batch_size 256 \
    --total_batch_size 512 \
    --num_training_steps 20000 \
    --warmup_steps 2000 \
    --weight_decay 0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --optimizer q_galore_adamw8bit \
    --project 'g-galore-c4' \
    --weight_quant \
    --stochastic_round \
    --proj_quant \
    --name Q-Galore-Adam8bit-LLaMA-130M
```
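A note on the batch-size flags, assuming the common convention that `--total_batch_size` is the effective batch per optimizer step and `--batch_size` is the per-device micro-batch (this reading is an assumption, not verified against run_pretrain.py): the command above then implies gradient accumulation over two micro-steps.

```python
# Assumed flag semantics (convention only): accumulation = total / (micro * n_gpus).
total_batch_size, batch_size, n_gpus = 512, 256, 1
grad_accumulation_steps = total_batch_size // (batch_size * n_gpus)
print(grad_accumulation_steps)  # 2
```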
Pretraining LLaMA-7B model within 16GB memory
The command for training a LLaMA-7B model on a single GPU is provided in scripts/pretrain_c4/single_gpu. With a batch size of 16 and activation checkpointing, the following script can pre-train a LLaMA-7B model within 15.26GB of memory (tested on a single A6000 GPU).
```bash
# LLaMA-7B, 8-bit Q-GaLore-Adam, single GPU
# Memory cost: 15.26G, BSZ=16
torchrun --standalone --nproc_per_node 1 run_pretrain.py \
    --model_config configs/llama_7b.json \
    --lr 0.004 \
    --galore_scale 0.25 \
    --rank 1024 \
    --update_proj_gap 500 \
    --batch_size 16 \
    --total_batch_size 512 \
    --activation_checkpointing \
    --num_training_steps 150000 \
    --warmup_steps 15000 \
    --weight_decay 0 \
    --grad_clipping 1.0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --single_gpu \
    --proj_quant \
    --weight_quant \
    --stochastic_round \
    --optimizer q_galore_adamw8bit_per_layer
```
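For intuition on why this fits under 16GB, a very rough back-of-envelope estimate: INT8 weights plus 8-bit Adam states kept only in the rank-1024 subspace already account for roughly 10GB, leaving headroom for per-layer gradients and checkpointed activations. The arithmetic below rests on stated assumptions and is illustrative only; the repository reports 15.26GB as the measured figure.

```python
# Rough, illustrative arithmetic only (assumptions: 7B parameters stored in INT8,
# two 8-bit Adam states kept in a rank-1024 subspace of hidden size 4096,
# per-layer updates so full-model gradients are never materialized at once).
n_params = 7e9
weights_bytes = n_params * 1                      # INT8 weights
rank, hidden = 1024, 4096
opt_state_bytes = n_params * (rank / hidden) * 2  # two 8-bit low-rank states
print(f"~{(weights_bytes + opt_state_bytes) / 1e9:.1f} GB before activations")  # ~10.5 GB
```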
Citation
```bibtex
@misc{zhang2024qgalore,
      title={Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients},
      author={Zhenyu Zhang and Ajay Jaiswal and Lu Yin and Shiwei Liu and Jiawei Zhao and Yuandong Tian and Zhangyang Wang},
      year={2024},
      eprint={2407.08296},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2407.08296},
}
```
Owner
- Name: VITA
- Login: VITA-Group
- Kind: organization
- Website: https://vita-group.github.io
- Repositories: 75
- Profile: https://github.com/VITA-Group
Visual Informatics Group @ University of Texas at Austin
Citation (CITATION.cff)
cff-version: 1.2.0
title: "Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients"
version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Zhang"
  given-names: "Zhenyu"
year: 2024
repository-code: "TBD"
GitHub Events
Total
- Issues event: 3
- Watch event: 35
- Fork event: 4
Last Year
- Issues event: 3
- Watch event: 35
- Fork event: 4
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 9
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 9
- Total pull request authors: 1
- Average comments per issue: 0.22
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 5
- Pull request authors: 0
- Average comments per issue: 0.2
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Khaledbouza (1)
- LLMresearcher (1)
- philschmid (1)
- 0wwafa (1)
- kostum123 (1)
- radhacr (1)
- GeraudBourdin (1)
- huu4ontocord (1)
- lucasmgomez (1)
Pull Request Authors
- Khaledbouza (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- bitsandbytes *
- datasets *
- evaluate *
- galore-torch *
- lion-pytorch *
- loguru *
- matplotlib *
- nvitop *
- peft *
- scikit-learn *
- scipy *
- tokenizers *
- torch *
- transformers ==4.31.0
- wandb *
- bitsandbytes *
- torch *
- transformers *
- _libgcc_mutex 0.1
- _openmp_mutex 5.1
- blas 1.0
- brotli-python 1.0.9
- bzip2 1.0.8
- ca-certificates 2024.3.11
- certifi 2024.2.2
- charset-normalizer 2.0.4
- cuda-cudart 12.1.105
- cuda-cupti 12.1.105
- cuda-libraries 12.1.0
- cuda-nvrtc 12.1.105
- cuda-nvtx 12.1.105
- cuda-opencl 12.4.127
- cuda-runtime 12.1.0
- ffmpeg 4.3
- freetype 2.12.1
- gmp 6.2.1
- gmpy2 2.1.2
- gnutls 3.6.15
- idna 3.7
- intel-openmp 2023.1.0
- jinja2 3.1.3
- jpeg 9e
- lame 3.100
- lcms2 2.12
- ld_impl_linux-64 2.38
- lerc 3.0
- libcublas 12.1.0.26
- libcufft 11.0.2.4
- libcufile 1.9.1.3
- libcurand 10.3.5.147
- libcusolver 11.4.4.55
- libcusparse 12.0.2.55
- libdeflate 1.17
- libffi 3.4.4
- libgcc-ng 11.2.0
- libgomp 11.2.0
- libiconv 1.16
- libidn2 2.3.4
- libjpeg-turbo 2.0.0
- libnpp 12.0.2.50
- libnvjitlink 12.1.105
- libnvjpeg 12.1.1.14
- libpng 1.6.39
- libstdcxx-ng 11.2.0
- libtasn1 4.19.0
- libtiff 4.5.1
- libunistring 0.9.10
- libwebp-base 1.3.2
- llvm-openmp 14.0.6
- lz4-c 1.9.4
- markupsafe 2.1.3
- mkl 2023.1.0
- mkl-service 2.4.0
- mkl_fft 1.3.8
- mkl_random 1.2.4
- mpc 1.1.0
- mpfr 4.0.2
- mpmath 1.3.0
- ncurses 6.4
- nettle 3.7.3
- networkx 3.1
- numpy 1.24.3
- numpy-base 1.24.3
- openh264 2.1.1
- openjpeg 2.4.0
- openssl 3.0.13
- pillow 10.3.0
- pip 24.0
- pysocks 1.7.1
- python 3.8.19
- pytorch 2.3.0
- pytorch-cuda 12.1
- pytorch-mutex 1.0
- pyyaml 6.0.1
- readline 8.2
- requests 2.31.0
- setuptools 69.5.1
- sqlite 3.45.3
- sympy 1.12
- tbb 2021.8.0
- tk 8.6.14
- torchaudio 2.3.0
- torchtriton 2.3.0
- torchvision 0.18.0
- typing_extensions 4.9.0
- urllib3 2.1.0
- wheel 0.43.0
- xz 5.4.6
- yaml 0.2.5
- zlib 1.2.13
- zstd 1.5.5