q-galore

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.

https://github.com/vita-group/q-galore

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary

Keywords

large-language-models low-rank memory-efficient-learning quantization
Last synced: 4 months ago

Repository

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.

Basic Info
  • Host: GitHub
  • Owner: VITA-Group
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 343 KB
Statistics
  • Stars: 200
  • Watchers: 9
  • Forks: 17
  • Open Issues: 10
  • Releases: 0
Topics
large-language-models low-rank memory-efficient-learning quantization
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

Q-GaLore

This repo contains the pre-release version of the Q-GaLore algorithm, proposed in Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.

Q-GaLore is a memory-efficient training methodology that is effective in both pre-training and fine-tuning scenarios. Q-GaLore incorporates two main components: (i) low-precision training with low-rank gradients, and (ii) lazy layer-wise subspace exploration. It enables full-parameter learning at substantially reduced memory cost, for example training a LLaMA-7B model from scratch on a single NVIDIA RTX 4060 Ti with only 16GB of memory.
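To make the two components concrete, here is an illustrative PyTorch sketch of the core mechanism: the gradient of a weight matrix is projected into a low-rank subspace whose projection matrix is kept in low precision (simulated INT4 below), so the optimizer state lives in the small subspace. This is a sketch under stated assumptions, not the repo's implementation; the shapes, rank, and quantizer are illustrative choices.

```python
# Illustrative sketch of the Q-GaLore idea, NOT the repo's implementation.
import torch

def fake_quantize_int4(x: torch.Tensor) -> torch.Tensor:
    """Simulated symmetric per-tensor INT4 quantization (quantize-dequantize)."""
    scale = x.abs().max() / 7  # map the tensor onto the INT4 grid
    q = torch.clamp(torch.round(x / scale), -8, 7)
    return q * scale

grad = torch.randn(1024, 1024)  # full gradient of one weight matrix
rank = 256

# Projection matrix from the gradient's top singular vectors. GaLore
# recomputes this only every update_proj_gap steps; Q-GaLore additionally
# monitors how much each layer's subspace still changes and skips the
# recomputation when it has converged ("lazy layer-wise subspace exploration").
U, _, _ = torch.linalg.svd(grad, full_matrices=False)
P = fake_quantize_int4(U[:, :rank])  # low-precision projection matrix

low_rank_grad = P.T @ grad       # optimizer state is 256 x 1024, not 1024 x 1024
update = -0.01 * low_rank_grad   # stand-in for the Adam step in the subspace
full_update = P @ update         # project back to the full parameter space
```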


Read this blog for more details!

Install Q-GaLore optimizer

Install via conda

conda env create -f environment.yml

or install the Q-GaLore optimizer and experiment dependencies:

```bash
# install from pip
pip install q-galore-torch

# or install from source:
git clone https://github.com/VITA-Group/Q-GaLore.git
cd Q-GaLore
pip install -e .

# install experiment dependencies
pip install -r exp_requirements.txt
```
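Once installed, the optimizer is intended to drop into a standard PyTorch training loop. The sketch below is hypothetical: the import path, the class name `QGaLoreAdamW8bit`, and the per-group keys are assumptions modeled on the GaLore optimizer API and the training flags in the Usage section, so verify them against the package source.

```python
# Hypothetical usage sketch -- import path, class name, and arguments are
# assumptions modeled on the GaLore API; check the package for the real names.
import torch
from q_galore_torch import QGaLoreAdamW8bit  # assumed import

model = torch.nn.Linear(1024, 1024)

# GaLore-style optimizers take low-rank settings per parameter group;
# the key names here mirror the command-line flags used below.
param_groups = [{
    "params": model.parameters(),
    "rank": 256,
    "update_proj_gap": 200,
    "scale": 0.25,
}]
optimizer = QGaLoreAdamW8bit(param_groups, lr=0.015)

loss = model(torch.randn(8, 1024)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```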

Usage

Pretraining LLaMA model on C4 dataset

We provide commands in scripts/pretrain_c4 for pretraining LLaMA models with sizes from 60M to 7B on the C4 dataset. We also provide a simulation-mode implementation of quantization via the scripts in scripts/pretrain_c4/simulation. For example, the following script trains a LLaMA-130M model with the Q-GaLore-Adam8bit optimizer:

```bash
torchrun --standalone --nproc_per_node 1 run_pretrain.py \
    --model_config configs/llama_130m.json \
    --lr 0.015 \
    --galore_scale 0.25 \
    --rank 256 \
    --update_proj_gap 200 \
    --batch_size 256 \
    --total_batch_size 512 \
    --num_training_steps 20000 \
    --warmup_steps 2000 \
    --weight_decay 0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --optimizer q_galore_adamw8bit \
    --project 'g-galore-c4' \
    --weight_quant \
    --stochastic_round \
    --proj_quant \
    --name Q-Galore-Adam8bit-LLaMA-130M
```
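The `--weight_quant` and `--stochastic_round` flags keep weights in low precision while rounding each update up or down at random in proportion to its fractional part, so small gradient contributions survive in expectation instead of being rounded away. A minimal, self-contained illustration of stochastic rounding (not the repo's kernel):

```python
import torch

def stochastic_round(x: torch.Tensor) -> torch.Tensor:
    """Round up with probability equal to the fractional part,
    so that E[stochastic_round(x)] == x."""
    floor = torch.floor(x)
    return floor + (torch.rand_like(x) < (x - floor)).float()

# An update too small for plain rounding to preserve:
w = torch.zeros(100_000)
w_sr = stochastic_round(w + 0.1)
print(w_sr.mean())                  # ~0.1 on average
print(torch.round(w + 0.1).mean())  # exactly 0.0: the update is lost
```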

Pretraining LLaMA-7B model within 16GB memory

The command for training the LLaMA-7B model on a single GPU is provided in scripts/pretrain_c4/single_gpu. With a batch size of 16 and activation checkpointing, the following script can pre-train a LLaMA-7B model within 15.26GB of memory (tested on a single A6000 GPU):

```bash
# LLaMA-7B, 8-bit Q-GaLore-Adam, single GPU
# Memory cost: 15.26G, BSZ=16
torchrun --standalone --nproc_per_node 1 run_pretrain.py \
    --model_config configs/llama_7b.json \
    --lr 0.004 \
    --galore_scale 0.25 \
    --rank 1024 \
    --update_proj_gap 500 \
    --batch_size 16 \
    --total_batch_size 512 \
    --activation_checkpointing \
    --num_training_steps 150000 \
    --warmup_steps 15000 \
    --weight_decay 0 \
    --grad_clipping 1.0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --single_gpu \
    --proj_quant \
    --weight_quant \
    --stochastic_round \
    --optimizer q_galore_adamw8bit_per_layer
```
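As a sanity check on these flags: `--batch_size 16` is the per-step micro-batch that fits in 16GB, while `--total_batch_size 512` is the effective batch per optimizer update, which implies gradient accumulation (assuming the scripts interpret the two flags the same way as the original GaLore training code):

```python
batch_size = 16          # micro-batch that fits with activation checkpointing
total_batch_size = 512   # effective batch per optimizer step
grad_accumulation_steps = total_batch_size // batch_size
print(grad_accumulation_steps)  # 32 forward/backward passes per weight update
```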

Citation

```bibtex
@misc{zhang2024qgalore,
      title={Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients},
      author={Zhenyu Zhang and Ajay Jaiswal and Lu Yin and Shiwei Liu and Jiawei Zhao and Yuandong Tian and Zhangyang Wang},
      year={2024},
      eprint={2407.08296},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2407.08296},
}
```

Owner

  • Name: VITA
  • Login: VITA-Group
  • Kind: organization

Visual Informatics Group @ University of Texas at Austin

Citation (CITATION.cff)

cff-version: 1.2.0
title: "Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients"
version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Zhenyu"
    given-names: "Zhang"
year: 2024
repository-code: "TBD"

GitHub Events

Total
  • Issues event: 3
  • Watch event: 35
  • Fork event: 4
Last Year
  • Issues event: 3
  • Watch event: 35
  • Fork event: 4

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 9
  • Total Committers: 1
  • Avg Commits per committer: 9.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 9
  • Committers: 1
  • Avg Commits per committer: 9.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Allen (3****n): 9 commits

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 9
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 9
  • Total pull request authors: 1
  • Average comments per issue: 0.22
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 5
  • Pull request authors: 0
  • Average comments per issue: 0.2
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Khaledbouza (1)
  • LLMresearcher (1)
  • philschmid (1)
  • 0wwafa (1)
  • kostum123 (1)
  • radhacr (1)
  • GeraudBourdin (1)
  • huu4ontocord (1)
  • lucasmgomez (1)
Pull Request Authors
  • Khaledbouza (2)

Dependencies

exp_requirements.txt pypi
  • bitsandbytes *
  • datasets *
  • evaluate *
  • galore-torch *
  • lion-pytorch *
  • loguru *
  • matplotlib *
  • nvitop *
  • peft *
  • scikit-learn *
  • scipy *
  • tokenizers *
  • torch *
  • transformers ==4.31.0
  • wandb *
q_galore_torch/utils/setup.py pypi
requirements.txt pypi
  • bitsandbytes *
  • torch *
  • transformers *
setup.py pypi
environment.yml conda
  • _libgcc_mutex 0.1
  • _openmp_mutex 5.1
  • blas 1.0
  • brotli-python 1.0.9
  • bzip2 1.0.8
  • ca-certificates 2024.3.11
  • certifi 2024.2.2
  • charset-normalizer 2.0.4
  • cuda-cudart 12.1.105
  • cuda-cupti 12.1.105
  • cuda-libraries 12.1.0
  • cuda-nvrtc 12.1.105
  • cuda-nvtx 12.1.105
  • cuda-opencl 12.4.127
  • cuda-runtime 12.1.0
  • ffmpeg 4.3
  • freetype 2.12.1
  • gmp 6.2.1
  • gmpy2 2.1.2
  • gnutls 3.6.15
  • idna 3.7
  • intel-openmp 2023.1.0
  • jinja2 3.1.3
  • jpeg 9e
  • lame 3.100
  • lcms2 2.12
  • ld_impl_linux-64 2.38
  • lerc 3.0
  • libcublas 12.1.0.26
  • libcufft 11.0.2.4
  • libcufile 1.9.1.3
  • libcurand 10.3.5.147
  • libcusolver 11.4.4.55
  • libcusparse 12.0.2.55
  • libdeflate 1.17
  • libffi 3.4.4
  • libgcc-ng 11.2.0
  • libgomp 11.2.0
  • libiconv 1.16
  • libidn2 2.3.4
  • libjpeg-turbo 2.0.0
  • libnpp 12.0.2.50
  • libnvjitlink 12.1.105
  • libnvjpeg 12.1.1.14
  • libpng 1.6.39
  • libstdcxx-ng 11.2.0
  • libtasn1 4.19.0
  • libtiff 4.5.1
  • libunistring 0.9.10
  • libwebp-base 1.3.2
  • llvm-openmp 14.0.6
  • lz4-c 1.9.4
  • markupsafe 2.1.3
  • mkl 2023.1.0
  • mkl-service 2.4.0
  • mkl_fft 1.3.8
  • mkl_random 1.2.4
  • mpc 1.1.0
  • mpfr 4.0.2
  • mpmath 1.3.0
  • ncurses 6.4
  • nettle 3.7.3
  • networkx 3.1
  • numpy 1.24.3
  • numpy-base 1.24.3
  • openh264 2.1.1
  • openjpeg 2.4.0
  • openssl 3.0.13
  • pillow 10.3.0
  • pip 24.0
  • pysocks 1.7.1
  • python 3.8.19
  • pytorch 2.3.0
  • pytorch-cuda 12.1
  • pytorch-mutex 1.0
  • pyyaml 6.0.1
  • readline 8.2
  • requests 2.31.0
  • setuptools 69.5.1
  • sqlite 3.45.3
  • sympy 1.12
  • tbb 2021.8.0
  • tk 8.6.14
  • torchaudio 2.3.0
  • torchtriton 2.3.0
  • torchvision 0.18.0
  • typing_extensions 4.9.0
  • urllib3 2.1.0
  • wheel 0.43.0
  • xz 5.4.6
  • yaml 0.2.5
  • zlib 1.2.13
  • zstd 1.5.5