322-omniquant-omnidirectionally-calibrated-quantization-for-large-language-models

https://github.com/szu-advtech-2024/322-omniquant-omnidirectionally-calibrated-quantization-for-large-language-models

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.7%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: SZU-AdvTech-2024
Default Branch: main
Size: 0 Bytes

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Citation

https://github.com/SZU-AdvTech-2024/322-OmniQuant-Omnidirectionally-Calibrated-Quantization-for-Large-Language-Models/blob/main/

1. Weight-only quantization
```
# W3A16
CUDA_VISIBLE_DEVICES=0 python main.py --model OmniQuant/model/Llama-2-7b --epochs 20 --output_dir ./log/llama-7b-w3a16 --eval_ppl --wbits 3 --abits 16 --lwc

# W3A16g128
CUDA_VISIBLE_DEVICES=0 python main.py \
--model OmniQuant/model/Llama-2-7b  \
--epochs 20 --output_dir ./log/llama-7b-w3a16g128 \
--eval_ppl --wbits 3 --abits 16 --group_size 128 --lwc
```
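The weight-only commands above differ only in `--wbits`, `--group_size` and the output directory, so a sweep can be scripted. The following is a minimal sketch, not part of the repository: `wonly_cmd` is a hypothetical helper, and the `MODEL`/`LOG_ROOT` defaults are assumptions that simply mirror the paths used above. It only prints the commands (a dry run).

```bash
#!/usr/bin/env bash
# Dry-run sweep over weight-only OmniQuant settings (sketch, not from the repo).
set -euo pipefail

# Assumed defaults; override via environment variables.
MODEL="${MODEL:-OmniQuant/model/Llama-2-7b}"
LOG_ROOT="${LOG_ROOT:-./log}"

# Hypothetical helper: print one weight-only (A16) command.
#   $1 = weight bits (--wbits)
#   $2 = group size (--group_size); empty means per-channel, the default
wonly_cmd() {
  local wbits="$1"
  local group="${2:-}"
  # Tag such as w3a16 or w3a16g128, matching the log directories above.
  local tag="w${wbits}a16${group:+g${group}}"
  local cmd="CUDA_VISIBLE_DEVICES=0 python main.py --model ${MODEL} --epochs 20"
  cmd+=" --output_dir ${LOG_ROOT}/llama-7b-${tag} --eval_ppl"
  cmd+=" --wbits ${wbits} --abits 16"
  if [[ -n "$group" ]]; then
    cmd+=" --group_size ${group}"
  fi
  cmd+=" --lwc"
  printf '%s\n' "$cmd"
}

# Each entry is "wbits:group_size"; an empty group size selects per-channel.
for cfg in "3:" "3:128" "4:128"; do
  wonly_cmd "${cfg%%:*}" "${cfg#*:}"
done
```

If the script is saved as, say, `sweep.sh`, piping its output to `bash` (`./sweep.sh | bash`) would run the jobs one after another on GPU 0.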

2. Weight-activation quantization
```
# W4A4
CUDA_VISIBLE_DEVICES=0 python main.py --model OmniQuant/model/Llama-2-7b --epochs 20 --output_dir ./log/llama-7b-w4a4 --eval_ppl --wbits 4 --abits 4 --lwc --let \
--tasks piqa,arc_easy,arc_challenge,boolq,hellaswag,winogrande
```
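For weight-activation quantization the same pattern applies, with `--let` added and the zero-shot benchmarks passed to `--tasks` as one comma-separated string. A hedged sketch, assuming the task list and paths from the W4A4 example above; `wa_cmd` and `join_by_comma` are hypothetical helpers, and the script only prints the command.

```bash
#!/usr/bin/env bash
# Dry-run builder for weight-activation OmniQuant commands (sketch).
set -euo pipefail

# Assumed defaults; override via environment variables.
MODEL="${MODEL:-OmniQuant/model/Llama-2-7b}"
LOG_ROOT="${LOG_ROOT:-./log}"
# Zero-shot tasks taken from the W4A4 example.
TASKS=(piqa arc_easy arc_challenge boolq hellaswag winogrande)

# Join the arguments with commas, the format used for --tasks.
join_by_comma() {
  local IFS=,
  printf '%s' "$*"
}

# Hypothetical helper: print one weight-activation command with LWC and LET.
#   $1 = weight bits (--wbits), $2 = activation bits (--abits)
wa_cmd() {
  local wbits="$1" abits="$2"
  printf 'CUDA_VISIBLE_DEVICES=0 python main.py --model %s --epochs 20 --output_dir %s/llama-7b-w%sa%s --eval_ppl --wbits %s --abits %s --lwc --let --tasks %s\n' \
    "$MODEL" "$LOG_ROOT" "$wbits" "$abits" "$wbits" "$abits" \
    "$(join_by_comma "${TASKS[@]}")"
}

wa_cmd 4 4
```

Keeping the tasks in a bash array makes it easy to add or drop a benchmark without editing the comma-separated string by hand.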

More details on the arguments, including optional ones:
- `--model`: the model to quantize, given as a local path in Hugging Face format.
- `--wbits`: number of bits for weight quantization.
- `--abits`: number of bits for activation quantization.
- `--group_size`: group size for weight quantization. If not set, weights are quantized per-channel by default.
- `--lwc`: enable Learnable Weight Clipping (LWC).
- `--let`: enable Learnable Equivalent Transformation (LET).
- `--lwc_lr`: learning rate of the LWC parameters (default: 1e-2).
- `--let_lr`: learning rate of the LET parameters (default: 5e-3).
- `--epochs`: number of training epochs. Set it to 0 to evaluate pre-trained OmniQuant checkpoints.
- `--nsamples`: number of calibration samples (default: 128).
- `--eval_ppl`: evaluate the perplexity of the quantized model.
- `--tasks`: comma-separated list of zero-shot tasks to evaluate.
- `--resume`: load pre-trained OmniQuant parameters.
- `--multigpu`: run inference for larger networks across multiple GPUs.
- `--real_quant`: apply real quantization, which actually reduces memory usage. Note that, due to limitations of the AutoGPTQ kernels, real weight-only quantization only reduces memory; inference becomes slower.
- `--save_dir`: save the quantized model for further exploration.
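The `--epochs`, `--resume`, `--real_quant` and `--save_dir` descriptions combine into an evaluate-only workflow: skip calibration, load previously learned OmniQuant parameters, and optionally export a really quantized model. A minimal dry-run sketch under these assumptions: the checkpoint path `CKPT` and the `--save_dir` target are hypothetical placeholders, the bit settings presumably must match those used when the parameters were learned, and `resume_cmd` is a hypothetical helper that only prints the command.

```bash
#!/usr/bin/env bash
# Dry-run sketch: evaluate pre-trained OmniQuant parameters without calibration.
set -euo pipefail

MODEL="${MODEL:-OmniQuant/model/Llama-2-7b}"
# Hypothetical location of previously learned OmniQuant parameters.
CKPT="${CKPT:-./checkpoints/llama-7b-w3a16g128.pth}"

# Hypothetical helper: print the evaluation command.
#   $1 = 1 to also apply real quantization and save the model, else 0
resume_cmd() {
  local real="${1:-0}"
  # --epochs 0 skips training; --resume loads the learned parameters.
  # The bit settings are assumed to match those used during training.
  local args=(
    --model "$MODEL" --epochs 0 --resume "$CKPT"
    --output_dir ./log/llama-7b-w3a16g128-eval --eval_ppl
    --wbits 3 --abits 16 --group_size 128 --lwc
  )
  if [[ "$real" == 1 ]]; then
    # Real quantization reduces memory; the save path is a placeholder.
    args+=(--real_quant --save_dir ./quantized/llama-7b-w3a16g128)
  fi
  printf 'CUDA_VISIBLE_DEVICES=0 python main.py %s\n' "${args[*]}"
}

resume_cmd 0
resume_cmd 1
```

Building the arguments as a bash array keeps the optional `--real_quant`/`--save_dir` pair separate from the fixed evaluation flags.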

Owner

Name: SZU-AdvTech-2024
Login: SZU-AdvTech-2024
Kind: organization

Repositories: 1
Profile: https://github.com/SZU-AdvTech-2024

GitHub Events

Total

Push event: 2
Create event: 3

Last Year

Push event: 2
Create event: 3
