322-omniquant-omnidirectionally-calibrated-quantization-for-large-language-models
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.7%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: SZU-AdvTech-2024
- Default Branch: main
- Size: 0 Bytes
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Citation
https://github.com/SZU-AdvTech-2024/322-OmniQuant-Omnidirectionally-Calibrated-Quantization-for-Large-Language-Models/blob/main/
1. Weight-only quantization

```
# W3A16
CUDA_VISIBLE_DEVICES=0 python main.py --model OmniQuant/model/Llama-2-7b --epochs 20 --output_dir ./log/llama-7b-w3a16 --eval_ppl --wbits 3 --abits 16 --lwc

# W3A16g128
CUDA_VISIBLE_DEVICES=0 python main.py \
--model OmniQuant/model/Llama-2-7b \
--epochs 20 --output_dir ./log/llama-7b-w3a16g128 \
--eval_ppl --wbits 3 --abits 16 --group_size 128 --lwc
```

2. Weight-activation quantization

```
# W4A4
CUDA_VISIBLE_DEVICES=0 python main.py --model OmniQuant/model/Llama-2-7b --epochs 20 --output_dir ./log/llama-7b-w4a4 --eval_ppl --wbits 4 --abits 4 --lwc --let --tasks piqa,arc_easy,arc_challenge,boolq,hellaswag,winogrande
```

More detailed and optional arguments:

- `--model`: the local model path, or a model in Hugging Face format.
- `--wbits`: number of bits for weight quantization.
- `--abits`: number of bits for activation quantization.
- `--group_size`: group size for weight quantization. If not set, per-channel weight quantization is used by default.
- `--lwc`: activate Learnable Weight Clipping (LWC).
- `--let`: activate Learnable Equivalent Transformation (LET).
- `--lwc_lr`: learning rate of the LWC parameters (default: 1e-2).
- `--let_lr`: learning rate of the LET parameters (default: 5e-3).
- `--epochs`: number of training epochs. Set it to 0 to evaluate pre-trained OmniQuant checkpoints.
- `--nsamples`: number of calibration samples (default: 128).
- `--eval_ppl`: evaluate the perplexity of the quantized model.
- `--tasks`: zero-shot tasks to evaluate.
- `--resume`: load pre-trained OmniQuant parameters.
- `--multigpu`: run inference for larger networks on multiple GPUs.
- `--real_quant`: perform real quantization, which yields an actual memory reduction. Note that, due to limitations of the AutoGPTQ kernels, real weight-only quantization reduces memory usage but results in slower inference.
- `--save_dir`: save the quantized model for further exploration.
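To make the difference between per-channel quantization and `--group_size` more concrete, here is a minimal pure-Python sketch of "fake" (quantize, then dequantize) weight quantization for a single output channel. It is not the repository's implementation: the function names `fake_quantize` and `mean_abs_error` are hypothetical, the scheme is assumed to be asymmetric min-max uniform quantization, and the fixed `gamma` and `beta` factors only stand in for the clipping parameters that `--lwc` would learn.

```python
def fake_quantize(weights, wbits, group_size=None, gamma=1.0, beta=1.0):
    """Quantize and dequantize one row of weights (hypothetical helper).

    group_size=None treats the whole row as one group (per-channel).
    gamma and beta scale each group's max and min; they are fixed here,
    as an assumed stand-in for learned clipping parameters.
    """
    n = len(weights)
    size = n if group_size is None else group_size
    if n % size != 0:
        raise ValueError("row length must be divisible by group_size")
    levels = 2 ** wbits - 1  # highest integer code
    out = []
    for start in range(0, n, size):
        group = weights[start:start + size]
        hi = gamma * max(group)
        lo = beta * min(group)
        step = (hi - lo) / levels
        if step <= 0:
            # Degenerate range (e.g. constant group): leave values unchanged.
            out.extend(group)
            continue
        zero = -round(lo / step)  # integer zero point
        for w in group:
            q = min(max(round(w / step) + zero, 0), levels)  # clamp to code range
            out.append((q - zero) * step)  # dequantize
    return out


def mean_abs_error(original, approx):
    """Average absolute difference between two equal-length rows."""
    return sum(abs(a - b) for a, b in zip(original, approx)) / len(original)


# A row whose second half contains much larger values than its first half.
row = [0.1, -0.2, 0.05, 0.3, 4.0, -3.5, 2.0, 1.0]
per_channel = fake_quantize(row, wbits=3)
grouped = fake_quantize(row, wbits=3, group_size=4)
print("per-channel error:", mean_abs_error(row, per_channel))
print("group-wise error: ", mean_abs_error(row, grouped))
```

In this sketch each group gets its own step size and zero point, so the small weights in the first group are no longer forced onto the coarse grid dictated by the large values elsewhere in the row. This is the usual motivation for group-wise settings such as W3A16g128.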
Owner
- Name: SZU-AdvTech-2024
- Login: SZU-AdvTech-2024
- Kind: organization
- Repositories: 1
- Profile: https://github.com/SZU-AdvTech-2024
GitHub Events
Total
- Push event: 2
- Create event: 3
Last Year
- Push event: 2
- Create event: 3