quantization-sparsity-interplay
This repo contains the code for studying the interplay between quantization and sparsity methods
https://github.com/parsa-epfl/quantization-sparsity-interplay
Science Score: 52.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: Organization parsa-epfl has institutional domain (parsa.epfl.ch)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: parsa-epfl
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 38 MB
Statistics
- Stars: 16
- Watchers: 4
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Effective Interplay between Sparsity and Quantization: From Theory to Practice
This repository is the official implementation of all analyses and experiments in the paper Effective Interplay between Sparsity and Quantization: From Theory to Practice.
The paper mathematically investigates the relationship between quantization and sparsity techniques, and how their errors combine when both techniques are used together. The theoretical analysis is validated by experimental results on a wide range of models.
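As a generic illustration of how two compression errors compose (a standard algebraic decomposition, not necessarily the formulation used in the paper), let x be a weight tensor, S a sparsification operator, and Q a quantization operator. Applying sparsity first and quantization second gives:

```latex
\begin{aligned}
x - Q(S(x)) &= \underbrace{\bigl(x - S(x)\bigr)}_{\text{sparsity error}}
             + \underbrace{\bigl(S(x) - Q(S(x))\bigr)}_{\text{quantization error on } S(x)} \\
\lVert x - Q(S(x)) \rVert &\le \lVert x - S(x) \rVert + \lVert S(x) - Q(S(x)) \rVert
\end{aligned}
```

The second term is the quantization error of the already-sparsified tensor S(x), which in general differs from the error Q incurs on x alone, so the combined error need not equal the sum of the errors each method produces in isolation.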
About Our Work
Various forms of quantization and sparsity have emerged as promising approaches to model compression, especially in the era of LLMs. This paper focuses on applying both techniques together and is part of broader research efforts to reduce the memory footprint of LLMs and make them more accessible. Our mathematical analysis and extensive empirical study with large language models (OPT, LLaMA) and vision transformers (ViT) demonstrate that quantization and sparsity are not orthogonal: their combined use can adversely affect model accuracy. Our findings provide valuable insights for compressing large models while preserving accuracy.
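To make the idea of combined errors concrete, here is a minimal, self-contained sketch (not the repository's implementation) that applies 2:4 magnitude pruning and 4-bit absmax round-to-nearest quantization to a random vector in both orders. These two methods are simple stand-ins chosen for illustration; the function names, bit width, block size, and Gaussian input are assumptions, not values taken from the paper.

```python
import math
import random


def prune_n_m(values, n=2, m=4):
    """Magnitude N:M pruning: in each group of m values, keep the n largest by |value|."""
    out = []
    for start in range(0, len(values), m):
        group = values[start:start + m]
        # Indices of the n largest-magnitude entries in this group.
        keep = sorted(range(len(group)), key=lambda i: abs(group[i]), reverse=True)[:n]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out


def quantize_absmax(values, bits=4, block=16):
    """Symmetric round-to-nearest quantization with one absmax scale per block (dequantized)."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for start in range(0, len(values), block):
        group = values[start:start + block]
        scale = max(abs(v) for v in group) / qmax
        if scale == 0.0:
            out.extend(0.0 for _ in group)
            continue
        # Round to the nearest integer level, clamp to [-qmax, qmax], then rescale.
        out.extend(max(-qmax, min(qmax, round(v / scale))) * scale for v in group)
    return out


def l2_error(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    random.seed(0)
    x = [random.gauss(0.0, 1.0) for _ in range(64)]  # stand-in for one weight row
    s = prune_n_m(x)
    print("sparsity only:     ", l2_error(x, s))
    print("quantization only: ", l2_error(x, quantize_absmax(x)))
    print("sparsity -> quant: ", l2_error(x, quantize_absmax(s)))
    print("quant -> sparsity: ", l2_error(x, prune_n_m(quantize_absmax(x))))
```

In this particular setup, pruning first leaves each block's absmax scale unchanged and the pruned zeros quantize exactly to zero, so the two squared errors add on disjoint entries. Quantizing first can collapse or reorder magnitudes through rounding, so the pruning step may then drop different weights than it would on the original values.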
To set up the environment, run:
pip install -r requirements_pip.txt
Scripts for the LLaMA, OPT, and ViT experiments are provided.
The LLaMA and OPT scripts are in the following directory:
cd ./examples/pytorch/language-modeling/quantization_sparsity_scripts/
The ViT scripts are in the following directory:
cd ./examples/pytorch/image-classification/
Note: to run the image classification experiments, request access to the ImageNet dataset on Hugging Face and add your access token to your .bashrc file:
export HUGGINGFACE_TOKEN=<your_access_token>
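Before launching the ViT scripts, it can help to confirm that the token is actually visible to Python (for example, after running `source ~/.bashrc`). The helper below is a hypothetical sketch, not part of the repository; it only checks that the `HUGGINGFACE_TOKEN` variable named above is set and non-empty.

```python
import os


def require_hf_token(env=None, name="HUGGINGFACE_TOKEN"):
    """Return the token from the environment, or raise if it is missing or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; export it in ~/.bashrc and reload your shell")
    return token
```

Calling `require_hf_token()` at the top of a launch script makes a missing token fail immediately, rather than partway through a dataset download.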
Citation
If you find the analysis and experimental results useful for your own research, please cite our paper:
@article{quant-sparse-interplay:2024,
title = {{Effective Interplay between Sparsity and Quantization:
From Theory to Practice}},
author = {Harma, Simla Burcu and Chakraborty, Ayan and Kostenok, Elizaveta and Mishin, Danila and Ha, Dongho and Falsafi, Babak and Jaggi, Martin and Liu, Ming and Oh, Yunho and Subramanian, Suvinay and Yazdanbakhsh, Amir},
year = 2024,
journal = {arXiv preprint}
}
Owner
- Name: PARSA @ EPFL
- Login: parsa-epfl
- Kind: organization
- Location: Lausanne, Switzerland
- Website: http://parsa.epfl.ch
- Twitter: parsa_epfl
- Repositories: 11
- Profile: https://github.com/parsa-epfl
Citation (CITATION.cff)
cff-version: "1.2.0"
date-released: 2020-10
message: "If you use this software, please cite it using these metadata."
title: "Transformers: State-of-the-Art Natural Language Processing"
url: "https://github.com/huggingface/transformers"
authors:
  - family-names: Wolf
    given-names: Thomas
  - family-names: Debut
    given-names: Lysandre
  - family-names: Sanh
    given-names: Victor
  - family-names: Chaumond
    given-names: Julien
  - family-names: Delangue
    given-names: Clement
  - family-names: Moi
    given-names: Anthony
  - family-names: Cistac
    given-names: Perric
  - family-names: Ma
    given-names: Clara
  - family-names: Jernite
    given-names: Yacine
  - family-names: Plu
    given-names: Julien
  - family-names: Xu
    given-names: Canwen
  - family-names: "Le Scao"
    given-names: Teven
  - family-names: Gugger
    given-names: Sylvain
  - family-names: Drame
    given-names: Mariama
  - family-names: Lhoest
    given-names: Quentin
  - family-names: Rush
    given-names: "Alexander M."
preferred-citation:
  type: conference-paper
  authors:
    - family-names: Wolf
      given-names: Thomas
    - family-names: Debut
      given-names: Lysandre
    - family-names: Sanh
      given-names: Victor
    - family-names: Chaumond
      given-names: Julien
    - family-names: Delangue
      given-names: Clement
    - family-names: Moi
      given-names: Anthony
    - family-names: Cistac
      given-names: Perric
    - family-names: Ma
      given-names: Clara
    - family-names: Jernite
      given-names: Yacine
    - family-names: Plu
      given-names: Julien
    - family-names: Xu
      given-names: Canwen
    - family-names: "Le Scao"
      given-names: Teven
    - family-names: Gugger
      given-names: Sylvain
    - family-names: Drame
      given-names: Mariama
    - family-names: Lhoest
      given-names: Quentin
    - family-names: Rush
      given-names: "Alexander M."
  booktitle: "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations"
  month: 10
  start: 38
  end: 45
  title: "Transformers: State-of-the-Art Natural Language Processing"
  year: 2020
  publisher: "Association for Computational Linguistics"
  url: "https://www.aclweb.org/anthology/2020.emnlp-demos.6"
  address: "Online"
GitHub Events
Total
- Watch event: 15
- Delete event: 1
- Push event: 2
- Fork event: 2
- Create event: 2
Last Year
- Watch event: 15
- Delete event: 1
- Push event: 2
- Fork event: 2
- Create event: 2