quantization-sparsity-interplay

This repo contains the code for studying the interplay between quantization and sparsity methods.

https://github.com/parsa-epfl/quantization-sparsity-interplay

Science Score: 52.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
    Organization parsa-epfl has institutional domain (parsa.epfl.ch)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.4%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: parsa-epfl
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 38 MB
Statistics
  • Stars: 16
  • Watchers: 4
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • Contributing
  • License
  • Code of conduct
  • Citation

README.md

Effective Interplay between Sparsity and Quantization: From Theory to Practice

This repository is the official implementation of all analyses and experiments in the paper: Effective Interplay between Sparsity and Quantization: From Theory to Practice.

The paper mathematically investigates the relationship between quantization and sparsity techniques, and how their errors combine when both techniques are used together. The theoretical analysis is validated by experimental results on a wide range of models.
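One elementary way to write down how the two errors combine (a sketch, not necessarily the paper's formulation) is to decompose the total error when a sparsity operator $S$ is applied to weights $W$ before a quantization operator $Q$. The symbols $W$, $S$ and $Q$ are introduced here only for illustration.

```latex
W - Q(S(W)) = \underbrace{\bigl(W - S(W)\bigr)}_{\text{sparsity error}}
            + \underbrace{\bigl(S(W) - Q(S(W))\bigr)}_{\text{quantization error on } S(W)}

\|W - Q(S(W))\| \le \|W - S(W)\| + \|S(W) - Q(S(W))\|
```

Note that the second term measures quantization error on the already-sparsified tensor, not on $W$, so it generally differs from the quantization-only error $\|W - Q(W)\|$. The individual errors of each technique, measured in isolation, therefore cannot simply be assumed to add up to the combined error.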

About Our Work

Various forms of quantization and sparsity have emerged as promising approaches to compressing models, especially in the modern era of LLMs. This paper focuses on applying both techniques together, as part of broader research efforts to reduce the memory footprint of LLMs and make them more accessible. Our mathematical analysis and extensive empirical study with large language models (OPT, LLaMA) and vision transformers (ViT) demonstrate that quantization and sparsity are not orthogonal: their combined use can adversely affect model accuracy. Our findings provide valuable insights for optimizing the compression of large models while preserving accuracy.
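The interaction between the two techniques, including the order in which they are applied, can be explored on a toy tensor. The following is a minimal sketch, not the repository's code or the paper's methodology. It assumes unstructured magnitude pruning and symmetric per-tensor 4-bit integer quantization with an absmax scale, both chosen purely for illustration. The helper names (`magnitude_prune`, `quantize`, `l2_error`) are hypothetical, and a random Gaussian vector stands in for a weight tensor.

```python
import random


def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning: zero the `sparsity` fraction of smallest-|w| entries."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize(weights, bits):
    """Symmetric per-tensor uniform quantization with an absmax scale.

    Uses Python's round(), i.e. round-to-nearest with ties to even.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        return list(weights)
    # Map to integer levels in [-qmax, qmax], then back to real values
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def l2_error(reference, approx):
    """Euclidean distance between the original and the compressed tensor."""
    return sum((r - a) ** 2 for r, a in zip(reference, approx)) ** 0.5


random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in for a weight tensor

s_only = magnitude_prune(w, 0.5)
q_only = quantize(w, 4)
s_then_q = quantize(magnitude_prune(w, 0.5), 4)  # sparsity first, then quantization
q_then_s = magnitude_prune(quantize(w, 4), 0.5)  # quantization first, then sparsity

for name, approx in [("S only", s_only), ("Q only", q_only),
                     ("S then Q", s_then_q), ("Q then S", q_then_s)]:
    print(f"{name:9s} L2 error: {l2_error(w, approx):.3f}")
```

Comparing the combined errors with the single-technique errors, and the two orders with each other, is a quick way to build intuition in this toy setting. Conclusions about model accuracy require the full experiment scripts.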

To set up the environment, please run:

```console
pip install -r requirements_pip.txt
```

Scripts to run LLaMA, OPT and ViT experiments are provided.

The scripts for LLaMA and OPT are in the following directory:

```console
cd ./examples/pytorch/language-modeling/quantization_sparsity_scripts/
```

The scripts for ViT are in the following directory:

```console
cd ./examples/pytorch/image-classification/
```

Note: to run the image classification experiments, request access to the ImageNet dataset on Hugging Face and add your access token to your .bashrc file:

```console
export HUGGINGFACE_TOKEN=<your_access_token>
```
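Because a missing token typically only surfaces once the dataset download starts, it can help to check for it up front. The sketch below assumes the image classification scripts read the token from the `HUGGINGFACE_TOKEN` environment variable, as the note above implies. The helper `require_hf_token` is hypothetical and not part of the repository.

```python
import os


def require_hf_token(var_name="HUGGINGFACE_TOKEN"):
    """Return the Hugging Face access token from the environment, or fail with a clear message."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; add `export {var_name}=<your_access_token>` "
            "to ~/.bashrc and open a new shell (or run `source ~/.bashrc`)."
        )
    return token
```

Calling `require_hf_token()` at the top of a launcher script fails fast with an actionable message instead of an authentication error mid-run. Avoid printing the returned value, so the token does not end up in experiment logs.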

Citation

If you find the analysis and experimental results useful for your own research, please cite our paper:

```bibtex
@article{quant-sparse-interplay:2024,
  title   = {{Effective Interplay between Sparsity and Quantization: From Theory to Practice}},
  author  = {Harma, Simla Burcu and Chakraborty, Ayan and Kostenok, Elizaveta and Mishin, Danila and Ha, Dongho and Falsafi, Babak and Jaggi, Martin and Liu, Ming and Oh, Yunho and Subramanian, Suvinay and Yazdanbakhsh, Amir},
  year    = 2024,
  journal = {arXiv preprint}
}
```

Owner

  • Name: PARSA @ EPFL
  • Login: parsa-epfl
  • Kind: organization
  • Location: Lausanne, Switzerland

Citation (CITATION.cff)

cff-version: "1.2.0"
date-released: 2020-10
message: "If you use this software, please cite it using these metadata."
title: "Transformers: State-of-the-Art Natural Language Processing"
url: "https://github.com/huggingface/transformers"
authors: 
  - family-names: Wolf
    given-names: Thomas
  - family-names: Debut
    given-names: Lysandre
  - family-names: Sanh
    given-names: Victor
  - family-names: Chaumond
    given-names: Julien
  - family-names: Delangue
    given-names: Clement
  - family-names: Moi
    given-names: Anthony
  - family-names: Cistac
    given-names: Perric
  - family-names: Ma
    given-names: Clara
  - family-names: Jernite
    given-names: Yacine
  - family-names: Plu
    given-names: Julien
  - family-names: Xu
    given-names: Canwen
  - family-names: "Le Scao"
    given-names: Teven
  - family-names: Gugger
    given-names: Sylvain
  - family-names: Drame
    given-names: Mariama
  - family-names: Lhoest
    given-names: Quentin
  - family-names: Rush
    given-names: "Alexander M."
preferred-citation:
  type: conference-paper
  authors:
  - family-names: Wolf
    given-names: Thomas
  - family-names: Debut
    given-names: Lysandre
  - family-names: Sanh
    given-names: Victor
  - family-names: Chaumond
    given-names: Julien
  - family-names: Delangue
    given-names: Clement
  - family-names: Moi
    given-names: Anthony
  - family-names: Cistac
    given-names: Perric
  - family-names: Ma
    given-names: Clara
  - family-names: Jernite
    given-names: Yacine
  - family-names: Plu
    given-names: Julien
  - family-names: Xu
    given-names: Canwen
  - family-names: "Le Scao"
    given-names: Teven
  - family-names: Gugger
    given-names: Sylvain
  - family-names: Drame
    given-names: Mariama
  - family-names: Lhoest
    given-names: Quentin
  - family-names: Rush
    given-names: "Alexander M."
  booktitle: "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations"
  month: 10
  start: 38
  end: 45
  title: "Transformers: State-of-the-Art Natural Language Processing"
  year: 2020
  publisher: "Association for Computational Linguistics"
  url: "https://www.aclweb.org/anthology/2020.emnlp-demos.6"
  address: "Online"

GitHub Events

Total
  • Watch event: 15
  • Delete event: 1
  • Push event: 2
  • Fork event: 2
  • Create event: 2
Last Year
  • Watch event: 15
  • Delete event: 1
  • Push event: 2
  • Fork event: 2
  • Create event: 2