https://github.com/cedrickchee/awesome-ml-model-compression

Awesome machine learning model compression research papers, quantization, tools, and learning material.

Science Score: 46.0%

This score estimates how likely the project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    1 of 5 committers (20.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.8%) to scientific vocabulary

Keywords

awesome-list machine-learning model-compression neural-networks pruning quantization
Last synced: 5 months ago

Repository

Awesome machine learning model compression research papers, quantization, tools, and learning material.

Basic Info
  • Host: GitHub
  • Owner: cedrickchee
  • License: MIT
  • Default Branch: master
  • Homepage:
  • Size: 213 KB
Statistics
  • Stars: 533
  • Watchers: 21
  • Forks: 61
  • Open Issues: 2
  • Releases: 0
Topics
awesome-list machine-learning model-compression neural-networks pruning quantization
Created about 7 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

Awesome ML Model Compression

An awesome-style list that curates the best machine learning model compression and acceleration research papers, articles, tutorials, libraries, tools, and more. PRs are welcome!

Contents


Papers

General

Architecture

Quantization

Binarization

Pruning

Distillation

Low Rank Approximation

Offloading

Recent years have witnessed the emergence of systems specialized for LLM inference, such as FasterTransformer (NVIDIA, 2022), PaLM inference (Pope et al., 2022), DeepSpeed-Inference (Aminabadi et al., 2022), Accelerate (HuggingFace, 2022), LightSeq (Wang et al., 2021), and TurboTransformers (Fang et al., 2021).

To enable LLM inference on easily accessible hardware, offloading is an essential technique; to our knowledge, among current systems, only DeepSpeed-Inference and HuggingFace Accelerate include such functionality.
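
To make the offloading workflow concrete, here is a minimal sketch using HuggingFace transformers with Accelerate underneath; the checkpoint name and offload folder are illustrative assumptions, not something the text above prescribes.

```python
# Hedged sketch: offloading a causal LM with the Hugging Face stack.
# device_map="auto" asks Accelerate to place layers on GPU first, then CPU
# RAM, then disk; layers that spill to disk are written to offload_folder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; offloading only pays off for models that
                     # do not fit in GPU memory -- substitute any causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",         # let Accelerate decide GPU / CPU / disk placement
    offload_folder="offload",  # spill area for weights that fit nowhere else
)

inputs = tokenizer("Model compression lets us", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```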

Parallelism

Papers on compression methods for accelerating model-parallel training:

  • Does compressing activations help model parallel training? (2023) - Presents the first empirical study of how well compression algorithms (pruning-based, learning-based, and quantization-based, evaluated on a Transformer architecture) improve the communication speed of model parallelism. Takeaways: 1) activation compression is not the same problem as gradient compression; 2) training setups matter a lot; 3) don't compress the early layers' activations. A toy illustration follows this list.
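
As a toy illustration of the quantization-based family studied in that paper (my sketch, not the authors' code), the snippet below int8-quantizes an activation tensor before it would cross a model-parallel boundary and dequantizes it on the receiving stage; per-tensor symmetric quantization is an assumption made for brevity.

```python
# Quantization-based activation compression at a pipeline boundary:
# send int8 codes plus one fp32 scale instead of the fp32 tensor (~4x less traffic).
import torch

def compress_activation(x: torch.Tensor):
    """Per-tensor symmetric int8 quantization."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def decompress_activation(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

x = torch.randn(4, 1024)                 # activation leaving pipeline stage i
q, scale = compress_activation(x)        # only q (int8) and scale are sent
x_hat = decompress_activation(q, scale)  # reconstruction on stage i + 1
print((x - x_hat).abs().max())           # worst-case quantization error
```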

Articles

Content published on the Web.

Howtos

Assorted

Reference

Blogs

Tools

Libraries

  • TensorFlow Model Optimization Toolkit. Accompanying blog post: TensorFlow Model Optimization Toolkit — Pruning API. A usage sketch follows this list.
  • XNNPACK is a highly optimized library of floating-point neural network inference operators for ARM, WebAssembly, and x86 (SSE2 level) platforms. It is based on the QNNPACK library but, unlike QNNPACK, focuses entirely on floating-point operators.
  • Bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers and quantization functions.
  • NNCP - An experiment to build a practical lossless data compressor with neural networks. The latest version uses a Transformer model (slower but best ratio); an LSTM (faster) is also available.
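
The TensorFlow Model Optimization Toolkit pruning API from the first item can be exercised in a few lines; this is a hedged sketch in which the layer sizes and schedule values are placeholders, not recommendations.

```python
# Wrap a Keras model with prune_low_magnitude so the smallest-magnitude
# weights are zeroed out gradually during training.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0,
    final_sparsity=0.5,   # end with half of the weights zeroed
    begin_step=0,
    end_step=1000,
)
pruned = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=schedule)
pruned.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# The UpdatePruningStep callback must run during fit() to advance the schedule:
# pruned.fit(x_train, y_train,
#            callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```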

Frameworks

Paper Implementations

  • facebookresearch/kill-the-bits - Code and compressed models for the paper "And the bit goes down: Revisiting the quantization of neural networks" by Facebook AI Research. A toy product-quantization sketch follows.
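
To give a flavor of the weight product-quantization idea that paper builds on (a toy sketch, not the repo's implementation, and without the paper's activation-aware objective): split a weight matrix into small subvectors, cluster them with k-means, and store only codeword indices plus a shared codebook. The subvector length d and codebook size k below are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128)).astype(np.float32)  # pretend layer weights

d, k = 8, 64                           # subvector length, codebook size
blocks = W.reshape(-1, d)              # every length-d slice is one sample

km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(blocks)
codes = km.labels_.astype(np.uint8)    # one byte per subvector (k <= 256)
codebook = km.cluster_centers_.astype(np.float32)

W_hat = codebook[codes].reshape(W.shape)          # "decompressed" weights
orig_bits = W.size * 32
comp_bits = codes.size * 8 + codebook.size * 32
print(f"compression: {orig_bits / comp_bits:.1f}x, "
      f"MSE: {np.mean((W - W_hat) ** 2):.5f}")
```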

Videos

Talks

Training & tutorials

License

I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer.

Owner

  • Name: Cedric Chee
  • Login: cedrickchee
  • Kind: user
  • Location: PID 1
  • Company: InvictusByte

Lead Software Engineer | LLMs | full stack Go/JS dev, backend | product dev @ startups | 🧑‍🎓 CompSci | alumni: fast.ai, Antler.co

GitHub Events

Total
  • Watch event: 49
  • Pull request event: 2
  • Fork event: 2
Last Year
  • Watch event: 49
  • Pull request event: 2
  • Fork event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 37
  • Total Committers: 5
  • Avg Commits per committer: 7.4
  • Development Distribution Score (DDS): 0.108
Past Year
  • Commits: 3
  • Committers: 1
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.0
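For reference, DDS here appears to be 1 minus the top committer's share of commits: all time, 1 - 33/37 ≈ 0.108; in the past year one committer made all 3 commits, giving 1 - 3/3 = 0.0.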
Top Committers
  • Cedric Chee · c****e · 33 commits
  • Ikko Eltociear Ashimine · e****r@g****m · 1 commit
  • Guangxuan Xiao · x****x@m****u · 1 commit
  • Gaurav Menghani · g****i@g****m · 1 commit
  • Dachuan Shi · s****7@g****m · 1 commit
Committer Domains (Top 20 + Academic)
  • mit.edu: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 6
  • Average time to close issues: N/A
  • Average time to close pull requests: 12 days
  • Total issue authors: 0
  • Total pull request authors: 5
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors: none
Pull Request Authors
  • sdc17 (2)
  • Shwai-He (2)
  • reddragon (1)
  • eltociear (1)
  • Guangxuan-Xiao (1)