pot-llm-matmulfree

Power-of-two quantization LLM

https://github.com/pol-arevalo-soler/pot-llm-matmulfree

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.1%) to scientific vocabulary
Last synced: 6 months ago

Repository

Power-of-two quantization LLM

Basic Info
  • Host: GitHub
  • Owner: pol-arevalo-soler
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 3.43 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 10 months ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

PoT MatMulFree LLM

1. Overview

This repository provides a collection of efficient Triton-based implementations of state-of-the-art flash-linear-attention models, following the upstream flash-linear-attention library, and extends them with a specialized PoT (Powers-of-Two) quantization implementation.
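
For intuition, here is a minimal, unoptimized sketch of the causal linear-attention recurrence that such libraries implement as fused Triton kernels. It is illustrative only: the function name and the elu+1 feature map are assumptions, not this repository's API.

import torch

def linear_attention(q, k, v):
    """Causal linear attention in O(T) time with a fixed-size state.

    q, k, v: (T, d) tensors. Uses elu(x) + 1 as a simple positive
    feature map so the normalizer stays positive.
    """
    phi = lambda x: torch.nn.functional.elu(x) + 1.0
    q, k = phi(q), phi(k)
    T, d = q.shape
    state = torch.zeros(d, v.shape[1])  # running sum of outer(k_i, v_i)
    norm = torch.zeros(d)               # running sum of k_i
    out = torch.empty_like(v)
    for t in range(T):
        state += torch.outer(k[t], v[t])
        norm += k[t]
        out[t] = (q[t] @ state) / (q[t] @ norm + 1e-6)
    return out

Because the state is a fixed-size d-by-d_v matrix, the whole sequence is processed without materializing a T-by-T attention matrix; the Triton kernels chunk and parallelize this same recurrence.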

2. PoT Implementation

The PoT (Powers-of-Two) quantization implementation can be found in:

src/matmulfreellm/mmfreelm/quantization.py

This module contains the core logic for quantizing weights to powers of two, enabling matrix-multiplication-free inference in LLMs.
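
A common formulation of powers-of-two quantization snaps each weight to the nearest signed power of two, so multiplying by a quantized weight reduces to a bit shift plus a sign flip. The sketch below illustrates that idea only; the function name, exponent range, and rounding scheme are assumptions, and the module's actual logic (scaling, exponent clipping, any straight-through estimator for training) may differ.

import torch

def pot_quantize(w: torch.Tensor, e_min: int = -8, e_max: int = 0) -> torch.Tensor:
    """Round each weight to the nearest signed power of two (illustrative)."""
    sign = torch.sign(w)
    # Clamp the magnitude so log2 stays finite; zeros still map to zero via sign.
    mag = w.abs().clamp_min(2.0 ** (e_min - 1))
    exp = torch.round(torch.log2(mag)).clamp(e_min, e_max)
    return sign * torch.pow(2.0, exp)

Since every nonzero quantized weight is +/-2^e for a small integer e, the products inside a matrix multiply become shifts of the activations, which is what makes matmul-free inference possible.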

3. Code Base

This project is based on the flash-linear-attention repository: https://github.com/fla-org/flash-linear-attention (see the citation below).

Owner

  • Login: pol-arevalo-soler
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Yang"
  given-names: "Songlin"
  orcid: "https://orcid.org/0000-0002-5944-0110"
- family-names: "Zhang"
  given-names: "Yu"
  orcid: "https://orcid.org/0000-0002-8345-3835"
title: "FLA: A Triton-Based Library for Hardware-Efficient Implementations of Linear Attention Mechanism"
version: 0.1
date-released: 2024-01-18
url: "https://github.com/fla-org/flash-linear-attention"

GitHub Events

Total
  • Push event: 1
  • Create event: 2
Last Year
  • Push event: 1
  • Create event: 2

Dependencies

.github/workflows/intel-a770.yml actions
  • actions/checkout v4 composite
  • tj-actions/changed-files v46.0.3 composite
.github/workflows/issue.yml actions
  • actions/stale v9.1.0 composite
.github/workflows/lint.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • tj-actions/changed-files v46.0.3 composite
.github/workflows/nvidia-4090.yml actions
  • actions/checkout v4 composite
  • tj-actions/changed-files v46.0.3 composite
.github/workflows/nvidia-h100.yml actions
  • actions/checkout v4 composite
  • tj-actions/changed-files v46.0.3 composite
.github/workflows/release.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
.github/workflows/triton-nightly.yml actions
  • actions/checkout v4 composite
pyproject.toml pypi
  • datasets >=3.3.0
  • einops *
  • ninja *
  • torch >=2.5
  • transformers >=4.45.0
setup.py pypi
  • datasets >=3.3.0
  • einops *
  • ninja *
  • torch >=2.5
  • transformers >=4.45.0
src/matmulfreellm/setup.py pypi
  • einops *
  • ninja *
  • transformers *
  • triton *