pot-llm-matmulfree

Power-of-two quantization LLM

https://github.com/pol-arevalo-soler/pot-llm-matmulfree

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.1%) to scientific vocabulary
Last synced: 6 months ago

Repository

Power-of-two quantization LLM

Basic Info
  • Host: GitHub
  • Owner: pol-arevalo-soler
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 3.43 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 10 months ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

PoT MatMulFree LLM

1. Overview

This repository provides a collection of efficient Triton-based implementations of state-of-the-art flash-linear-attention models, following the upstream flash-linear-attention library, and extends them with a specialized PoT (Powers-of-Two) quantization implementation.
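
For intuition, here is a minimal, unoptimized sketch of the causal linear-attention recurrence that such libraries implement as fused Triton kernels. It is illustrative only: the function name and the elu+1 feature map are assumptions, not this repository's API.

import torch

def linear_attention(q, k, v):
    """Causal linear attention in O(T) time with a fixed-size state.

    q, k, v: (T, d) tensors. Uses elu(x) + 1 as a simple positive
    feature map so the normalizer stays positive.
    """
    phi = lambda x: torch.nn.functional.elu(x) + 1.0
    q, k = phi(q), phi(k)
    T, d = q.shape
    state = torch.zeros(d, v.shape[1])  # running sum of outer(k_i, v_i)
    norm = torch.zeros(d)               # running sum of k_i
    out = torch.empty_like(v)
    for t in range(T):
        state += torch.outer(k[t], v[t])
        norm += k[t]
        out[t] = (q[t] @ state) / (q[t] @ norm + 1e-6)
    return out

Because the state is a fixed-size d-by-d_v matrix, the whole sequence is processed without materializing a T-by-T attention matrix; the Triton kernels chunk and parallelize this same recurrence.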

2. PoT Implementation

The PoT (Powers-of-Two) quantization implementation can be found in:

src/matmulfreellm/mmfreelm/quantization.py

This module contains the core logic for quantizing weights to powers of two, enabling matrix-multiplication-free inference in LLMs.
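
A common formulation of powers-of-two quantization snaps each weight to the nearest signed power of two, so multiplying by a quantized weight reduces to a bit shift plus a sign flip. The sketch below illustrates that idea only; the function name, exponent range, and rounding scheme are assumptions, and the module's actual logic (scaling, exponent clipping, any straight-through estimator for training) may differ.

import torch

def pot_quantize(w: torch.Tensor, e_min: int = -8, e_max: int = 0) -> torch.Tensor:
    """Round each weight to the nearest signed power of two (illustrative)."""
    sign = torch.sign(w)
    # Clamp the magnitude so log2 stays finite; zeros still map to zero via sign.
    mag = w.abs().clamp_min(2.0 ** (e_min - 1))
    exp = torch.round(torch.log2(mag)).clamp(e_min, e_max)
    return sign * torch.pow(2.0, exp)

Since every nonzero quantized weight is +/-2^e for a small integer e, the products inside a matrix multiply become shifts of the activations, which is what makes matmul-free inference possible.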

3. Code Base

This project is based on the flash-linear-attention repository: https://github.com/fla-org/flash-linear-attention (see the citation below).

Owner

  • Login: pol-arevalo-soler
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Yang"
  given-names: "Songlin"
  orcid: "https://orcid.org/0000-0002-5944-0110"
- family-names: "Zhang"
  given-names: "Yu"
  orcid: "https://orcid.org/0000-0002-8345-3835"
title: "FLA: A Triton-Based Library for Hardware-Efficient Implementations of Linear Attention Mechanism"
version: 0.1
date-released: 2024-01-18
url: "https://github.com/fla-org/flash-linear-attention"

GitHub Events

Total
  • Push event: 1
  • Create event: 2
Last Year
  • Push event: 1
  • Create event: 2

Dependencies

.github/workflows/intel-a770.yml actions
  • actions/checkout v4 composite
  • tj-actions/changed-files v46.0.3 composite
.github/workflows/issue.yml actions
  • actions/stale v9.1.0 composite
.github/workflows/lint.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • tj-actions/changed-files v46.0.3 composite
.github/workflows/nvidia-4090.yml actions
  • actions/checkout v4 composite
  • tj-actions/changed-files v46.0.3 composite
.github/workflows/nvidia-h100.yml actions
  • actions/checkout v4 composite
  • tj-actions/changed-files v46.0.3 composite
.github/workflows/release.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
.github/workflows/triton-nightly.yml actions
  • actions/checkout v4 composite
pyproject.toml pypi
  • datasets >=3.3.0
  • einops *
  • ninja *
  • torch >=2.5
  • transformers >=4.45.0
setup.py pypi
  • datasets >=3.3.0
  • einops *
  • ninja *
  • torch >=2.5
  • transformers >=4.45.0
src/matmulfreellm/setup.py pypi
  • einops *
  • ninja *
  • transformers *
  • triton *