pot-llm-matmulfree
Power of two quantization LLM
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (5.1%) to scientific vocabulary
Last synced: 6 months ago
Repository
Power of two quantization LLM
Basic Info
- Host: GitHub
- Owner: pol-arevalo-soler
- License: MIT
- Language: Python
- Default Branch: main
- Size: 3.43 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created 10 months ago · Last pushed 10 months ago
Metadata Files
Readme
License
Citation
README.md
PoT MatMulFree LLM
1. Overview
This repository provides a collection of efficient Triton-based implementations of state-of-the-art flash-linear attention models, following the flash-linear-attention library, and extends them with a specialized PoT (Powers-of-Two) quantization implementation.
2. PoT Implementation
The PoT (Powers-of-Two) quantization implementation can be found in:
src/matmulfreellm/mmfreelm/quantization.py
This module contains the core logic for quantizing weights using powers-of-two, enabling matrix multiplication-free inference in LLMs.
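As a rough illustration of the general idea (a minimal sketch, not the code in `quantization.py`; the function name, exponent range, and PyTorch usage are assumptions), PoT quantization rounds each weight's magnitude to the nearest power of two, so a multiplication can be replaced by a sign flip and a bit shift:

```python
import torch

def pot_quantize(w: torch.Tensor, min_exp: int = -8, max_exp: int = 0) -> torch.Tensor:
    """Illustrative PoT quantizer: snap each weight's magnitude to the nearest
    power of two within [2**min_exp, 2**max_exp], preserving its sign."""
    sign = torch.sign(w)
    mag = w.abs().clamp(min=2.0 ** min_exp)               # avoid log2(0)
    exp = torch.round(torch.log2(mag)).clamp(min_exp, max_exp)
    q = sign * torch.pow(2.0, exp)
    return torch.where(w == 0, torch.zeros_like(w), q)    # keep exact zeros

# Usage example: every entry of the result is 0 or ±2^k.
w = torch.randn(4, 4)
print(pot_quantize(w))
```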
3. Code Base
This project is based on the flash-linear-attention repository: https://github.com/fla-org/flash-linear-attention
Owner
- Login: pol-arevalo-soler
- Kind: user
- Repositories: 1
- Profile: https://github.com/pol-arevalo-soler
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Yang"
    given-names: "Songlin"
    orcid: "https://orcid.org/0000-0002-5944-0110"
  - family-names: "Zhang"
    given-names: "Yu"
    orcid: "https://orcid.org/0000-0002-8345-3835"
title: "FLA: A Triton-Based Library for Hardware-Efficient Implementations of Linear Attention Mechanism"
version: 0.1
date-released: 2024-01-18
url: "https://github.com/fla-org/flash-linear-attention"
GitHub Events
Total
- Push event: 1
- Create event: 2
Last Year
- Push event: 1
- Create event: 2
Dependencies
.github/workflows/intel-a770.yml
actions
- actions/checkout v4 composite
- tj-actions/changed-files v46.0.3 composite
.github/workflows/issue.yml
actions
- actions/stale v9.1.0 composite
.github/workflows/lint.yaml
actions
- actions/checkout v4 composite
- actions/setup-python v5 composite
- tj-actions/changed-files v46.0.3 composite
.github/workflows/nvidia-4090.yml
actions
- actions/checkout v4 composite
- tj-actions/changed-files v46.0.3 composite
.github/workflows/nvidia-h100.yml
actions
- actions/checkout v4 composite
- tj-actions/changed-files v46.0.3 composite
.github/workflows/release.yml
actions
- actions/checkout v4 composite
- actions/setup-python v5 composite
.github/workflows/triton-nightly.yml
actions
- actions/checkout v4 composite
pyproject.toml
pypi
- datasets >=3.3.0
- einops *
- ninja *
- torch >=2.5
- transformers >=4.45.0
setup.py
pypi
- datasets >=3.3.0
- einops *
- ninja *
- torch >=2.5
- transformers >=4.45.0
src/matmulfreellm/setup.py
pypi
- einops *
- ninja *
- transformers *
- triton *