halutmatmul

Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator

https://github.com/joennlae/halutmatmul

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (7.7%) to scientific vocabulary

Keywords

approximate-inference hardware hardware-acceleration machine-learning maddness pytorch

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: joennlae
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 3.09 MB

Statistics

Stars: 210
Watchers: 9
Forks: 13
Open Issues: 2
Releases: 0

Created almost 4 years ago · Last pushed about 2 years ago

Metadata Files

Readme Changelog License Citation

README.md

# Stella Nera: A halutmatmul based Accelerator

### Algorithmic CI

[![PyTorch Layer Test | PyTest](https://github.com/joennlae/halutmatmul/actions/workflows/python_testing.yaml/badge.svg)](https://github.com/joennlae/halutmatmul/actions/workflows/python_testing.yaml)
[![Python Linting](https://github.com/joennlae/halutmatmul/actions/workflows/linting.yaml/badge.svg)](https://github.com/joennlae/halutmatmul/actions/workflows/linting.yaml)
[![Mypy - Typechecking](https://github.com/joennlae/halutmatmul/actions/workflows/python_typing.yaml/badge.svg)](https://github.com/joennlae/halutmatmul/actions/workflows/python_typing.yaml)

### ML CI

[![ResNet9 - 92%+ accuracy](https://github.com/joennlae/halutmatmul/actions/workflows/resnet9_validation.yaml/badge.svg)](https://github.com/joennlae/halutmatmul/actions/workflows/resnet9_validation.yaml)

### Hardware CI

[![HW Synth + PAR OpenROAD](https://github.com/joennlae/halutmatmul/actions/workflows/hw_openroad.yaml/badge.svg)](https://github.com/joennlae/halutmatmul/actions/workflows/hw_openroad.yaml)
[![RTL Linting](https://github.com/joennlae/halutmatmul/actions/workflows/hw_linting.yaml/badge.svg)](https://github.com/joennlae/halutmatmul/actions/workflows/hw_linting.yaml)
[![HW Design Verification](https://github.com/joennlae/halutmatmul/actions/workflows/hw_dv.yaml/badge.svg)](https://github.com/joennlae/halutmatmul/actions/workflows/hw_dv.yaml)

Paper

Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication

Abstract

The recent Maddness method approximates Matrix Multiplication (MatMul) without the need for multiplication by using a hash-based version of product quantization (PQ). The hash function is a decision tree, allowing for efficient hardware implementation, as multiply-accumulate operations are replaced by decision tree passes and LUT lookups. Stella Nera is the first Maddness accelerator achieving 15x higher area efficiency (GMAC/s/mm^2) and 25x higher energy efficiency (TMAC/s/W) than direct MatMul accelerators in the same technology. In a commercial 14 nm technology and scaled to 3 nm, we achieve an energy efficiency of 161 TOp/s/W@0.55V with a Top-1 accuracy on CIFAR-10 of over 92.5% using ResNet9.
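The idea in the abstract (a decision tree as hash function, followed by LUT lookups instead of multiply-accumulates) can be illustrated with a small NumPy sketch. This is not the repository's implementation and not the published learning procedure: the function names (`fit_tree`, `encode`, `learn`, `approx_matmul`), the variance-based choice of split dimension, the median thresholds, and the mean-based prototypes are all simplifying assumptions made for illustration.

```python
import numpy as np

def fit_tree(X, depth=4):
    """Fit one hash tree on X (n_samples, sub_dim): split dims, thresholds, prototypes."""
    n, d = X.shape
    bucket = np.zeros(n, dtype=np.int64)
    dims, thresholds = [], []
    for level in range(depth):
        n_buckets = 2 ** level
        # Assumption: one split dimension per level, chosen by summed within-bucket variance.
        score = np.zeros(d)
        for b in range(n_buckets):
            rows = X[bucket == b]
            if len(rows) > 1:
                score += rows.var(axis=0) * len(rows)
        dim = int(np.argmax(score))
        # Assumption: each bucket is split at its median along that dimension.
        thr = np.zeros(n_buckets)
        for b in range(n_buckets):
            vals = X[bucket == b, dim]
            if len(vals) > 0:
                thr[b] = np.median(vals)
        dims.append(dim)
        thresholds.append(thr)
        bucket = 2 * bucket + (X[:, dim] > thr[bucket])
    # Assumption: the prototype of a leaf is the mean of the training rows that reach it.
    protos = np.zeros((2 ** depth, d))
    for k in range(2 ** depth):
        rows = X[bucket == k]
        if len(rows) > 0:
            protos[k] = rows.mean(axis=0)
    return dims, thresholds, protos

def encode(X, dims, thresholds):
    """Hash each row to a leaf index using comparisons only (no multiplications)."""
    bucket = np.zeros(X.shape[0], dtype=np.int64)
    for dim, thr in zip(dims, thresholds):
        bucket = 2 * bucket + (X[:, dim] > thr[bucket])
    return bucket

def learn(A_train, B, C=8, depth=4):
    """Offline step: one tree per column group, plus a LUT of prototype-times-B products."""
    trees, luts = [], []
    for cols in np.array_split(np.arange(A_train.shape[1]), C):
        dims, thr, protos = fit_tree(A_train[:, cols], depth)
        trees.append((cols, dims, thr))
        luts.append(protos @ B[cols])  # shape (K, M); multiplications happen only here
    return trees, np.stack(luts)

def approx_matmul(A, trees, luts):
    """Online step: tree passes, LUT lookups, and additions."""
    out = np.zeros((A.shape[0], luts.shape[2]))
    for c, (cols, dims, thr) in enumerate(trees):
        out += luts[c][encode(A[:, cols], dims, thr)]
    return out

rng = np.random.default_rng(0)
A = rng.random((2000, 64))
B = rng.random((64, 10))
trees, luts = learn(A[:1500], B, C=8, depth=4)
C_exact = A[1500:] @ B
C_approx = approx_matmul(A[1500:], trees, luts)
mse = np.square(C_approx - C_exact).mean()
print(mse)
```

With `depth=4` each tree has 16 leaves, which corresponds to `K=16` in the example further down; `C` is the number of column groups (codebooks).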

Algorithmic - Maddness

Maddness Animation

ResNet-9 LUTs, Thresholds, Dims

Download 92%+ Model

Halutmatmul example

example.py

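```python
import numpy as np
from halutmatmul.halutmatmul import HalutMatmul

# Random input data, split into a training and a test set
A = np.random.random((10000, 512))
A_train = A[:8000]
A_test = A[8000:]
B = np.random.random((512, 10))

# Exact reference result
C = np.matmul(A_test, B)

# Learn the lookup tables offline, then approximate the product online
hm = HalutMatmul(C=32, K=16)
hm.learn_offline(A_train, B)
C_halut = hm.matmul_online(A_test)

# Mean squared error between the approximate and the exact result
mse = np.square(C_halut - C).mean()
print(mse)
```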
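The raw MSE printed by the example depends on the magnitude of `C`, so it is hard to compare across matrix shapes. A minimal sketch of a scale-free variant follows. The helper name `normalized_mse` is hypothetical and not part of halutmatmul, and the noisy `C_approx` is only a stand-in so that the block runs without the package installed; in practice `C_halut` would take its place.

```python
import numpy as np

def normalized_mse(C_approx, C_exact):
    """MSE divided by the mean squared magnitude of the exact result."""
    return np.square(C_approx - C_exact).mean() / np.square(C_exact).mean()

rng = np.random.default_rng(0)
C_exact = rng.random((2000, 512)) @ rng.random((512, 10))
# Stand-in for an approximate product (assumption: additive Gaussian error).
C_approx = C_exact + rng.normal(scale=0.5, size=C_exact.shape)

print(f"MSE:  {np.square(C_approx - C_exact).mean():.4f}")
print(f"NMSE: {normalized_mse(C_approx, C_exact):.6f}")
```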

Installation

```bash
# install conda environment & activate
# mamba is recommended for faster install
conda env create -f environment_gpu.yml
conda activate halutmatmul

# IIS prefixed env
conda env create -f environment_gpu.yml --prefix /scratch/janniss/conda/halutmatmul_gpu
```

Differentiable Maddness

Hardware - OpenROAD flow results from CI - NOT OPTIMIZED

All fully open-source hardware results are NOT OPTIMIZED! They are provided only for reference and to show that the flow works; the results in the paper come from commercial tools. See this as a community service to make the hardware results more accessible.

| All Designs | NanGate45 |
| ------------- | ------------- |
| All Report | All |
| History | History |

Open Hardware Results Table

| NanGate45 | halutmatmul | halutencoder4 | halutdecoder |
| ------------- | ------------- | ------------- | ------------- |
| Area [μm^2] | 128816 | 46782 | 24667.5 |
| Freq [MHz] | 166.7 | 166.7 | 166.7 |
| GE | 161.423 kGE | 58.624 kGE | 30.911 kGE |
| Std Cell [#] | 65496 | 23130 | 12256 |
| Voltage [V] | 1.1 | 1.1 | 1.1 |
| Util [%] | 50.4 | 48.7 | 52.1 |
| TNS | 0 | 0 | 0 |
| Clock Net | | | |
| Routing | Routing | Routing | Routing |
| GDS | GDS Download | GDS Download | GDS Download |
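The GE row of the table above appears to follow from the area row by dividing by the area of one reference gate. The short check below assumes a NAND2 cell area of 0.798 μm^2 as the gate equivalent; that constant is an assumption that happens to reproduce the table to within rounding, and it is not stated in the README. The clock period is derived from the listed frequency in the same way.

```python
# Values copied from the NanGate45 results table.
area_um2 = {"halutmatmul": 128816.0, "halutencoder4": 46782.0, "halutdecoder": 24667.5}
reported_kge = {"halutmatmul": 161.423, "halutencoder4": 58.624, "halutdecoder": 30.911}
freq_mhz = 166.7

# Assumption: one gate equivalent (GE) is a NAND2 cell of 0.798 um^2.
NAND2_AREA_UM2 = 0.798

computed_kge = {}
for name, area in area_um2.items():
    computed_kge[name] = area / NAND2_AREA_UM2 / 1000.0
    print(f"{name}: {computed_kge[name]:.3f} kGE (table: {reported_kge[name]} kGE)")

# Clock period in nanoseconds from the frequency in MHz.
period_ns = 1000.0 / freq_mhz
print(f"clock period: {period_ns:.2f} ns")
```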

Full design (halutmatmul)

Run locally with:

```bash
git submodule update --init --recursive
cd hardware
ACC_TYPE=INT DATA_WIDTH=8 NUM_M=8 NUM_DECODER_UNITS=4 NUM_C=16 make halut-open-synth-and-pnr-halut_matmul
```

References

arXiv Maddness paper
Based on MADDness/Bolt.

Citation

```bibtex
@article{schonleber2023stella,
  title={Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication},
  author={Sch{\"o}nleber, Jannis and Cavigelli, Lukas and Andri, Renzo and Perotti, Matteo and Benini, Luca},
  journal={arXiv preprint arXiv:2311.10207},
  year={2023}
}
```

Owner

Name: Jannis Schönleber
Login: joennlae
Kind: user
Location: Zürich, Switzerland
Company: ETH Zürich

Website: www.linkedin.com/in/jannis-schönleber
Twitter: joennlae
Repositories: 14
Profile: https://github.com/joennlae

joennlae = jönnlä (swiss german)

GitHub Events

Total

Watch event: 7
Fork event: 1

Last Year

Watch event: 7
Fork event: 1

Dependencies

.github/workflows/filter.yaml actions

actions/checkout v2 composite
dorny/paths-filter v2 composite

.github/workflows/hw_dv.yaml actions

actions/checkout v2 composite

.github/workflows/hw_linting.yaml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/hw_openroad.yaml actions

actions/checkout v2 composite
actions/setup-python v3 composite
actions/upload-artifact v3 composite
cpina/github-action-push-to-another-repository main composite
nanzm/get-time-action v1.1 composite
webfactory/ssh-agent v0.5.4 composite
webiny/action-post-run 2.0.1 composite

.github/workflows/linting.yaml actions

actions/checkout v2 composite
actions/setup-python v2 composite
jidicula/clang-format-action v4.6.2 composite

.github/workflows/python_testing.yaml actions

actions/checkout v2 composite

Dockerfile docker

continuumio/miniconda3 latest build
ubuntu 16.04 build

hardware/Dockerfile docker

continuumio/miniconda3 latest build
ubuntu 20.04 build

.github/workflows/python_typing.yaml actions

actions/checkout v3 composite

.github/workflows/resnet9_validation.yaml actions

actions/checkout v3 composite

pyproject.toml pypi
