snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: hubertsiuzdak
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://hubertsiuzdak.github.io/snac/
- Size: 5.01 MB
Statistics
- Stars: 650
- Watchers: 7
- Forks: 34
- Open Issues: 18
- Releases: 4
Metadata Files
README.md
SNAC 🍿
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate. For more information, read the paper: https://arxiv.org/abs/2410.14411
🎧 More audio samples available at https://hubertsiuzdak.github.io/snac/
Overview
SNAC encodes audio into hierarchical tokens similarly to SoundStream, EnCodec, and DAC (see the image on the left). However, SNAC introduces a simple change where coarse tokens are sampled less frequently, covering a broader time span (see the image on the right).
This not only saves bitrate but, more importantly, may be very useful for language-modeling approaches to audio generation. For example, with coarse tokens at ~10 Hz and a context window of 2048 tokens, a model can capture a consistent structure across roughly 3 minutes of an audio track.
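The ~3 minute figure follows from dividing the context length by the coarse token rate. A quick check, assuming the context holds only coarse tokens at exactly 10 Hz (the real rate is approximate, and finer levels would also consume context if they were interleaved into the same sequence); `context_span_seconds` is a hypothetical helper for illustration:

```python
def context_span_seconds(context_tokens: int, token_rate_hz: float) -> float:
    """Seconds of audio covered by `context_tokens` tokens emitted at `token_rate_hz`."""
    return context_tokens / token_rate_hz

# Illustrative values from the text: ~10 Hz coarse tokens, 2048-token context.
seconds = context_span_seconds(2048, 10.0)
minutes = seconds / 60
print(f"{seconds:.1f} s = {minutes:.2f} min")  # 204.8 s = 3.41 min
```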

Pretrained models
Currently, all models support only a single audio channel (mono).
| Model | Bitrate | Sample Rate | Params | Recommended use case |
|-------|---------|-------------|--------|----------------------|
| hubertsiuzdak/snac_24khz | 0.98 kbps | 24 kHz | 19.8 M | 🗣️ Speech |
| hubertsiuzdak/snac_32khz | 1.9 kbps | 32 kHz | 54.5 M | 🎸 Music / Sound Effects |
| hubertsiuzdak/snac_44khz | 2.6 kbps | 44 kHz | 54.5 M | 🎸 Music / Sound Effects |
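The bitrates in the table can be sanity-checked as (total token rate across all levels) × (bits per token). The sketch below assumes, for `snac_24khz`, a 4096-entry codebook (12 bits per token), a finest-level hop of 512 samples, and three levels with strides 4, 2 and 1 relative to that finest level. These settings are assumptions for illustration, not values read from the released configuration, and `snac_bitrate_bps` is a hypothetical helper.

```python
import math

def snac_bitrate_bps(sample_rate: int, hop_length: int, strides: list[int], codebook_size: int) -> float:
    """Approximate bitrate of a multi-scale codec.

    Each level emits sample_rate / (hop_length * stride) tokens per second,
    and each token carries log2(codebook_size) bits.
    """
    bits_per_token = math.log2(codebook_size)
    frame_rate = sample_rate / hop_length  # finest-level token rate in Hz
    return sum(frame_rate / s for s in strides) * bits_per_token

# Assumed (hypothetical) settings for the 24 kHz speech model.
bps = snac_bitrate_bps(sample_rate=24_000, hop_length=512, strides=[4, 2, 1], codebook_size=4096)
print(f"{bps:.1f} bps ≈ {bps / 1000:.2f} kbps")  # 984.4 bps ≈ 0.98 kbps
```

Under these assumptions the result rounds to the 0.98 kbps listed for `snac_24khz`; the coarse level alone contributes only about 11.7 Hz × 12 bits, which is why adding slower levels is cheap.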
Usage
Install it using:
```bash
pip install snac
```
To encode (and decode) audio with SNAC in Python, use the following code:
```python
import torch
from snac import SNAC

model = SNAC.from_pretrained("hubertsiuzdak/snac_32khz").eval().cuda()
audio = torch.randn(1, 1, 32000).cuda()  # placeholder for actual audio with shape (B, 1, T)

with torch.inference_mode():
    codes = model.encode(audio)
    audio_hat = model.decode(codes)
```
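The placeholder above must be replaced with real audio shaped `(B, 1, T)` at the model's sample rate, and since the models are mono-only, multi-channel recordings need downmixing first. A minimal sketch, assuming the audio is already decoded into a NumPy array of shape `(channels, samples)` at the correct sample rate (loading and resampling are not covered); `to_snac_input` is a hypothetical helper, not part of the `snac` package:

```python
import numpy as np

def to_snac_input(waveform: np.ndarray) -> np.ndarray:
    """Convert a (channels, samples) or (samples,) array into a (1, 1, T) float32 mono batch."""
    if waveform.ndim == 1:
        waveform = waveform[np.newaxis, :]  # treat 1-D input as a single channel
    if waveform.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {waveform.shape}")
    mono = waveform.mean(axis=0, keepdims=True)  # average channels -> (1, T)
    return mono[np.newaxis, :, :].astype(np.float32)  # add batch dim -> (1, 1, T)

stereo = np.random.uniform(-1.0, 1.0, size=(2, 32_000))  # 1 s of fake stereo at 32 kHz
batch = to_snac_input(stereo)
print(batch.shape)  # (1, 1, 32000)
```

The resulting array can be handed to the model with `torch.from_numpy(batch).cuda()`. Averaging channels is the simplest downmix; other mixing rules would work equally well as long as the output stays `(B, 1, T)`.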
You can also encode and reconstruct in a single call:
```python
with torch.inference_mode():
    audio_hat, codes = model(audio)
```
⚠️ Note that `codes` is a list of token sequences of variable lengths, each corresponding to a different temporal resolution.

```
>>> [code.shape[1] for code in codes]
[12, 24, 48, 96]
```
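In the lengths printed above, each level has twice as many tokens as the one before it, so every coarse token spans 2, 4 and 8 tokens at the finer levels. One hypothetical way to exploit this for language modeling (not something the SNAC package provides) is to flatten the levels frame by frame, emitting each coarse token followed by the finer tokens it covers. The sketch works on plain Python lists standing in for one batch element of `codes` (e.g. `[c[0].tolist() for c in codes]`) and assumes every level is exactly twice as long as the previous one; `interleave_by_coarse_frame` is an illustrative name.

```python
def interleave_by_coarse_frame(levels: list[list[int]]) -> list[int]:
    """Flatten multi-scale token levels into one sequence, one coarse frame at a time.

    Assumes levels[i] has exactly len(levels[0]) * 2**i tokens.
    """
    n_frames = len(levels[0])
    for i, level in enumerate(levels):
        if len(level) != n_frames * 2 ** i:
            raise ValueError(f"level {i} has {len(level)} tokens, expected {n_frames * 2 ** i}")
    flat: list[int] = []
    for frame in range(n_frames):
        for i, level in enumerate(levels):
            span = 2 ** i  # tokens at level i covered by one coarse token
            flat.extend(level[frame * span:(frame + 1) * span])
    return flat

# Toy example with three levels of 2, 4 and 8 tokens (values are arbitrary ids).
toy = [[1, 2], [10, 11, 12, 13], [100, 101, 102, 103, 104, 105, 106, 107]]
print(interleave_by_coarse_frame(toy))
# [1, 10, 11, 100, 101, 102, 103, 2, 12, 13, 104, 105, 106, 107]
```

With the four lengths shown above (12, 24, 48, 96), this scheme yields 12 frames of 1 + 2 + 4 + 8 = 15 tokens each, 180 tokens in total.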
Acknowledgements
Module definitions are adapted from the Descript Audio Codec.
Citation
If this code contributes to your research, please cite our work:
```bibtex
@inproceedings{siuzdak2024snac,
  title={SNAC: Multi-Scale Neural Audio Codec},
  author={Siuzdak, Hubert and Gr{\"o}tschla, Florian and Lanzend{\"o}rfer, Luca A},
  booktitle={Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation},
  year={2024}
}
```
Owner
- Name: Hubert Siuzdak
- Login: hubertsiuzdak
- Kind: user
- Twitter: HubertSiuzdak
- Repositories: 1
- Profile: https://github.com/hubertsiuzdak
i just keep staring at tensorboard curves | deep learning && audio
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Siuzdak
    given-names: Hubert
  - family-names: Grötschla
    given-names: Florian
  - family-names: Lanzendörfer
    given-names: Luca A.
title: 'SNAC: Multi-Scale Neural Audio Codec'
url: 'https://github.com/hubertsiuzdak/snac'
preferred-citation:
  type: conference-paper
  authors:
    - family-names: Siuzdak
      given-names: Hubert
    - family-names: Grötschla
      given-names: Florian
    - family-names: Lanzendörfer
      given-names: Luca A.
  collection-title: 'Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation'
  title: 'SNAC: Multi-Scale Neural Audio Codec'
  year: 2024
```
GitHub Events
Total
- Issues event: 7
- Watch event: 247
- Issue comment event: 9
- Push event: 1
- Fork event: 14
Last Year
- Issues event: 7
- Watch event: 247
- Issue comment event: 9
- Push event: 1
- Fork event: 14
Packages
- Total packages: 1
- Total downloads:
  - pypi: 53,525 (last month)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
- Total maintainers: 1
pypi.org: snac
Multi-Scale Neural Audio Codec
- Homepage: https://github.com/hubertsiuzdak/snac
- Documentation: https://snac.readthedocs.io/
- License: mit
Latest release: 1.2.1
published over 1 year ago
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- pypa/gh-action-pypi-publish release/v1 composite
- einops *
- huggingface_hub *
- numpy *
- torch *