snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

https://github.com/hubertsiuzdak/snac

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.1%) to scientific vocabulary

Keywords

audio audio-codec deep-learning

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: hubertsiuzdak
License: mit
Language: Python
Default Branch: main
Homepage: https://hubertsiuzdak.github.io/snac/
Size: 5.01 MB

Statistics

Stars: 650
Watchers: 7
Forks: 34
Open Issues: 18
Releases: 4

Topics

audio audio-codec deep-learning

Created about 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

SNAC 🍿

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate. For more information, read the paper: https://arxiv.org/abs/2410.14411

🎸 Music samples and 🗣️ Speech samples (embedded audio players not reproduced here)

🎧 More audio samples available at https://hubertsiuzdak.github.io/snac/

Overview

SNAC encodes audio into hierarchical tokens similarly to SoundStream, EnCodec, and DAC (see the image on the left). However, SNAC introduces a simple change where coarse tokens are sampled less frequently, covering a broader time span (see the image on the right).

This not only saves bitrate but, more importantly, may be very useful for language-modeling approaches to audio generation. For example, with coarse tokens at ~10 Hz and a context window of 2048 tokens, a model can capture consistent structure across roughly 3 minutes of audio.
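The ~3 minute figure follows from dividing the context length by the coarse token rate. A minimal sketch of that arithmetic, where `context_seconds` is a hypothetical helper (not part of the `snac` package) and only the coarsest token stream is assumed to occupy the context window:

```python
def context_seconds(context_tokens: int, token_rate_hz: float) -> float:
    """Seconds of audio spanned by `context_tokens` tokens emitted at `token_rate_hz`."""
    return context_tokens / token_rate_hz


# Values taken from the paragraph above: 2048-token context, ~10 Hz coarse tokens.
seconds = context_seconds(2048, 10.0)
print(f"{seconds:.1f} s = {seconds / 60:.1f} min")  # 204.8 s = 3.4 min
```

If the finer token streams share the same context window, the covered duration shrinks accordingly.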

Pretrained models

Currently, all models support only a single audio channel (mono).

| Model                    | Bitrate   | Sample Rate | Params | Recommended use case     |
|--------------------------|-----------|-------------|--------|--------------------------|
| hubertsiuzdak/snac_24khz | 0.98 kbps | 24 kHz      | 19.8 M | 🗣️ Speech                |
| hubertsiuzdak/snac_32khz | 1.9 kbps  | 32 kHz      | 54.5 M | 🎸 Music / Sound Effects |
| hubertsiuzdak/snac_44khz | 2.6 kbps  | 44 kHz      | 54.5 M | 🎸 Music / Sound Effects |
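To put these bitrates in perspective, the sketch below compares each one with uncompressed mono PCM at the same sample rate. The 16-bit PCM baseline is an assumption made for illustration, and `compression_ratio` is a hypothetical helper; the sample rates and bitrates are copied from the table as listed.

```python
# (sample rate in Hz, codec bitrate in kbps), as listed in the table above.
MODELS = {
    "hubertsiuzdak/snac_24khz": (24_000, 0.98),
    "hubertsiuzdak/snac_32khz": (32_000, 1.9),
    "hubertsiuzdak/snac_44khz": (44_000, 2.6),
}

BITS_PER_SAMPLE = 16  # assumption: 16-bit mono PCM as the uncompressed baseline


def compression_ratio(sample_rate_hz: int, codec_kbps: float,
                      bits_per_sample: int = BITS_PER_SAMPLE) -> float:
    """Ratio of uncompressed mono PCM bitrate to the codec bitrate."""
    pcm_kbps = sample_rate_hz * bits_per_sample / 1000
    return pcm_kbps / codec_kbps


for name, (sample_rate_hz, codec_kbps) in MODELS.items():
    ratio = compression_ratio(sample_rate_hz, codec_kbps)
    print(f"{name}: roughly {ratio:.0f}x smaller than 16-bit PCM")
```

Under that assumption, all three models come out at a few hundred times smaller than raw PCM.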

Usage

Install it using:

```bash
pip install snac
```

To encode (and decode) audio with SNAC in Python, use the following code:

```python
import torch
from snac import SNAC

# Load the pretrained 32 kHz model in eval mode on the GPU
model = SNAC.from_pretrained("hubertsiuzdak/snac_32khz").eval().cuda()
audio = torch.randn(1, 1, 32000).cuda()  # placeholder for actual audio with shape (B, 1, T)

with torch.inference_mode():
    codes = model.encode(audio)
    audio_hat = model.decode(codes)
```

You can also encode and reconstruct in a single call:

```python
with torch.inference_mode():
    audio_hat, codes = model(audio)
```

⚠️ Note that `codes` is a list of token sequences of variable lengths, each corresponding to a different temporal resolution.

```
>>> [code.shape[1] for code in codes]
[12, 24, 48, 96]
```
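Because each level in the example above is twice as long as the previous one, every coarse token lines up with 2, 4, and 8 tokens at the finer levels. For language modeling, one possible way to serialize such codes is to group them by coarse frame. The sketch below illustrates that idea on plain Python lists; `interleave_codes` is a hypothetical helper, not part of the `snac` API, and the ordering it produces is only one of several reasonable choices.

```python
def interleave_codes(codes):
    """Flatten multi-scale codes into one sequence, grouped by coarse frame.

    `codes` is a list of token lists ordered from coarse to fine. Each level's
    length must be an integer multiple of the coarsest level's length.
    """
    n_frames = len(codes[0])
    ratios = []
    for level in codes:
        ratio, remainder = divmod(len(level), n_frames)
        if remainder:
            raise ValueError("level length is not a multiple of the coarsest length")
        ratios.append(ratio)

    flat = []
    for i in range(n_frames):
        # For coarse frame i, emit its token followed by the aligned finer tokens.
        for level, ratio in zip(codes, ratios):
            flat.extend(level[i * ratio:(i + 1) * ratio])
    return flat


# Dummy token ids with the same lengths as the example above.
lengths = [12, 24, 48, 96]
codes = [list(range(n)) for n in lengths]
flat = interleave_codes(codes)
print(len(flat))  # 180 = 12 frames * (1 + 2 + 4 + 8) tokens per frame
```

With real model output, each entry of `codes` is a tensor whose second dimension is the sequence length, so something like `[code[0].tolist() for code in codes]` would give the list-of-lists input for the first item in the batch.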

Acknowledgements

Module definitions are adapted from the Descript Audio Codec.

Citation

If this code contributes to your research, please cite our work:

```bibtex
@inproceedings{siuzdak2024snac,
  title={SNAC: Multi-Scale Neural Audio Codec},
  author={Siuzdak, Hubert and Gr{\"o}tschla, Florian and Lanzend{\"o}rfer, Luca A},
  booktitle={Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation},
  year={2024}
}
```

Owner

Name: Hubert Siuzdak
Login: hubertsiuzdak
Kind: user

Twitter: HubertSiuzdak
Repositories: 1
Profile: https://github.com/hubertsiuzdak

i just keep staring at tensorboard curves | deep learning && audio

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Siuzdak
    given-names: Hubert
  - family-names: Grötschla
    given-names: Florian
  - family-names: Lanzendörfer
    given-names: Luca A.
title: 'SNAC: Multi-Scale Neural Audio Codec'
url: 'https://github.com/hubertsiuzdak/snac'
preferred-citation:
  type: conference-paper
  authors:
    - family-names: Siuzdak
      given-names: Hubert
    - family-names: Grötschla
      given-names: Florian
    - family-names: Lanzendörfer
      given-names: Luca A.
  collection-title: 'Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation'
  title: 'SNAC: Multi-Scale Neural Audio Codec'
  year: 2024

GitHub Events

Total

Issues event: 7
Watch event: 247
Issue comment event: 9
Push event: 1
Fork event: 14

Last Year

Issues event: 7
Watch event: 247
Issue comment event: 9
Push event: 1
Fork event: 14

Packages

Total packages: 1
Total downloads:
- pypi 53,525 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 5
Total maintainers: 1

pypi.org: snac

Multi-Scale Neural Audio Codec

Homepage: https://github.com/hubertsiuzdak/snac
Documentation: https://snac.readthedocs.io/
License: mit
Latest release: 1.2.1
published over 1 year ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 53,525 Last month

Rankings

Dependent packages count: 9.8%

Average: 37.3%

Dependent repos count: 64.8%

Maintainers (1)

hubert.siuzdak

Last synced: 6 months ago

Dependencies

.github/workflows/pypi-release.yml actions

actions/checkout v4 composite
actions/setup-python v5 composite
pypa/gh-action-pypi-publish release/v1 composite

requirements.txt pypi

einops *
huggingface_hub *
numpy *
torch *

setup.py pypi
