snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

https://github.com/hubertsiuzdak/snac

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.1%) to scientific vocabulary

Keywords

audio audio-codec deep-learning
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 650
  • Watchers: 7
  • Forks: 34
  • Open Issues: 18
  • Releases: 4
Topics
audio audio-codec deep-learning
Created about 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

SNAC 🍿

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate. For more information, read the paper: https://arxiv.org/abs/2410.14411

🎧 More audio samples available at https://hubertsiuzdak.github.io/snac/

Overview

SNAC encodes audio into hierarchical tokens similarly to SoundStream, EnCodec, and DAC (see the image on the left). However, SNAC introduces a simple change where coarse tokens are sampled less frequently, covering a broader time span (see the image on the right).

This not only saves bitrate but, more importantly, can be very useful for language-modeling approaches to audio generation. For example, with coarse tokens at ~10 Hz and a context window of 2048 tokens, you can model the consistent structure of an audio track over ~3 minutes.
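The ~3 minute figure follows from dividing the context length by the coarse token rate. A quick sanity check, assuming the language model's context is filled only with the coarsest tokens (the variable names are illustrative and not part of the SNAC API):

```python
# Rough arithmetic behind the "~3 minutes" estimate.
coarse_rate_hz = 10      # approximate coarse token rate quoted in the text
context_tokens = 2048    # context window of the language model

span_seconds = context_tokens / coarse_rate_hz
span_minutes = span_seconds / 60

print(f"{span_seconds:.1f} s = {span_minutes:.1f} min")  # 204.8 s = 3.4 min
```

If finer token levels share the same context window, the covered time span shrinks accordingly.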

snac.png

Pretrained models

Currently, all models support only a single audio channel (mono).

| Model | Bitrate | Sample Rate | Params | Recommended use case |
|--------------------------|-----------|-------------|--------|--------------------------|
| hubertsiuzdak/snac_24khz | 0.98 kbps | 24 kHz | 19.8 M | 🗣️ Speech |
| hubertsiuzdak/snac_32khz | 1.9 kbps | 32 kHz | 54.5 M | 🎸 Music / Sound Effects |
| hubertsiuzdak/snac_44khz | 2.6 kbps | 44 kHz | 54.5 M | 🎸 Music / Sound Effects |
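To put these bitrates in perspective, the sketch below compares each model against uncompressed audio. It assumes a 16-bit mono PCM baseline, which is our choice for illustration and is not stated in the README; the bitrates and sample rates are copied from the table as listed.

```python
# (model name, codec bitrate in kbps, sample rate in Hz), taken from the table above
MODELS = [
    ("hubertsiuzdak/snac_24khz", 0.98, 24_000),
    ("hubertsiuzdak/snac_32khz", 1.9, 32_000),
    ("hubertsiuzdak/snac_44khz", 2.6, 44_000),
]
BITS_PER_SAMPLE = 16  # assumption: 16-bit mono PCM as the uncompressed baseline


def compression_ratio(codec_kbps, sample_rate_hz, bits_per_sample=BITS_PER_SAMPLE):
    """Return how many times smaller the codec stream is than raw PCM."""
    pcm_kbps = sample_rate_hz * bits_per_sample / 1000
    return pcm_kbps / codec_kbps


ratios = {name: compression_ratio(kbps, sr) for name, kbps, sr in MODELS}
for name, ratio in ratios.items():
    print(f"{name}: ~{ratio:.0f}x smaller than 16-bit PCM")
```

Under that assumption the ratios come out at roughly 270x to 390x. This is a size comparison only and says nothing about reconstruction quality.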

Usage

Install it using:

    pip install snac

To encode (and decode) audio with SNAC in Python, use the following code:

```python
import torch
from snac import SNAC

model = SNAC.from_pretrained("hubertsiuzdak/snac_32khz").eval().cuda()
audio = torch.randn(1, 1, 32000).cuda()  # placeholder for actual audio with shape (B, 1, T)

with torch.inference_mode():
    codes = model.encode(audio)
    audio_hat = model.decode(codes)
```

You can also encode and reconstruct in a single call:

    with torch.inference_mode():
        audio_hat, codes = model(audio)

⚠️ Note that `codes` is a list of token sequences of variable lengths, each corresponding to a different temporal resolution.

```
>>> [code.shape[1] for code in codes]
[12, 24, 48, 96]
```
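For language modeling, the variable-length levels usually need to be merged into a single sequence. The sketch below shows one possible ordering: each coarse token is followed by the finer tokens covering the same time span. `flatten_codes` is a hypothetical helper, not part of the SNAC package, and it uses plain Python lists in place of tensors. It assumes every level's length is an integer multiple of the coarsest level's, as with the lengths shown above.

```python
def flatten_codes(codes):
    """Interleave multi-scale token sequences into one flat list.

    `codes` holds one token list per level, coarsest first. Each level's
    length is assumed to be an integer multiple of the coarsest level's.
    """
    n_frames = len(codes[0])
    if n_frames == 0 or any(len(level) % n_frames != 0 for level in codes):
        raise ValueError("each level length must be a multiple of the coarsest length")

    flat = []
    for i in range(n_frames):
        for level in codes:
            stride = len(level) // n_frames  # finer tokens per coarse token
            flat.extend(level[i * stride:(i + 1) * stride])
    return flat


# Toy stand-in for one batch item, using the lengths shown above.
toy_codes = [list(range(n)) for n in (12, 24, 48, 96)]
flat = flatten_codes(toy_codes)
print(len(flat))  # 180 tokens: 12 frames x (1 + 2 + 4 + 8)
print(flat[:7])   # [0, 0, 1, 0, 1, 2, 3]
```

With real model output, each level could first be converted with something like `code[0].tolist()` for a single batch item. Other orderings are equally valid; this one simply keeps tokens that cover the same time span next to each other.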

Acknowledgements

Module definitions are adapted from the Descript Audio Codec.

Citation

If this code contributes to your research, please cite our work:

    @inproceedings{siuzdak2024snac,
      title={SNAC: Multi-Scale Neural Audio Codec},
      author={Siuzdak, Hubert and Gr{\"o}tschla, Florian and Lanzend{\"o}rfer, Luca A},
      booktitle={Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation},
      year={2024}
    }

Owner

  • Name: Hubert Siuzdak
  • Login: hubertsiuzdak
  • Kind: user

i just keep staring at tensorboard curves | deep learning && audio

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Siuzdak
    given-names: Hubert
  - family-names: Grötschla
    given-names: Florian
  - family-names: Lanzendörfer
    given-names: Luca A.
title: 'SNAC: Multi-Scale Neural Audio Codec'
url: 'https://github.com/hubertsiuzdak/snac'
preferred-citation:
  type: conference-paper
  authors:
    - family-names: Siuzdak
      given-names: Hubert
    - family-names: Grötschla
      given-names: Florian
    - family-names: Lanzendörfer
      given-names: Luca A.
  collection-title: 'Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation'
  title: 'SNAC: Multi-Scale Neural Audio Codec'
  year: 2024

GitHub Events

Total
  • Issues event: 7
  • Watch event: 247
  • Issue comment event: 9
  • Push event: 1
  • Fork event: 14
Last Year
  • Issues event: 7
  • Watch event: 247
  • Issue comment event: 9
  • Push event: 1
  • Fork event: 14

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 53,525 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
pypi.org: snac

Multi-Scale Neural Audio Codec

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 53,525 Last month
Rankings
Dependent packages count: 9.8%
Average: 37.3%
Dependent repos count: 64.8%
Maintainers (1)

Dependencies

.github/workflows/pypi-release.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • pypa/gh-action-pypi-publish release/v1 composite
requirements.txt pypi
  • einops *
  • huggingface_hub *
  • numpy *
  • torch *
setup.py pypi