https://github.com/google-deepmind/slowfast_nfnets

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, ieee.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.5%) to scientific vocabulary
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: google-deepmind
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 13.7 KB
Statistics
  • Stars: 30
  • Watchers: 3
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Archived
Created almost 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme Contributing License

README.md

Towards Learning Universal Audio Representations

In Towards Learning Universal Audio Representations, we introduce a Holistic Audio Representation Evaluation Suite (HARES), containing 12 downstream tasks spanning the speech, music, and environmental sound domains, with the hope that this will spur research on developing better models for universal audio representations. Together with the benchmark, we also propose a new Slowfast NFNet architecture in the paper.

HARES tasks

Below is a summary of all 12 HARES tasks, with links for obtaining these freely available datasets. Note that the labels of the original test sets of Birdsong and TUT18 are not publicly available; we therefore use the splits created by the authors of Pre-Training Audio Representations with Self-Supervision, which are based on the original training subsets. For more details about how to assemble these tasks, please refer to Appendix A of the arXiv version of our paper.

| Dataset | Task | #Samples | #Classes | Domain |
|----------|:-------------|------:|------:|:------|
| AudioSet | audio tagging | 1.9m | 527 | environment |
| Birdsong | animal sound | 36k | 2 | environment |
| TUT18 | acoustic scenes | 8.6k | 10 | environment |
| ESC-50 | acoustic scenes | 2.0k | 50 | environment |
| Speech Commands v1 | keyword | 90k | 12 | speech |
| Speech Commands v2 | keyword | 96k | 35 | speech |
| Fluent Speech Commands | intention | 27k | 31 | speech |
| VoxForge | language id | 145k | 6 | speech |
| VoxCeleb | speaker id | 147k | 1251 | speech |
| NSynth-instrument | instrument id | 293k | 11 | music |
| NSynth-pitch | pitch estimation | 293k | 128 | music |
| MagnaTagATune | music tagging | 26k | 50 | music |
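If you want to iterate over the benchmark programmatically, the table above can be written down as plain data. The following is a minimal sketch, not part of this repository: the `HaresTask` type and the `tasks_per_domain` helper are hypothetical names introduced for illustration, and the values are copied from the table.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class HaresTask(NamedTuple):
    """One row of the HARES summary table (hypothetical helper type)."""
    dataset: str
    task: str
    num_classes: int
    domain: str


# Values copied from the summary table above.
HARES_TASKS: List[HaresTask] = [
    HaresTask("AudioSet", "audio tagging", 527, "environment"),
    HaresTask("Birdsong", "animal sound", 2, "environment"),
    HaresTask("TUT18", "acoustic scenes", 10, "environment"),
    HaresTask("ESC-50", "acoustic scenes", 50, "environment"),
    HaresTask("Speech Commands v1", "keyword", 12, "speech"),
    HaresTask("Speech Commands v2", "keyword", 35, "speech"),
    HaresTask("Fluent Speech Commands", "intention", 31, "speech"),
    HaresTask("VoxForge", "language id", 6, "speech"),
    HaresTask("VoxCeleb", "speaker id", 1251, "speech"),
    HaresTask("NSynth-instrument", "instrument id", 11, "music"),
    HaresTask("NSynth-pitch", "pitch estimation", 128, "music"),
    HaresTask("MagnaTagATune", "music tagging", 50, "music"),
]


def tasks_per_domain(tasks: List[HaresTask]) -> Dict[str, int]:
    """Count how many tasks fall into each domain."""
    return dict(Counter(t.domain for t in tasks))


if __name__ == "__main__":
    for domain, count in tasks_per_domain(HARES_TASKS).items():
        print(f"{domain}: {count} tasks")
```

Grouping by domain this way makes it easy to report per-domain results alongside the overall benchmark score.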

Audio Slowfast NFNets, a JAX implementation

We provide a JAX/Haiku implementation of the Slowfast NFNet-F0. This convolutional neural network combines the ability of Slowfast networks to model both transient and long-range signals in audio with the strong performance of NFNets, which are optimized for hardware accelerators. It achieves a state-of-the-art score on the HARES benchmark.

You can use our unit tests to check your development environment and to learn more about how the models are used. The tests can be run with pytest:

```bash
$ pip install -r requirements.txt
$ python -m pytest [-n <NUMCPUS>] slowfast_nfnets
```
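Before running the tests, it can help to confirm that the dependencies are importable. The sketch below is not part of this repository. It assumes the usual import names for the packages in `requirements.txt` (`haiku` for dm-haiku, `xdist` for pytest-xdist) and also checks for `jax` and `pytest`, which are not listed there explicitly; `REQUIRED_MODULES` and `missing_modules` are hypothetical names.

```python
import importlib.util
from typing import List

# Assumed import names; adjust if your environment differs.
REQUIRED_MODULES = ["jax", "haiku", "chex", "numpy", "pytest", "xdist"]


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

This only checks that the modules can be located, not that their versions satisfy `requirements.txt`; the unit tests remain the real check.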

Usage

The unit tests provided with the model show a few examples of how the model can be run.

Citing this work

BibTeX for citing the paper:

```bibtex
@inproceedings{wang2022towards,
  title={Towards Learning Universal Audio Representations},
  author={Wang, Luyu and Luc, Pauline and Wu, Yan and Recasens, Adria and Smaira, Lucas and Brock, Andrew and Jaegle, Andrew and Alayrac, Jean-Baptiste and Dieleman, Sander and Carreira, Joao and van den Oord, Aaron},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={4593--4597},
  year={2022},
  organization={IEEE}
}
```

Disclaimer

This is not an official Google product.

Owner

  • Name: Google DeepMind
  • Login: google-deepmind
  • Kind: organization

Dependencies

requirements.txt pypi
  • chex >=0.0.6
  • dm-haiku *
  • numpy *
  • pytest-xdist *