https://github.com/google-deepmind/slowfast_nfnets
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org, ieee.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 10.5%, to scientific vocabulary)
Repository
Basic Info
- Host: GitHub
- Owner: google-deepmind
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 13.7 KB
Statistics
- Stars: 30
- Watchers: 3
- Forks: 1
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Towards Learning Universal Audio Representations
In Towards Learning Universal Audio Representations, we introduce a Holistic Audio Representation Evaluation Suite (HARES), containing 12 downstream tasks spanning the speech, music, and environmental sound domains, with the hope that this will spur research on developing better models for universal audio representations. Together with the benchmark, we also propose a new Slowfast NFNet architecture in the paper.
HARES tasks
Below is a summary of all 12 HARES tasks, with links for obtaining these freely available datasets. Note that the labels of the original test sets of Birdsong and TUT18 are not publicly available; we therefore use the splits created by the authors of Pre-Training Audio Representations with Self-Supervision, which are based on the original training subsets. For more details on how to assemble these tasks, please refer to Appendix A of the arXiv version of our paper.
| Dataset | Task | #Samples | #Classes | Domain |
|----------|:-------------|------:|------:|:------|
| AudioSet | audio tagging | 1.9m | 527 | environment |
| Birdsong | animal sound | 36k | 2 | environment |
| TUT18 | acoustic scenes | 8.6k | 10 | environment |
| ESC-50 | acoustic scenes | 2.0k | 50 | environment |
| Speech Commands v1 | keyword | 90k | 12 | speech |
| Speech Commands v2 | keyword | 96k | 35 | speech |
| Fluent Speech Commands | intention | 27k | 31 | speech |
| VoxForge | language id | 145k | 6 | speech |
| VoxCeleb | speaker id | 147k | 1251 | speech |
| NSynth-instrument | instrument id | 293k | 11 | music |
| NSynth-pitch | pitch estimation | 293k | 128 | music |
| MagnaTagATune | music tagging | 26k | 50 | music |
Audio Slowfast NFNets, a JAX implementation
We provide a JAX/Haiku implementation of the Slowfast NFNet-F0. This convolutional neural network combines the ability of Slowfast networks to model both transient and long-range signals in audio with the strong, accelerator-optimized performance of NFNets. It achieves state-of-the-art results on the HARES benchmark.
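The two-pathway idea above can be sketched in a few lines of NumPy. This is a conceptual illustration only, not the paper's architecture: the fast pathway keeps the full temporal resolution of the input spectrogram while the slow pathway subsamples time by a stride `alpha`, and the pathway ratio and the mean-pool "networks" here are stand-in assumptions.

```python
# Conceptual sketch (not the repository's implementation): a Slowfast-style
# model processes the input at two temporal resolutions and fuses the results.
import numpy as np

def two_pathway_features(spectrogram, alpha=8):
    """spectrogram: (time, freq) array; alpha: temporal stride of the slow path."""
    fast = spectrogram            # full temporal resolution, fine transients
    slow = spectrogram[::alpha]   # heavily subsampled view, long-range context
    # In the real model each pathway runs through its own conv stack;
    # here we just pool each view to a fixed-size embedding and concatenate.
    fast_emb = fast.mean(axis=0)
    slow_emb = slow.mean(axis=0)
    return np.concatenate([fast_emb, slow_emb])

spec = np.random.randn(1000, 64)  # e.g. 1000 frames x 64 mel bins
emb = two_pathway_features(spec)
print(emb.shape)                  # (128,)
```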
You may use our unit tests to check your development environment and to learn more about how the models are used. They can be executed with pytest:

```bash
$ pip install -r requirements.txt
$ python -m pytest [-n <NUMCPUS>] slowfast_nfnets
```
Usage
The unit tests provided with the model show a few ways in which it can be run.
Citing this work
BibTeX for citing the paper:

```bibtex
@inproceedings{wang2022towards,
  title={Towards Learning Universal Audio Representations},
  author={Wang, Luyu and Luc, Pauline and Wu, Yan and Recasens, Adria and Smaira, Lucas and Brock, Andrew and Jaegle, Andrew and Alayrac, Jean-Baptiste and Dieleman, Sander and Carreira, Joao and van den Oord, Aaron},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={4593--4597},
  year={2022},
  organization={IEEE}
}
```
Disclaimer
This is not an official Google product.
Owner
- Name: Google DeepMind
- Login: google-deepmind
- Kind: organization
- Website: https://www.deepmind.com/
- Repositories: 245
- Profile: https://github.com/google-deepmind
Dependencies
- chex >=0.0.6
- dm-haiku *
- numpy *
- pytest-xdist *