audioset-convnext-inf
Adapting a ConvNeXt model to audio classification on AudioSet
Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.8%) to scientific vocabulary
Repository
Adapting a ConvNeXt model to audio classification on AudioSet
Basic Info
- Host: GitHub
- Owner: topel
- License: mit
- Language: Python
- Default Branch: main
- Size: 1.46 MB
Statistics
- Stars: 25
- Watchers: 2
- Forks: 2
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
Adapting a ConvNeXt model to audio classification on AudioSet
In this work, we adapted the computer vision architecture ConvNeXt (Tiny) to perform audio tagging on AudioSet.
In this repo, we provide the PyTorch code to run inference with our best checkpoint, trained on the AudioSet dev subset (balanced + unbalanced subsets). We do not provide training code, but our training pipeline is heavily based on PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition. Many thanks to Qiuqiang Kong and colleagues for their excellent open-source work.
Install instructions
conda env create -f environment.yml
The most important modules are:
- pytorch 1.11.0; more recent 1.* versions should work (possibly 2.* as well),
- torchaudio,
- torchlibrosa, needed to generate log-mel spectrograms exactly as in the PANNs code.
Activate the newly created env:
conda activate audio_retrieval
Then either clone this repo and work locally, or pip-install it with:
pip install git+https://github.com/topel/audioset-convnext-inf@pip-install
Get a checkpoint
Create a checkpoints directory and place a checkpoint in it.
A checkpoint is available on Zenodo: https://zenodo.org/record/8020843
Download convnext_tiny_471mAP.pth to do audio tagging and embedding extraction.
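A minimal download sketch, assuming Zenodo's usual `/files/<name>` direct-download URL pattern (the exact file URL is not stated in this README):

```python
from pathlib import Path
from urllib.request import urlretrieve

ZENODO_RECORD = "https://zenodo.org/record/8020843"
CKPT_NAME = "convnext_tiny_471mAP.pth"

def ensure_checkpoint(ckpt_dir="checkpoints", download=True):
    """Create the checkpoints directory and fetch the weights if missing."""
    ckpt_path = Path(ckpt_dir) / CKPT_NAME
    ckpt_path.parent.mkdir(exist_ok=True)
    if download and not ckpt_path.exists():
        # The direct-file URL pattern below is an assumption about Zenodo's layout.
        urlretrieve(f"{ZENODO_RECORD}/files/{CKPT_NAME}", ckpt_path)
    return ckpt_path
```

If the direct URL changes, downloading the file manually from the record page works just as well.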
The following results were obtained on the AudioSet test set:
| Metric  | Value |
|---------|-------|
| mAP     | 0.471 |
| AUC     | 0.973 |
| d-prime | 3.071 |
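As a side note, d-prime relates to AUC through the inverse normal CDF, d' = √2 · Φ⁻¹(AUC). AudioSet metrics are computed per class and then averaged, so the headline d-prime (3.071) is not simply d' of the averaged AUC; the stdlib-only sketch below just illustrates the per-class formula:

```python
from math import sqrt
from statistics import NormalDist

def d_prime(auc: float) -> float:
    """d' = sqrt(2) * inverse CDF of the standard normal evaluated at AUC."""
    return sqrt(2) * NormalDist().inv_cdf(auc)

# A single class with AUC 0.973 has d' of roughly 2.72, illustrating why the
# macro-averaged d-prime above differs from d'(macro-averaged AUC).
```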
A second checkpoint is also available, in case you are interested in doing experiments on the AudioCaps dataset (audio captioning and audio-text retrieval).
Audio tagging demo
The script demo_convnext.py provides an example of how to do audio tagging on a single audio file, provided in the audio_samples directory.
It will give the following output:
```
Loaded ckpt from: /gpfswork/rech/djl/uzj43um/audio_retrieval/audioset-convnext-inf/checkpoints/convnext_tiny_471mAP.pth
params: 28222767
Inference on: f62-S-v2swA_200000_210000.wav
logits size: torch.Size([1, 527])
probs size: torch.Size([1, 527])
Predicted labels using activity threshold 0.25:
[  0 137 138 139 151 506]
Scene embedding, shape: torch.Size([1, 768])
Frame-level embeddings, shape: torch.Size([1, 768, 31, 7])
```
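The "activity threshold" step can be sketched in plain Python: a sigmoid turns each logit into a probability, and classes whose probability exceeds 0.25 are kept. The 5-class logits below are hypothetical stand-ins for AudioSet's 527:

```python
from math import exp

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))

def predicted_labels(logits, threshold=0.25):
    """Indices of classes whose sigmoid probability exceeds the threshold."""
    return [i for i, z in enumerate(logits) if sigmoid(z) > threshold]

# Hypothetical 5-class logits instead of AudioSet's 527:
print(predicted_labels([2.0, -3.0, 0.1, -1.5, -2.0]))  # → [0, 2]
```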
You can map the predicted indexes to their tag names using the file metadata/class_labels_indices.csv:
[ 0 137 138 139 151 506] Speech; Music; Musical instrument; Plucked string instrument; Ukulele; Inside, small room
The ground truth for this recording, as given in audio_samples/f62-S-v2swA_200000_210000_labels.txt, is:
[ 0 137 151] Speech; Music; Ukulele;
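The index-to-name lookup above can be sketched with the stdlib csv module. class_labels_indices.csv follows the standard AudioSet layout (index, mid, display_name); the inline three-row sample below is an illustrative excerpt, not the actual file contents:

```python
import csv
import io

def load_label_map(csv_file):
    """Map class index -> display name from an AudioSet-style CSV."""
    reader = csv.DictReader(csv_file)
    return {int(row["index"]): row["display_name"] for row in reader}

# Illustrative excerpt of metadata/class_labels_indices.csv:
sample = io.StringIO(
    "index,mid,display_name\n"
    "0,/m/09x0r,Speech\n"
    "137,/m/04rlf,Music\n"
    "151,/m/07xzm,Ukulele\n"
)
labels = load_label_map(sample)
print("; ".join(labels[i] for i in [0, 137, 151]))  # → Speech; Music; Ukulele
```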
Additionally, the methods model.forward_scene_embeddings(waveform) and model.forward_frame_embeddings(waveform) return audio scene-level and frame-level embeddings, respectively. Their shapes are printed out in the script example:
- scene embedding: a 768-d vector
- frame-level embedding: 768 × 31 × 7, i.e. 768 "images" of size 31 time frames × 7 frequency coefficients.
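To relate the two shapes: a 768 × 31 × 7 frame-level tensor collapses to a 768-d scene vector once the 31 × 7 time-frequency grid is pooled per channel. This README does not say which pooling the model uses, so the sketch below simply assumes mean pooling, over plain nested lists:

```python
def mean_pool(frames):
    """Average a (C, T, F) nested list over its T x F grid -> length-C vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in frames
    ]

# Toy tensor with C=768 channels, each a 31 x 7 grid holding a constant value:
frames = [[[float(c)] * 7 for _ in range(31)] for c in range(768)]
scene = mean_pool(frames)
print(len(scene))  # → 768
```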
Evaluate a checkpoint on the balanced and the test subsets of AudioSet
You can reproduce the aforementioned results with the script evaluate_convnext_on_audioset.py.
An sbatch script is also provided: scripts/5_evaluate_convnext_on_audioset.sbatch
It loads the checkpoint and runs it on a single GPU. It should take a few minutes to produce the metric results in the log file.
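For reference, the mAP reported above is the macro average of per-class average precision over the 527 classes. A stdlib-only sketch of binary average precision for one class (the evaluation script presumably relies on scikit-learn, which appears in the dependency list):

```python
def average_precision(labels, scores):
    """AP: mean of the precision at each true positive, ranked by score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Two positives ranked 1st and 3rd: AP = (1/1 + 2/3) / 2
print(round(average_precision([1, 0, 1], [0.9, 0.8, 0.7]), 4))  # → 0.8333
```

The overall mAP would then be the mean of this quantity across all classes.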
Citation
If you find this work useful, please consider citing our paper, to be presented at INTERSPEECH 2023:
Pellegrini, T., Khalfaoui-Hassani, I., Labbé, E., & Masquelier, T. (2023). Adapting a ConvNeXt model to audio classification on AudioSet. arXiv preprint arXiv:2306.00830.
@misc{pellegrini2023adapting,
title={{Adapting a ConvNeXt model to audio classification on AudioSet}},
author={Thomas Pellegrini and Ismail Khalfaoui-Hassani and Etienne Labbé and Timothée Masquelier},
year={2023},
eprint={2306.00830},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
Owner
- Name: Thomas Pellegrini
- Login: topel
- Kind: user
- Location: Toulouse, France
- Company: IRIT
- Website: http://www.irit.fr/~Thomas.Pellegrini/
- Repositories: 38
- Profile: https://github.com/topel
Citation (CITATION.cff)
# -*- coding: utf-8 -*-
cff-version: 1.2.0
message: If you use this code, please consider citing the following paper.
title: audioset-convnext-inf
authors:
- given-names: Thomas
family-names: Pellegrini
affiliation: IRIT
url: https://github.com/topel/audioset-convnext-inf
preferred-citation:
authors:
- family-names: Pellegrini
given-names: Thomas
affiliation: ANITI, IRIT, UPS
orcid: 'https://orcid.org/0000-0001-8984-1399'
- family-names: Khalfaoui-Hassani
  given-names: Ismail
affiliation: ANITI, UPS
- family-names: Labbé
given-names: Etienne
affiliation: IRIT, UPS
orcid: 'https://orcid.org/0000-0002-7219-5463'
- family-names: Masquelier
  given-names: Timothée
affiliation: CerCo
# arxiv citation
doi: "10.48550/arXiv.2306.00830"
month: 6
title: "Adapting a ConvNeXt model to audio classification on AudioSet"
url: "https://doi.org/10.48550/arXiv.2306.00830"
year: 2023
type: proceedings
GitHub Events
Total
- Issues event: 3
- Watch event: 7
- Issue comment event: 2
- Push event: 1
Last Year
- Issues event: 3
- Watch event: 7
- Issue comment event: 2
- Push event: 1
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 2
- Total pull requests: 0
- Average time to close issues: about 2 hours
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: about 2 hours
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- JNaranjo-Alcazar (2)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- h5py ==3.2.1
- matplotlib ==3.4.2
- numpy ==1.20.1
- scikit-learn ==0.24.2
- scipy ==1.6.3
- torch ==1.11.0
- torchaudio ==0.11.0
- torchlibrosa ==0.0.9
- tqdm ==4.64.1
- _libgcc_mutex 0.1
- _openmp_mutex 4.5
- bzip2 1.0.8
- ca-certificates 2023.7.22
- ld_impl_linux-64 2.40
- libffi 3.4.2
- libgcc-ng 13.1.0
- libgomp 13.1.0
- libnsl 2.0.0
- libsqlite 3.43.0
- libuuid 2.38.1
- libzlib 1.2.13
- ncurses 6.4
- openssl 3.1.2
- pip 23.2.1
- python 3.9.18
- readline 8.2
- setuptools 68.1.2
- tk 8.6.12
- tzdata 2023c
- wheel 0.41.2
- xz 5.2.6