audioset-convnext-inf

Adapting a ConvNeXt model to audio classification on AudioSet

https://github.com/topel/audioset-convnext-inf

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.8%) to scientific vocabulary
Last synced: 7 months ago

Repository

Adapting a ConvNeXt model to audio classification on AudioSet

Basic Info
  • Host: GitHub
  • Owner: topel
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 1.46 MB
Statistics
  • Stars: 25
  • Watchers: 2
  • Forks: 2
  • Open Issues: 2
  • Releases: 0
Created almost 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme · License · Citation

README.md

Adapting a ConvNeXt model to audio classification on AudioSet

In this work, we adapted the computer vision architecture ConvNeXt (Tiny) to perform audio tagging on AudioSet.

In this repo, we provide the PyTorch code to run inference with our best checkpoint, trained on the AudioSet dev subset (balanced + unbalanced subsets). We do not provide code to train our models, but our training pipeline is heavily based on PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition. Many thanks to Qiuqiang Kong and colleagues for their amazing open-source work.

Install instructions

conda env create -f environment.yml

The most important modules are:

  • pytorch 1.11.0 (more recent 1.x versions should work, and possibly 2.x as well),
  • torchaudio,
  • torchlibrosa, needed to generate log mel spectrograms just as in PANN's code.

Activate the newly created env: conda activate audio_retrieval

Then either clone this repo and work locally, or pip install it with:

pip install git+https://github.com/topel/audioset-convnext-inf@pip-install

Get a checkpoint

Create a checkpoints directory, in which a checkpoint should be added.

A checkpoint is available on Zenodo: https://zenodo.org/record/8020843

Get the convnext_tiny_471mAP.pth one to do audio tagging and embedding extraction.

The following results were obtained on the AudioSet test set:

| mAP     | 0.471 |
|---------|-------|
| AUC     | 0.973 |
| d-prime | 3.071 |
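For context on the d-prime row: in the PANNs evaluation code this work builds on, d' is computed per class as sqrt(2) · Φ⁻¹(AUC) and then averaged over classes, which is why the reported mean d-prime (3.071) need not equal d' of the mean AUC (0.973). A minimal stdlib-only sketch of that formula, with the inverse normal CDF implemented by bisection:

```python
import math

def norm_ppf(p, lo=-10.0, hi=10.0):
    """Inverse standard-normal CDF, found by bisecting the CDF (stdlib only)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def d_prime(auc):
    """d' = sqrt(2) * inverse-normal-CDF(AUC), as in PANNs-style metrics."""
    return math.sqrt(2.0) * norm_ppf(auc)

# d' of the *mean* AUC is ~2.72, below the reported class-averaged 3.071.
print(d_prime(0.973))
```

Because averaging per-class d' values is not the same as taking d' of the average AUC, both numbers in the table are consistent with each other.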

A second checkpoint is also available, in case you are interested in doing experiments on the AudioCaps dataset (audio captioning and audio-text retrieval).

Audio tagging demo

The script demo_convnext.py provides an example of how to do audio tagging on a single audio file, provided in the audio_samples directory.

It will give the following output:

```
Loaded ckpt from: /gpfswork/rech/djl/uzj43um/audio_retrieval/audioset-convnext-inf/checkpoints/convnext_tiny_471mAP.pth

# params: 28222767

Inference on: f62-S-v2swA_200000_210000.wav

logits size: torch.Size([1, 527])
probs size: torch.Size([1, 527])

Predicted labels using activity threshold 0.25:

[  0 137 138 139 151 506]

Scene embedding, shape: torch.Size([1, 768])

Frame-level embeddings, shape: torch.Size([1, 768, 31, 7])
```
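The predicted index list comes from keeping every class whose probability exceeds the activity threshold of 0.25. A toy sketch of that step, using plain Python containers and made-up probabilities instead of the demo's [1, 527] torch tensor:

```python
# Activity-threshold step: keep every class index whose probability
# is >= 0.25. The real demo applies this to a [1, 527] probability tensor.
ACTIVITY_THRESHOLD = 0.25

# Hypothetical probabilities for a handful of the 527 AudioSet classes.
probs = {0: 0.91, 3: 0.05, 137: 0.62, 138: 0.40, 139: 0.33, 151: 0.55, 506: 0.27}

predicted = sorted(i for i, p in probs.items() if p >= ACTIVITY_THRESHOLD)
print(predicted)  # [0, 137, 138, 139, 151, 506]
```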

You can map the predicted indices to their tag names using the file metadata/class_labels_indices.csv:

[ 0 137 138 139 151 506] Speech; Music; Musical instrument; Plucked string instrument; Ukulele; Inside, small room
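A minimal sketch of that lookup, assuming the CSV follows AudioSet's standard header (index,mid,display_name); the file contents are inlined here for illustration, and the mid values are illustrative:

```python
import csv
import io

# A few rows in the format of metadata/class_labels_indices.csv
# (index,mid,display_name), inlined instead of reading the real file.
CSV_TEXT = """index,mid,display_name
0,/m/09x0r,Speech
137,/m/04rlf,Music
151,/m/07xzm,Ukulele
"""

idx_to_name = {int(row["index"]): row["display_name"]
               for row in csv.DictReader(io.StringIO(CSV_TEXT))}

predicted = [0, 137, 151]
print("; ".join(idx_to_name[i] for i in predicted))  # Speech; Music; Ukulele
```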

The ground truth for this recording, as given in audio_samples/f62-S-v2swA_200000_210000_labels.txt, is:

[ 0 137 151] Speech; Music; Ukulele;

Additionally, the methods model.forward_scene_embeddings(waveform) and model.forward_frame_embeddings(waveform) provide audio scene and frame-level embeddings. The respective shapes are printed out in the example script: the scene embedding is a 768-d vector, and the frame-level embeddings have shape (768, 31, 7), i.e., 768 "images" of 31 time frames × 7 frequency coefficients.
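To relate the two shapes, a common way to turn frame-level features into a single clip-level vector is to pool each channel's time × frequency grid down to one number. This is only a sketch of that relationship, using mean pooling over plain Python lists; the model's own scene embedding may use a different pooling:

```python
# Frame-level embeddings: 768 channels, each a 31 x 7 (time x frequency) grid.
# Build a dummy tensor as nested lists, then mean-pool every channel's grid
# to a single value, yielding a 768-d clip-level vector.
C, T, F = 768, 31, 7
frame_emb = [[[0.5 for _ in range(F)] for _ in range(T)] for _ in range(C)]

scene_emb = [sum(v for row in grid for v in row) / (T * F) for grid in frame_emb]

print(len(scene_emb))  # 768
```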

Evaluate a checkpoint on the balanced and the test subsets of AudioSet

You can reproduce the aforementioned results with the script evaluate_convnext_on_audioset.py

The sbatch script is provided: scripts/5_evaluate_convnext_on_audioset.sbatch

It loads the checkpoint and runs it on a single GPU. It should take a few minutes to run; the metric results are written to the log file.

Citation

If you find this work useful, please consider citing our paper, to be presented at INTERSPEECH 2023:

Pellegrini, T., Khalfaoui-Hassani, I., Labbé, E., & Masquelier, T. (2023). Adapting a ConvNeXt model to audio classification on AudioSet. arXiv preprint arXiv:2306.00830.

```
@misc{pellegrini2023adapting,
  title={{Adapting a ConvNeXt model to audio classification on AudioSet}},
  author={Thomas Pellegrini and Ismail Khalfaoui-Hassani and Etienne Labbé and Timothée Masquelier},
  year={2023},
  eprint={2306.00830},
  archivePrefix={arXiv},
  primaryClass={cs.SD}
}
```

Owner

  • Name: Thomas Pellegrini
  • Login: topel
  • Kind: user
  • Location: Toulouse, France
  • Company: IRIT

Citation (CITATION.cff)

# -*- coding: utf-8 -*-

cff-version: 1.2.0
message: If you use this code, please consider citing the following paper.
title: audioset-convnext-inf
authors:
  - given-names: Thomas
    family-names: Pellegrini
    affiliation: IRIT
url: https://github.com/topel/audioset-convnext-inf

preferred-citation:
  authors:
    - family-names: Pellegrini
      given-names: Thomas
      affiliation: ANITI, IRIT, UPS
      orcid: 'https://orcid.org/0000-0001-8984-1399'
    - family-names: Khalfaoui-Hassani
      given-names: Ismail
      affiliation: ANITI, UPS
    - family-names: Labbé
      given-names: Etienne
      affiliation: IRIT, UPS
      orcid: 'https://orcid.org/0000-0002-7219-5463'
    - family-names: Masquelier
      given-names: Timothée
      affiliation: CerCo
  # arxiv citation
  doi: "10.48550/arXiv.2306.00830"
  month: 6
  title: "Adapting a ConvNeXt model to audio classification on AudioSet"
  url: "https://doi.org/10.48550/arXiv.2306.00830"
  year: 2023
  type: proceedings

GitHub Events

Total
  • Issues event: 3
  • Watch event: 7
  • Issue comment event: 2
  • Push event: 1
Last Year
  • Issues event: 3
  • Watch event: 7
  • Issue comment event: 2
  • Push event: 1

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 46
  • Total Committers: 2
  • Avg Commits per committer: 23.0
  • Development Distribution Score (DDS): 0.261
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
topel t****i@g****m 34
Labbeti e****1@g****m 12

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: about 2 hours
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: about 2 hours
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • JNaranjo-Alcazar (2)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

pyproject.toml pypi
  • h5py ==3.2.1
  • matplotlib ==3.4.2
  • numpy ==1.20.1
  • scikit-learn ==0.24.2
  • scipy ==1.6.3
  • torch ==1.11.0
  • torchaudio ==0.11.0
  • torchlibrosa ==0.0.9
  • tqdm ==4.64.1
setup.py pypi
environment.yml conda
  • _libgcc_mutex 0.1
  • _openmp_mutex 4.5
  • bzip2 1.0.8
  • ca-certificates 2023.7.22
  • ld_impl_linux-64 2.40
  • libffi 3.4.2
  • libgcc-ng 13.1.0
  • libgomp 13.1.0
  • libnsl 2.0.0
  • libsqlite 3.43.0
  • libuuid 2.38.1
  • libzlib 1.2.13
  • ncurses 6.4
  • openssl 3.1.2
  • pip 23.2.1
  • python 3.9.18
  • readline 8.2
  • setuptools 68.1.2
  • tk 8.6.12
  • tzdata 2023c
  • wheel 0.41.2
  • xz 5.2.6