audio-classification-with-encodec-code

Code required to reproduce the experiments from our paper "Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space"

https://github.com/i3uex/audio-classification-with-encodec-code

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.0%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: i3uex
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 41 KB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

README.md

Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space

Code for the paper "Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space"

Running the Experiments

To reproduce the results of the paper, follow the steps below:

1. Downloading datasets

All the datasets must be located in the datasets folder. This folder should contain the following subfolders after downloading the datasets:

  • GTZAN Speech_Music: Contains the GTZAN Speech Music dataset. Class folders should be named "speech" and "music".
  • GTZAN Genre: Contains the GTZAN Music Genre dataset. Class folders should be named according to the genre and be located in the "genres" folder.
  • ESC-50: Contains the ESC-50 dataset. Class folders should be named according to the class and be located in the "classes" folder. In order to parse the dataset, you should run the following script:

```bash
#!/bin/bash
python3 datasets/parse_ESC50_meta.py
```
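Before training, it can help to confirm that the folder layout described above is in place. The following is a minimal sketch, not part of the repository: the helper `missing_dataset_dirs` and the `EXPECTED_DIRS` list are hypothetical names, and the exact nesting of the `genres` and `classes` folders is an assumption drawn from the descriptions above.

```python
from pathlib import Path

# Expected layout, taken from the folder descriptions above. The exact nesting
# (for example "ESC-50/classes") is an assumption based on that description.
EXPECTED_DIRS = [
    Path("GTZAN Speech_Music") / "speech",
    Path("GTZAN Speech_Music") / "music",
    Path("GTZAN Genre") / "genres",
    Path("ESC-50") / "classes",
]


def missing_dataset_dirs(root):
    """Return the expected dataset folders that do not exist under root."""
    root = Path(root)
    return [rel.as_posix() for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs("datasets")
    if missing:
        print("Missing folders:")
        for rel in missing:
            print("  -", rel)
    else:
        print("All expected dataset folders are present.")
```

Run from the repository root, the script lists any folder that still needs to be downloaded or renamed.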

To download and parse all datasets at once, run the following script:

```bash
#!/bin/bash
./download_all.sh
```

Alternatively, you can download the datasets manually:

1.1. GTZAN Speech music dataset

The dataset can be downloaded from here or by running the following script:

```bash
#!/bin/bash
curl -L -o gtzan-musicspeech-collection.zip https://www.kaggle.com/api/v1/datasets/download/lnicalo/gtzan-musicspeech-collection
unzip gtzan-musicspeech-collection.zip -d "datasets/GTZAN Speech_Music"
rm gtzan-musicspeech-collection.zip
```

Then rename the class folders from "speech_wav" and "music_wav" to "speech" and "music" respectively. You can do this by running:

```bash
#!/bin/bash
mv datasets/GTZAN\ Speech_Music/speech_wav datasets/GTZAN\ Speech_Music/speech
mv datasets/GTZAN\ Speech_Music/music_wav datasets/GTZAN\ Speech_Music/music
```
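The rename step can also be done from Python, which makes it safe to re-run. This is a minimal sketch rather than code from the repository: `rename_class_folders` is a hypothetical helper, and the source folder names `speech_wav` and `music_wav` are assumed to be what the downloaded archive contains.

```python
from pathlib import Path


def rename_class_folders(dataset_dir, renames):
    """Rename class folders inside dataset_dir, skipping any already renamed."""
    dataset_dir = Path(dataset_dir)
    done = []
    for old_name, new_name in renames.items():
        source = dataset_dir / old_name
        target = dataset_dir / new_name
        # Only rename when the source exists and the target name is still free.
        if source.is_dir() and not target.exists():
            source.rename(target)
            done.append(new_name)
    return done


if __name__ == "__main__":
    # Assumed source names; adjust them if the archive unpacks differently.
    rename_class_folders(
        "datasets/GTZAN Speech_Music",
        {"speech_wav": "speech", "music_wav": "music"},
    )
```

The function returns the names it actually created, so a second run returns an empty list instead of failing.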

1.2. GTZAN Music genre dataset

The dataset can be downloaded from here or by running the following script:

```bash
#!/bin/bash
curl -L -o gtzan-dataset-music-genre-classification.zip https://www.kaggle.com/api/v1/datasets/download/andradaolteanu/gtzan-dataset-music-genre-classification
unzip gtzan-dataset-music-genre-classification.zip -d "datasets/GTZAN Genre"
rm gtzan-dataset-music-genre-classification.zip
```

1.3. Environmental Sound Classification (ESC-50)

The dataset can be downloaded from here or by running the following script:

    curl -L -o ESC-50.zip https://github.com/karoldvl/ESC-50/archive/master.zip
    unzip ESC-50.zip -d "datasets/"
    rm ESC-50.zip

After downloading the dataset, run the following script to parse the metadata:

    python3 datasets/parse_ESC50_meta.py
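To illustrate what parsing the metadata into class folders can look like, here is a minimal sketch. It is not the repository's `parse_ESC50_meta.py`: the function `sort_by_class` is hypothetical, and it assumes the metadata file has `filename` and `category` columns, as in the `meta/esc50.csv` file distributed with ESC-50.

```python
import csv
import shutil
from pathlib import Path


def sort_by_class(meta_csv, audio_dir, classes_dir):
    """Copy each clip listed in the metadata CSV into a folder named after its class."""
    audio_dir, classes_dir = Path(audio_dir), Path(classes_dir)
    counts = {}
    with open(meta_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            source = audio_dir / row["filename"]
            if not source.is_file():
                continue  # skip rows whose audio file is missing
            target_dir = classes_dir / row["category"]
            target_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target_dir / source.name)
            counts[row["category"]] = counts.get(row["category"], 0) + 1
    return counts
```

A call such as `sort_by_class(meta_csv, audio_dir, classes_dir)` returns the number of clips copied per class. The three paths depend on where the archive was unpacked, so they are left as parameters.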

2. Preparing the environment

You can create a virtual environment and install the dependencies by running the following commands:

2.1. Create a virtual environment

    virtualenv venv
    source venv/bin/activate

2.2. Install the dependencies

    pip install -r requirements.txt
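After installing, a quick check can confirm that the packages are visible inside the active virtual environment. The sketch below is an illustration only: `missing_packages` is a hypothetical helper, and the package list is assumed to match the project's `requirements.txt` (encodec, librosa, matplotlib, numpy).

```python
from importlib.util import find_spec

# Package names assumed to match the project's requirements.txt.
REQUIRED = ["encodec", "librosa", "matplotlib", "numpy"]


def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates the packages without importing them, so the check stays fast even for heavy dependencies.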

3. Training the models

To train the models, run the following script:

    python3 classify.py

Citing This Work

This repository contains the code required to reproduce the experiments from our paper, "Beyond Spectrograms: Rethinking Audio Classification from EnCodec's Latent Space". If you use this work in your research, please cite it using the following BibTeX entry:

    @article{perianezpascual25,
      title = {Beyond Spectrograms: Rethinking Audio Classification from EnCodec's Latent Space},
      author = {Perianez-Pascual, Jorge and Gutiérrez, Juan D. and Escobar-Encinas, Laura and Rubio-Largo, Álvaro and Rodriguez-Echeverria, Roberto},
      doi = {10.3390/a18020108},
      journal = {Algorithms},
      langid = {english},
      month = {February},
      number = {2},
      volume = {18},
      year = {2025}
    }

Owner

  • Login: i3uex
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite both the article from preferred-citation and the software itself.
authors:
  - family-names: Perianez-Pascual
    given-names: Jorge
  - family-names: Gutiérrez
    given-names: Juan D.
  - family-names: Escobar-Encinas
    given-names: Laura
  - family-names: Rubio-Largo
    given-names: Álvaro
  - family-names: Rodriguez-Echeverria
    given-names: Roberto
title: 'Beyond Spectrograms: Rethinking Audio Classification from EnCodec''s Latent Space'
version: 1.0.0
doi: 10.3390/a18020108
date-released: '2025-02-16'
preferred-citation:
  authors:
    - family-names: Perianez-Pascual
      given-names: Jorge
    - family-names: Gutiérrez
      given-names: Juan D.
    - family-names: Escobar-Encinas
      given-names: Laura
    - family-names: Rubio-Largo
      given-names: Álvaro
    - family-names: Rodriguez-Echeverria
      given-names: Roberto
  title: 'Beyond Spectrograms: Rethinking Audio Classification from EnCodec''s Latent Space'
  doi: 10.3390/a18020108
  type: article-journal
  year: '2025'
  conference: {}
  publisher: {}

GitHub Events

Total
  • Issues event: 2
  • Watch event: 2
  • Issue comment event: 2
  • Push event: 3
  • Public event: 1
  • Pull request event: 2
  • Fork event: 2
Last Year
  • Issues event: 2
  • Watch event: 2
  • Issue comment event: 2
  • Push event: 3
  • Public event: 1
  • Pull request event: 2
  • Fork event: 2

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 2
  • Average time to close issues: less than a minute
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 2
  • Average time to close issues: less than a minute
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Timothy-John (1)
Pull Request Authors
  • Timothy-John (2)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements.txt pypi
  • encodec *
  • librosa *
  • matplotlib *
  • numpy *