audio-classification-with-encodec-code

Code required to reproduce the experiments from our paper "Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space"

https://github.com/i3uex/audio-classification-with-encodec-code

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 3 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (6.0%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: i3uex
Language: Python
Default Branch: main
Homepage:
Size: 41 KB

Statistics

Stars: 1
Watchers: 2
Forks: 1
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space

Code for the paper "Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space"

Running the Experiments

To reproduce the results of the paper, follow the steps below:

1. Downloading datasets

All the datasets must be located in the datasets folder. This folder should contain the following subfolders after downloading the datasets:

- GTZAN Speech_Music: contains the GTZAN Speech Music dataset. Class folders should be named "speech" and "music".
- GTZAN Genre: contains the GTZAN Music Genre dataset. Class folders should be named after the genre and be located in the "genres" folder.
- ESC-50: contains the ESC-50 dataset. Class folders should be named after the class and be located in the "classes" folder. To parse the dataset, run the following script:

```bash
#!/bin/bash
python3 parse_ESC50_meta.py
```
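The folder layout described above can be checked before training with a few lines of Python. This is only a sketch: the exact nesting (for example, "genres" and "classes" sitting directly inside their dataset folders) is an assumption based on the descriptions in this README, and `missing_dataset_dirs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Expected layout, as read from the README's description. The exact nesting
# is an assumption; adjust it if the repository expects something different.
EXPECTED_DIRS = [
    Path("GTZAN Speech_Music") / "speech",
    Path("GTZAN Speech_Music") / "music",
    Path("GTZAN Genre") / "genres",
    Path("ESC-50") / "classes",
]


def missing_dataset_dirs(root="datasets"):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing folders:")
        for rel in missing:
            print("  -", rel)
    else:
        print("All expected dataset folders are present.")
```

Running it from the repository root lists any folder that still needs to be downloaded or renamed.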

To download and parse all datasets, run the following script:

```bash
#!/bin/bash
./download_all.sh
```

Alternatively, you can download the datasets manually:

1.1. GTZAN Speech music dataset

Can be downloaded from Kaggle or by running the following script:

```
#!/bin/bash
curl -L -o gtzan-musicspeech-collection.zip https://www.kaggle.com/api/v1/datasets/download/lnicalo/gtzan-musicspeech-collection
unzip gtzan-musicspeech-collection.zip -d "datasets/GTZAN Speech_Music"
rm gtzan-musicspeech-collection.zip
```

Then rename the class folders from "speech_wav" and "music_wav" to "speech" and "music", respectively. You can do this by running:

```
#!/bin/bash
mv datasets/GTZAN\ Speech_Music/speech_wav datasets/GTZAN\ Speech_Music/speech
mv datasets/GTZAN\ Speech_Music/music_wav datasets/GTZAN\ Speech_Music/music
```
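The same rename can be done from Python, which avoids shell quoting issues with the space in the folder name. This is a minimal sketch: `rename_class_folders` is a hypothetical helper, and the source folder names "speech_wav" and "music_wav" are assumed, so check what the extracted archive actually contains.

```python
from pathlib import Path


def rename_class_folders(base, renames):
    """Rename sub-folders of `base` according to the {old: new} mapping.

    Folders that are missing, or whose target already exists, are skipped.
    Returns the list of new names that were actually created.
    """
    base = Path(base)
    done = []
    for old, new in renames.items():
        src, dst = base / old, base / new
        if src.is_dir() and not dst.exists():
            src.rename(dst)
            done.append(new)
    return done


if __name__ == "__main__":
    # Source folder names are an assumption; verify them after unzipping.
    renamed = rename_class_folders(
        "datasets/GTZAN Speech_Music",
        {"speech_wav": "speech", "music_wav": "music"},
    )
    print("Renamed:", renamed)
```

Because missing sources and existing targets are skipped, the function is safe to run more than once.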

1.2. GTZAN Music genre dataset

Can be downloaded from Kaggle or by running the following script:

```
#!/bin/bash
curl -L -o gtzan-dataset-music-genre-classification.zip https://www.kaggle.com/api/v1/datasets/download/andradaolteanu/gtzan-dataset-music-genre-classification
unzip gtzan-dataset-music-genre-classification.zip -d "datasets/GTZAN Genre"
rm gtzan-dataset-music-genre-classification.zip
```
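The download, unzip, and clean-up pattern used by the curl scripts in this section can also be sketched in Python with the standard library only. `download_and_extract` is a hypothetical helper, not part of the repository, and whether the Kaggle endpoint accepts requests from `urllib` without extra headers or authentication is untested here.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, dest_dir):
    """Download a zip archive from `url`, extract it into `dest_dir`,
    and discard the downloaded archive afterwards.

    Returns the names of the archive members that were extracted.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "archive.zip"
        # urlopen follows HTTP redirects by default, similar to `curl -L`.
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            names = zf.namelist()
    # The temporary directory (and the archive in it) is removed here,
    # which mirrors the final `rm` in the shell scripts.
    return names


# Example call (large download, so it is left commented out):
# download_and_extract(
#     "https://www.kaggle.com/api/v1/datasets/download/andradaolteanu/"
#     "gtzan-dataset-music-genre-classification",
#     "datasets/GTZAN Genre",
# )
```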

1.3. Environmental Sound Classification (ESC-50)

Can be downloaded from GitHub or by running the following commands:

    curl -L -o ESC-50.zip https://github.com/karoldvl/ESC-50/archive/master.zip
    unzip ESC-50.zip -d "datasets/"
    rm ESC-50.zip

After downloading the dataset, run the following script to parse the metadata:

    python3 datasets/parse_ESC50_meta.py
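For readers curious about what such a parsing step involves, here is a minimal sketch. It is not the repository's `parse_ESC50_meta.py`, whose contents are not shown here. It assumes the upstream ESC-50 metadata file (`meta/esc50.csv`, with `filename` and `category` columns) and copies each clip into a per-class folder. The function names and the paths passed to them are hypothetical.

```python
import csv
import shutil
from collections import defaultdict
from pathlib import Path


def files_by_category(meta_csv):
    """Map each ESC-50 category to the list of clip filenames in it."""
    groups = defaultdict(list)
    with open(meta_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            groups[row["category"]].append(row["filename"])
    return dict(groups)


def build_class_folders(meta_csv, audio_dir, classes_dir):
    """Copy every clip into classes_dir/<category>/ and return the copy count."""
    audio_dir, classes_dir = Path(audio_dir), Path(classes_dir)
    copied = 0
    for category, filenames in files_by_category(meta_csv).items():
        target = classes_dir / category
        target.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            shutil.copy2(audio_dir / name, target / name)
            copied += 1
    return copied
```

Copying rather than moving keeps the original `audio` folder intact, at the cost of extra disk space.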

2. Preparing the environment

You can create a virtual environment and install the dependencies by running the following commands:

2.1. Create a virtual environment

    virtualenv venv
    source venv/bin/activate

2.2. Install the dependencies

    pip install -r requirements.txt
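After installation, a quick check can confirm that the packages listed in `requirements.txt` (encodec, librosa, matplotlib, numpy) are visible to the active interpreter. This is a small sketch using only the standard library. `missing_packages` is a hypothetical helper, and it assumes each package's import name matches its PyPI name.

```python
from importlib.util import find_spec

# Packages listed in this repository's requirements.txt.
REQUIRED = ["encodec", "librosa", "matplotlib", "numpy"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates the packages without importing them, so the check stays fast even for heavy libraries.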

3. Training the models

To train the models, run the following script:

    python3 classify.py

Citing This Work

This repository contains the code required to reproduce the experiments from our paper, "Beyond Spectrograms: Rethinking Audio Classification from EnCodec's Latent Space". If you use this work in your research, please cite it using the following BibTeX entry:

    @article{perianezpascual25,
      title = {Beyond Spectrograms: Rethinking Audio Classification from EnCodec's Latent Space},
      author = {Perianez-Pascual, Jorge and Gutiérrez, Juan D. and Escobar-Encinas, Laura and Rubio-Largo, Álvaro and Rodriguez-Echeverria, Roberto},
      doi = {10.3390/a18020108},
      journal = {Algorithms},
      langid = {english},
      month = {February},
      number = {2},
      volume = {18},
      year = {2025}
    }

Owner

Login: i3uex
Kind: user

Repositories: 5
Profile: https://github.com/i3uex

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite both the article from preferred-citation and the software itself.
authors:
  - family-names: Perianez-Pascual
    given-names: Jorge
  - family-names: Gutiérrez
    given-names: Juan D.
  - family-names: Escobar-Encinas
    given-names: Laura
  - family-names: Rubio-Largo
    given-names: Álvaro
  - family-names: Rodriguez-Echeverria
    given-names: Roberto
title: 'Beyond Spectrograms: Rethinking Audio Classification from EnCodec''s Latent Space'
version: 1.0.0
doi: 10.3390/a18020108
date-released: '2025-02-16'
preferred-citation:
  authors:
    - family-names: Perianez-Pascual
      given-names: Jorge
    - family-names: Gutiérrez
      given-names: Juan D.
    - family-names: Escobar-Encinas
      given-names: Laura
    - family-names: Rubio-Largo
      given-names: Álvaro
    - family-names: Rodriguez-Echeverria
      given-names: Roberto
  title: 'Beyond Spectrograms: Rethinking Audio Classification from EnCodec''s Latent Space'
  doi: 10.3390/a18020108
  type: article-journal
  year: '2025'
  conference: {}
  publisher: {}

GitHub Events

Total

Issues event: 2
Watch event: 2
Issue comment event: 2
Push event: 3
Public event: 1
Pull request event: 2
Fork event: 2

Last Year

Issues event: 2
Watch event: 2
Issue comment event: 2
Push event: 3
Public event: 1
Pull request event: 2
Fork event: 2

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 1
Total pull requests: 2
Average time to close issues: less than a minute
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 2
Average time to close issues: less than a minute
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

Timothy-John (1)

Pull Request Authors

Timothy-John (2)

Top Labels

Issue Labels

Pull Request Labels

Dependencies

requirements.txt pypi

encodec *
librosa *
matplotlib *
numpy *
