audio-classification-with-encodec-code
Code required to reproduce the experiments from our paper "Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space"
https://github.com/i3uex/audio-classification-with-encodec-code
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.0%) to scientific vocabulary
Repository
Statistics
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space
Code for the paper "Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space"
Running the Experiments
To reproduce the results of the paper, follow the steps below:
1. Downloading datasets
All datasets must be located in the "datasets" folder. After downloading, this folder should contain the following subfolders:
- GTZAN Speech_Music: Contains the GTZAN Speech Music dataset. Class folders should be named "speech" and "music".
- GTZAN Genre: Contains the GTZAN Music Genre dataset. Class folders should be named after each genre and placed inside the "genres" folder.
- ESC-50: Contains the ESC-50 dataset. Class folders should be named after each class and placed inside the "classes" folder. To parse the dataset, run the following script:
```bash
#!/bin/bash
python3 parse_ESC50_meta.py
```
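Since the class folders are named after the classes, a script can presumably recover each clip's label from its folder name. The following is a minimal sketch of that idea, not code from the repository: the helper `collect_labeled_files` is hypothetical and the `.wav` filter is an assumption. Following the layout described above, the root would be "datasets/GTZAN Speech_Music", "datasets/GTZAN Genre/genres", or "datasets/ESC-50/classes".

```python
"""Sketch: list (audio file, label) pairs from a class-per-folder layout.

Assumption: each class is a subfolder whose name is the label. The helper
name and the ".wav" filter are illustrative, not taken from the repository.
"""
from pathlib import Path


def collect_labeled_files(root, extensions=(".wav",)):
    """Return (path, label) pairs sorted by class, using folder names as labels."""
    root = Path(root)
    pairs = []
    # Every subfolder of root is treated as one class.
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for audio in sorted(class_dir.iterdir()):
            # Keep only files whose extension matches (case-insensitive).
            if audio.is_file() and audio.suffix.lower() in extensions:
                pairs.append((audio, class_dir.name))
    return pairs


if __name__ == "__main__":
    root = Path("datasets/GTZAN Speech_Music")
    if root.is_dir():
        pairs = collect_labeled_files(root)
        labels = sorted({label for _, label in pairs})
        print(f"{len(pairs)} files, classes: {labels}")
    else:
        print(f"{root} not found; download the dataset first")
```

Running it after the download step is a quick way to confirm that each dataset exposes the expected class names.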
To download and parse all datasets at once, run the following script:
```bash
#!/bin/bash
./download_all.sh
```
Alternatively, you can download the datasets manually:
1.1. GTZAN Speech Music dataset
It can be downloaded from Kaggle or by running the following script:
```bash
#!/bin/bash
curl -L -o gtzan-musicspeech-collection.zip https://www.kaggle.com/api/v1/datasets/download/lnicalo/gtzan-musicspeech-collection
unzip gtzan-musicspeech-collection.zip -d "datasets/GTZAN Speech_Music"
rm gtzan-musicspeech-collection.zip
```
Then rename the class folders from "speech_wav" and "music_wav" to "speech" and "music", respectively, by running:
```bash
#!/bin/bash
mv "datasets/GTZAN Speech_Music/speech_wav" "datasets/GTZAN Speech_Music/speech"
mv "datasets/GTZAN Speech_Music/music_wav" "datasets/GTZAN Speech_Music/music"
```
1.2. GTZAN Music Genre dataset
It can be downloaded from Kaggle or by running the following script:
```bash
#!/bin/bash
curl -L -o gtzan-dataset-music-genre-classification.zip https://www.kaggle.com/api/v1/datasets/download/andradaolteanu/gtzan-dataset-music-genre-classification
unzip gtzan-dataset-music-genre-classification.zip -d "datasets/GTZAN Genre"
rm gtzan-dataset-music-genre-classification.zip
```
1.3. Environmental Sound Classification (ESC-50)
It can be downloaded from GitHub or by running the following script:
```bash
curl -L -o ESC-50.zip https://github.com/karoldvl/ESC-50/archive/master.zip
unzip ESC-50.zip -d "datasets/"
rm ESC-50.zip
```
After downloading the dataset, run the following script to parse its metadata:
```bash
python3 datasets/parse_ESC50_meta.py
```
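The metadata step groups the ESC-50 clips by class. The repository's `parse_ESC50_meta.py` is not reproduced here; the following is only a sketch of one way to build the "classes/<category>/" layout from the official `meta/esc50.csv` file, whose `filename` and `category` columns name each clip and its class. The source folder "datasets/ESC-50-master" (the folder GitHub's master.zip extracts to), the destination "datasets/ESC-50", and the function name `sort_esc50_by_class` are assumptions.

```python
"""Sketch: sort ESC-50 clips into per-class folders using meta/esc50.csv.

NOT the repository's parse_ESC50_meta.py. Assumed paths: source
"datasets/ESC-50-master", destination "datasets/ESC-50". The function
name sort_esc50_by_class is hypothetical.
"""
import csv
import shutil
from pathlib import Path


def sort_esc50_by_class(src_root, dst_root):
    """Copy each clip in meta/esc50.csv into dst_root/classes/<category>/.

    Returns a dict mapping each category to the number of copied clips.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    counts = {}
    with open(src_root / "meta" / "esc50.csv", newline="") as f:
        for row in csv.DictReader(f):
            # "filename" and "category" are columns of the official metadata file.
            category = row["category"]
            target_dir = dst_root / "classes" / category
            target_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_root / "audio" / row["filename"], target_dir / row["filename"])
            counts[category] = counts.get(category, 0) + 1
    return counts


if __name__ == "__main__":
    src = Path("datasets/ESC-50-master")
    if (src / "meta" / "esc50.csv").is_file():
        counts = sort_esc50_by_class(src, Path("datasets/ESC-50"))
        print(f"{len(counts)} classes, {sum(counts.values())} clips")
    else:
        print(f"{src} not found; download ESC-50 first")
```

Copying rather than moving leaves the extracted archive untouched, so the sketch can be re-run without re-downloading.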
2. Preparing the environment
You can create a virtual environment and install the dependencies by running the following commands:
2.1. Create a virtual environment
```bash
virtualenv venv
source venv/bin/activate
```
2.2. Install the dependencies
```bash
pip install -r requirements.txt
```
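After installing the requirements, a quick import check can confirm that the environment is complete. This sketch calls `importlib.util.find_spec` on the dependencies listed for this repository on this page (encodec, librosa, matplotlib, numpy); `requirements.txt` remains the authoritative list, and the helper `missing_modules` is hypothetical.

```python
"""Sketch: report which listed dependencies cannot be imported.

The package list mirrors the dependencies shown for this repository;
requirements.txt is authoritative. missing_modules is a hypothetical helper.
"""
import importlib.util


def missing_modules(names):
    """Return the top-level module names for which no import spec is found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["encodec", "librosa", "matplotlib", "numpy"])
    print("All dependencies found" if not missing else f"Missing: {', '.join(missing)}")
```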
3. Training the models
To train the models, run the following script:
```bash
python3 classify.py
```
Citing This Work
This repository contains the code required to reproduce the experiments from our paper, "Beyond Spectrograms: Rethinking Audio Classification from EnCodec's Latent Space". If you use this work in your research, please cite it using the following BibTeX entry:
```bibtex
@article{perianezpascual25,
  title = {Beyond Spectrograms: Rethinking Audio Classification from EnCodec's Latent Space},
  author = {Perianez-Pascual, Jorge and Gutiérrez, Juan D. and Escobar-Encinas, Laura and Rubio-Largo, Álvaro and Rodriguez-Echeverria, Roberto},
  doi = {10.3390/a18020108},
  journal = {Algorithms},
  langid = {english},
  month = {February},
  number = {2},
  volume = {18},
  year = {2025}
}
```
Owner
- Login: i3uex
- Kind: user
- Repositories: 5
- Profile: https://github.com/i3uex
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: If you use this software, please cite both the article from preferred-citation and the software itself.
authors:
  - family-names: Perianez-Pascual
    given-names: Jorge
  - family-names: Gutiérrez
    given-names: Juan D.
  - family-names: Escobar-Encinas
    given-names: Laura
  - family-names: Rubio-Largo
    given-names: Álvaro
  - family-names: Rodriguez-Echeverria
    given-names: Roberto
title: 'Beyond Spectrograms: Rethinking Audio Classification from EnCodec''s Latent Space'
version: 1.0.0
doi: 10.3390/a18020108
date-released: '2025-02-16'
preferred-citation:
  authors:
    - family-names: Perianez-Pascual
      given-names: Jorge
    - family-names: Gutiérrez
      given-names: Juan D.
    - family-names: Escobar-Encinas
      given-names: Laura
    - family-names: Rubio-Largo
      given-names: Álvaro
    - family-names: Rodriguez-Echeverria
      given-names: Roberto
  title: 'Beyond Spectrograms: Rethinking Audio Classification from EnCodec''s Latent Space'
  doi: 10.3390/a18020108
  type: article-journal
  year: '2025'
  conference: {}
  publisher: {}
```
GitHub Events
Total
- Issues event: 2
- Watch event: 2
- Issue comment event: 2
- Push event: 3
- Public event: 1
- Pull request event: 2
- Fork event: 2
Last Year
- Issues event: 2
- Watch event: 2
- Issue comment event: 2
- Push event: 3
- Public event: 1
- Pull request event: 2
- Fork event: 2
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 1
- Total pull requests: 2
- Average time to close issues: less than a minute
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 2
- Average time to close issues: less than a minute
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Timothy-John (1)
Pull Request Authors
- Timothy-John (2)
Dependencies
- encodec *
- librosa *
- matplotlib *
- numpy *