https://github.com/bagustris/serab

SERAB: a multi-lingual benchmark for speech emotion recognition


Repository

Basic Info
  • Host: GitHub
  • Owner: bagustris
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 93.1 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of Neclow/SERAB
Created over 3 years ago · Last pushed over 3 years ago

# SERAB: Speech Emotion Recognition Adaptation Benchmark

This repo contains a "simplified" implementation of [SERAB](https://arxiv.org/abs/2110.03414), which includes:
* BYOL-A training and utility functions (Original repo: https://github.com/nttcslab/byol-a)
* BYOL-A and transformer-inspired models
    * Kudos to Phil Wang for his implementation of CvT (https://github.com/lucidrains/vit-pytorch)
* Benchmark tests for SERAB
* TFDS scripts to load SERAB data

Update: BYOL-S was one of the strongest submissions of the HEAR NeurIPS 2021 Challenge! Leaderboard results: https://neuralaudio.ai/hear2021-results.html

## Demo
* A quick demo detailing the SERAB evaluation procedure on a Colab notebook is available [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EiHujFVt9Hb0VbI0b5RaMfYOYaHq9NrQ?usp=sharing)

## Environment setup
Libraries to reproduce the environment are detailed in `serab.yml`.

To reproduce the environment, run:

```console
conda env create -f serab.yml
```

To fetch the external BYOL-A source files and apply the local patches, run the following after cloning the repo:
```console
cd SERAB/
curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/config.yaml
patch --ignore-whitespace < config.diff
curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/train.py
patch < train.diff
cd byol_a/
curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/byol_a/augmentations.py
patch < augmentations.diff
curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/byol_a/common.py
patch < common.diff
curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/byol_a/dataset.py
patch < dataset.diff
curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/byol_a/models.py
mv models.py models/audio_ntt.py
```
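The fetch-and-patch sequence above can also be scripted. The following is a minimal sketch, not part of the repository: `STEPS`, `build_commands`, and `run_all` are hypothetical names, and it assumes `curl`, `patch`, and `mv` are on the `PATH`. It uses `patch -i FILE`, which reads the patch from a file instead of standard input as in the commands above.

```python
import subprocess
from pathlib import Path

# Commit of nttcslab/byol-a pinned in the console commands above.
BYOL_A_COMMIT = "f2451c366d02be031a31967f494afdf3485a85ff"
RAW_BASE = f"https://raw.githubusercontent.com/nttcslab/byol-a/{BYOL_A_COMMIT}"

# (upstream path, directory relative to SERAB/, patch file or None, extra patch flags)
STEPS = [
    ("config.yaml", ".", "config.diff", ["--ignore-whitespace"]),
    ("train.py", ".", "train.diff", []),
    ("byol_a/augmentations.py", "byol_a", "augmentations.diff", []),
    ("byol_a/common.py", "byol_a", "common.diff", []),
    ("byol_a/dataset.py", "byol_a", "dataset.diff", []),
    ("byol_a/models.py", "byol_a", None, []),
]


def build_commands(steps=STEPS):
    """Return (workdir, argv) pairs mirroring the console commands; nothing is executed."""
    commands = []
    for upstream_path, workdir, patch_file, flags in steps:
        commands.append((workdir, ["curl", "-O", f"{RAW_BASE}/{upstream_path}"]))
        if patch_file is not None:
            commands.append((workdir, ["patch", *flags, "-i", patch_file]))
    # models.py is not patched; it is moved into the models/ folder instead.
    commands.append(("byol_a", ["mv", "models.py", "models/audio_ntt.py"]))
    return commands


def run_all(repo_root="."):
    """Execute every command in order, stopping at the first failure."""
    for workdir, argv in build_commands():
        subprocess.run(argv, cwd=Path(repo_root) / workdir, check=True)
```

Calling `run_all("SERAB")` from the parent directory of the clone would then reproduce the manual steps; `build_commands()` alone can be used to inspect the commands first.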

## Evaluate a (pre-trained) model using SERAB
In this simplified version, only PyTorch models can be used.

Before running the evaluation, make sure that the config file `config.yaml` is correctly set up for your model.

To evaluate a pre-existing model, run:
```console
python clf_benchmark.py --model_name {MODEL_NAME} --dataset_name {DATASET_NAME}
```

By default, grid-search-based classifier hyperparameter optimization is performed. To evaluate a pre-existing model with the "default" classifiers instead, add the `--model_selection none` option:
```console
python clf_benchmark.py --model_name {MODEL_NAME} --dataset_name {DATASET_NAME} --model_selection none
```

To run a model on all the SERAB datasets, DVC can be used.

Make the appropriate modifications in `dvc.yaml` and run:
```console
dvc repro
```
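As an alternative to DVC, the per-dataset command shown earlier can be looped over from Python. This is a minimal sketch under assumptions: `benchmark_command` and `run_benchmarks` are hypothetical helpers, not part of the repository, and the dataset names passed in must be whatever names `clf_benchmark.py` accepts. Only the `--model_name`, `--dataset_name`, and `--model_selection` options documented above are used.

```python
import subprocess


def benchmark_command(model_name, dataset_name, model_selection=None):
    """Build the argv for one clf_benchmark.py run, as documented above."""
    argv = [
        "python", "clf_benchmark.py",
        "--model_name", model_name,
        "--dataset_name", dataset_name,
    ]
    # Omitting --model_selection keeps the default grid search.
    if model_selection is not None:
        argv += ["--model_selection", model_selection]
    return argv


def run_benchmarks(model_name, dataset_names, model_selection=None, dry_run=True):
    """Build one command per dataset; run them only when dry_run is False."""
    commands = [
        benchmark_command(model_name, name, model_selection)
        for name in dataset_names
    ]
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands
```

For example, `run_benchmarks("my_model", ["dataset_a", "dataset_b"], model_selection="none")` returns the two commands without running them; the model and dataset names here are placeholders.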

## Train a model "à la BYOL-A"
Models can be pre-trained on a subsample of AudioSet that only contains speech.

You might need to make changes to `train.py` and `config.yaml` before starting training.

To train a model, run:
```console
python train.py {MODEL_NAME}  # or dvc repro
```

As training time is usually long (10-20h depending on the model), we recommend using [tmux](https://github.com/tmux/tmux) to attach & detach terminals from a given session.

## SERAB datasets
CREMA-D and SAVEE are already integrated into TensorFlow Datasets (TFDS); the other datasets were added as custom TFDS datasets.

The code to load these datasets can be found in the `tensorflow_datasets` folder.

Here are the steps to download and load the SERAB datasets:
1. In the `tensorflow_datasets` folder, create the folders `download/manual`
2. Download the compressed datasets (.zip files) under `tensorflow_datasets/download/manual/`

Links to the SERAB datasets:
* AESDD: http://m3c.web.auth.gr/research/aesdd-speech-emotion-recognition/
* CaFE: https://zenodo.org/record/1478765
* EmoDB: http://emodb.bilderbar.info/download/
* EMOVO: http://voice.fub.it/activities/corpora/emovo/index.html
* IEM4 (restricted access): https://sail.usc.edu/iemocap/
* RAVDESS: https://smartlaboratory.org/ravdess/
* SAVEE (restricted access): http://kahlan.eps.surrey.ac.uk/savee/Download.html
* ShEMO: https://github.com/mansourehk/ShEMO
* SUBESCO: https://zenodo.org/record/4526477#.YcyUeGjMJPY 

3. Build each dataset using the TFDS CLI:
```console
cd tensorflow_datasets/{DATASET_NAME}
tfds build  # Download and prepare the dataset to `~/tensorflow_datasets/`
```

The datasets are now ready to use!
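Steps 1 and 2 can be checked with a short script before running `tfds build`. This is a minimal sketch using only the standard library: `prepare_manual_dir` and `list_archives` are hypothetical helpers, and the location of the `tensorflow_datasets` folder is passed in by the caller rather than assumed.

```python
from pathlib import Path


def prepare_manual_dir(tfds_root):
    """Create <tfds_root>/download/manual (step 1) and return its path."""
    manual_dir = Path(tfds_root) / "download" / "manual"
    manual_dir.mkdir(parents=True, exist_ok=True)
    return manual_dir


def list_archives(manual_dir):
    """Return the sorted names of the .zip archives already downloaded (step 2)."""
    return sorted(p.name for p in Path(manual_dir).glob("*.zip"))
```

Running `list_archives(prepare_manual_dir("tensorflow_datasets"))` shows which compressed datasets are in place; an empty list means nothing has been downloaded yet.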

## Citation

If you are using this code, please cite [the paper](https://arxiv.org/abs/2110.03414):
```
@article{scheidwasser2021serab,
  title={SERAB: A multi-lingual benchmark for speech emotion recognition},
  author={Scheidwasser-Clow, Neil and Kegler, Mikolaj and Beckmann, Pierre and Cernak, Milos},
  journal={arXiv preprint arXiv:2110.03414},
  year={2021}
}
```

Owner

  • Name: Bagus Tris Atmaja
  • Login: bagustris
  • Kind: user
  • Location: Tsukuba
  • Company: AIST

Researcher @aistairc @VibrasticLab
