https://github.com/bagustris/msp-podcast_challenge_is2025

MSP-Podcast Challenge Baseline Code for Interspeech 2025

https://github.com/bagustris/msp-podcast_challenge_is2025

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

MSP-Podcast Challenge Baseline Code for Interspeech 2025

Basic Info
  • Host: GitHub
  • Owner: bagustris
  • Language: Python
  • Default Branch: main
  • Size: 35.2 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of msplabresearch/MSP-Podcast_Challenge_IS2025
Created over 1 year ago · Last pushed over 1 year ago

https://github.com/bagustris/MSP-Podcast_Challenge_IS2025/blob/main/

# Interspeech 2025 - Speech Emotion Recognition in Naturalistic Conditions Challenge
# Baselines Training and Evaluation

The Speech Emotion Recognition (SER) in Naturalistic Conditions Challenge at Interspeech 2025 aims to advance the field of emotion recognition from spontaneous speech, emphasizing real-world applicability over controlled, acted scenarios. Utilizing the MSP-Podcast corpus  a rich dataset of over 324 hours of naturalistic conversational speech  the challenge provides a platform for researchers to develop and benchmark SER technologies that perform effectively in complex, real-world environments. Participants have access to speaker-independent training and development sets, as well as an exclusive test set, all annotated for two distinct tasks: categorical emotion recognition and emotional attributes prediction.

This repository contains scripts to train and evaluate baseline models on various tasks including categorical emotion recognition and multi-task emotional attributes prediction.

#### Refer to the links below to sign up for the challenge and to find information about the rules, submission deadlines, file formatting instructions, and more:

Link to the challenge: [Interspeech 2025 - Speech Emotion Recognition in Naturalistic Conditions Challenge](https://lab-msp.com/MSP-Podcast_Competition/IS2025/)


Link to previous challenge paper (Odyssey 2024 - SER Challenge): [PDF](https://ecs.utdallas.edu/research/researchlabs/msp-lab/publications/Goncalves_2024.pdf)

The paper contains baseline implementation details. See citation bellow:

```
@inproceedings{Goncalves_2024,
  title     = {Odyssey 2024 - Speech Emotion Recognition Challenge: Dataset, Baseline Framework, and Results},
  author    = {Lucas Goncalves and Ali N. Salman and Abinay Reddy Naini and Laureano Moro-Velzquez and Thomas Thebaud and Paola Garcia and Najim Dehak and Berrak Sisman and Carlos Busso},
  year      = {2024},
  booktitle = {The Speaker and Language Recognition Workshop (Odyssey 2024)},
  pages     = {247--254},
  doi       = {10.21437/odyssey.2024-35},
}
```


## Environment Setup

Python version = 3.9.7

To replicate the environment necessary to run the code, you have two options:

### Using Conda

1. Ensure that you have [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) installed.
2. Create a conda environment using the `spec-file.txt` by running:
    conda create --name baseline_env --file spec-file.txt
3. Activate the environment:
    conda activate baseline_env
4. Make sure to install the transformers library as it is essential for the code to run:
    pip install transformers


### Using pip

1. Alternatively, you can use `requirements.txt` to install the necessary packages in a virtual environment:
    python -m venv myenv
    source myenv/bin/activate
    pip install -r requirements.txt
2. Make sure to install the transformers library as it is essential for the code to run:
    pip install transformers


## Configuration

Before running the training or evaluation scripts, check instructions below and update the `./configs/config_dim.json` and `./configs/config_cat.json` files with the paths to your local audio folder and label CSV file.

### Categorical Emotion Recognition Model

Before running training or evaluation of the categorical emotion recognition model, please execute the script `process_labels_for_categorical.py` to properly format the provided `labels_consensus.csv` file for categorical emotion recognition. Place the path to the processed .csv file in the `./configs/config_cat.json` file to run this configuration.

### Attributes Emotion Recognition Model

The original `labels_consensus.csv` file provided with the dataset can be used as is for attributes emotion recognition. Please place the path to the `labels_consensus.csv` file in the `./configs/config_dim.json` file to run this configuration.

## Inferencing

### HuggingFace

If you are only interested in using the pretrained models for prediction or feature extraction, we have made the models available on HuggingFace.

 #### Models on HuggingFace (Interspeech 2025)
  - [ ] Categorical model (TBA)
  - [ ] Multi-task attribute model (TBA)

  #### Previous Models on HuggingFace (Odyssey 2024)
  - [x] [Categorical model](https://huggingface.co/3loi/SER-Odyssey-Baseline-WavLM-Categorical)
  - [x] [Multi-task attribute model](https://huggingface.co/3loi/SER-Odyssey-Baseline-WavLM-Multi-Attributes)   -  [Emotional Attributes Prediction App/Demo](https://huggingface.co/spaces/3loi/WavLM-SER-Multi-Baseline-Odyssey2024)


## Training and Evaluation

To train or evaluate the models, use the provided shell scripts. Here's how to use each script:

- `bash run_cat.sh`: Trains or evaluates the categorical emotion recognition baseline.
- `bash run_dim.sh`: Trains or evaluates the multi-task emotional attributes prediction baseline.


### Models

The models are to be saved in the `model` folder. If you are evaluating the pretrained models, please download the models using the script provided in the `model` folder at `./model/download_models.sh`. 
  ```
  $ bash download_models.sh 
  ```
- Example 1: `bash download_models.sh all` to download all the models.
- Example 2: `bash download_models.sh categorical` to download categorical model only.

If you wish to manually download a pre-trained model, please visit this [website](https://lab-msp.com/MSP-Podcast_Competition/IS2025/models/) and download the desired model and place them in the `model` folder. 

Pre-trained models file descriptions:
- "cat_ser.zip" --> Categorical emotion recognition baseline.
- "dim_ser.zip" --> Multi-task emotional attributes baseline.



### Evaluation Only

If you are only evaluating a model and do not wish to train it, comment out the lines related to `python train_eval_files/train_****_.py` in the respective run `.sh` file.

### Evaluation and saving results for emotional attributes prediction

A custom executable sample file for evaluation and results saving has been provided `./train_eval_files/eval_dim_ser.py`. To execute, just download or train the multi-task emotional attributes baseline. The script `bash run_dim.sh` already has execution code to run this set up (Note: If you will not be training the entire model again, please comment out the training lines in `run_dim.sh` before evaluation). The results will be saved in the correct `.csv` format and it will be located in a `results` folder created inside your model path location.

## Issues

If you encounter any issues while setting up, training, or evaluating the models, please open an issue on this repository with a detailed description of the problem.

---------------------------
To cite this repository in your works, use the following BibTeX entry:

```
@inproceedings{Goncalves_2024,
  title     = {Odyssey 2024 - Speech Emotion Recognition Challenge: Dataset, Baseline Framework, and Results},
  author    = {Lucas Goncalves and Ali N. Salman and Abinay Reddy Naini and Laureano Moro-Velzquez and Thomas Thebaud and Paola Garcia and Najim Dehak and Berrak Sisman and Carlos Busso},
  year      = {2024},
  booktitle = {The Speaker and Language Recognition Workshop (Odyssey 2024)},
  pages     = {247--254},
  doi       = {10.21437/odyssey.2024-35},
}
```

Owner

  • Name: Bagus Tris Atmaja
  • Login: bagustris
  • Kind: user
  • Location: Tsukuba
  • Company: AIST

Researcher @aistairc @VibrasticLab

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1