laughter-detection-icsi

An ML pipeline for training a laughter detection model on the ICSI corpus

https://github.com/lassewolter/laughter-detection-icsi

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ✓ Academic publication links (links to: ieee.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 12.8%, to scientific vocabulary)

Keywords

ai audio data deep-learning laughter lhotse machine-learning ml pytorch

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: LasseWolter
License: MIT
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 157 MB

Statistics

Stars: 4
Watchers: 1
Forks: 1
Open Issues: 1
Releases: 1

Created about 4 years ago · Last pushed about 3 years ago

Metadata Files

Readme License Citation

A Machine Learning Pipeline for Laughter Detection on the ICSI Corpus

This repo is based on the laughter detection model by Gillick et al. and retrains it on the ICSI Meeting corpus.

The data pipeline uses Lhotse, a new Python library for speech and audio data preparation.

This repository consists of three main parts:
1. Evaluation Pipeline
2. Data Pipeline
3. Training Code

The following list outlines which parts of the repository belong to each of them and classifies each part or file as one of three types:
1. from scratch: written entirely by myself
2. adapted: code taken from Gillick et al. and adapted
3. unmodified: code taken from Gillick et al. and used without modification

Evaluation Pipeline (from scratch):
- analysis
  - transcript_parsing/parse.py + preprocess.py: parsing and preprocessing the ICSI transcripts
  - analyse.py: main function that parses and evaluates predictions from the .TextGrid files output by the model
  - output_processing: scripts for creating .wav files of the laughter occurrences so they can be evaluated manually
- visualise.py: functions for visualising model performance (incl. precision-recall curve and confusion matrix)
Data Pipeline (from scratch) - also see diagram:
- compute_features: computes features representing the whole corpus and specific subsets of the ICSI corpus
- create_data_df.py: creates a dataframe representing the training, development and test sets
Training Code:
- models.py (unmodified): defines the model architecture
- train.py (adapted): main training code
- segment_laughter.py + laugh_segmenter.py (adapted): inference code to run laughter detection on audio files
- datasets.py + load_data.py (from scratch): the new LAD (Laugh Activity Detection) Dataset + new inference Dataset and code for their creation
Misc:
- Demo.ipynb (from scratch): demonstration of using Lhotse to compute features from a dataframe defining laughter and non-laughter segments
- config.py (adapted): configurations for different parts of the pipeline
- results.zip (N/A): contains the model predictions from experiments presented in my thesis

Diagram of the Data Pipeline

Getting started

Steps to set up the environment from scratch so that training and evaluation can be run:

Clone this repo
cd into the repo
Create a Python env and install all packages listed below: put them in a requirements.txt file and run pip install -r requirements.txt
We use Lhotse's available recipe for the ICSI corpus to download the corpus' audio + transcripts
- run the Python script get_icsi_data.py
  - this will take a while to complete, as it downloads all audio and transcriptions for the ICSI corpus
  - after completion:
    - you should have a data/icsi/speech folder with all your audio files grouped by meeting
    - you should have a data/icsi/transcripts folder with all the .mrt transcripts
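Once the download has finished, the expected layout can be checked with a few lines of Python. This is a minimal sketch and not part of the repo: the helper name check_icsi_layout is hypothetical, and it assumes the data/icsi root described above, with one sub-folder per meeting under speech/ and the .mrt files directly under transcripts/.

```python
from pathlib import Path


def check_icsi_layout(root="data/icsi"):
    """Count meeting folders and .mrt transcripts under the expected layout.

    Hypothetical helper (not part of this repo). Assumes speech/ holds one
    sub-folder per meeting and transcripts/ holds the .mrt files.
    """
    root = Path(root)
    speech_dir = root / "speech"
    transcripts_dir = root / "transcripts"

    # Each meeting is assumed to be a directory directly below speech/.
    meetings = (
        sorted(p.name for p in speech_dir.iterdir() if p.is_dir())
        if speech_dir.is_dir()
        else []
    )
    # Only files ending in .mrt are counted as transcripts.
    transcripts = (
        sorted(p.name for p in transcripts_dir.glob("*.mrt"))
        if transcripts_dir.is_dir()
        else []
    )
    return {"meetings": meetings, "transcripts": transcripts}


if __name__ == "__main__":
    layout = check_icsi_layout()
    print(
        f"{len(layout['meetings'])} meeting folders, "
        f"{len(layout['transcripts'])} .mrt transcripts"
    )
```

If either list comes back empty, the download probably did not complete or the folders live somewhere other than data/icsi.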
Now create a .env file by copying the .sample.env file.
- you can configure the folders to match your desired folder structure
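To illustrate what such a file provides, the sketch below reads simple KEY=VALUE lines into a dictionary. It is an assumption-laden example, not the repo's own loading code: the function name read_env_file is hypothetical, and so is any key you might look up with it (the actual keys are the ones listed in .sample.env).

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical helper for illustration; the repo may load its .env
    file differently.
    """
    settings = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Ignore empty lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings
```

Printing the result of read_env_file() after editing .env is a quick way to confirm that the configured folders point where you expect.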
Now run compute_features.py once to compute the features for the whole corpus
- the first run parses the transcripts and creates indices with laughter and non-laughter segments (see Other documentation section below)
  - this will take a while (e.g. it took one hour for me)
  - after initial creation the indices are cached and loaded from disk
- this is done by the compute_features_per_split() method in the main() function
- you can comment out the call to compute_features_for_cuts() in the main() function if you just want to create the features for the whole corpus for now
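The "create once, then load from disk" behaviour of the indices follows a common caching pattern. The sketch below shows that pattern in general terms; it is not the repo's implementation. The helper name load_or_create and the use of pickle as the on-disk format are assumptions made for illustration.

```python
import pickle
from pathlib import Path


def load_or_create(cache_path, create_fn):
    """Return the cached object if it exists, else build it and cache it.

    Hypothetical helper; pickle is an assumed storage format.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Fast path: a previous run already did the expensive work.
        with cache_path.open("rb") as f:
            return pickle.load(f)

    # Slow path: run the expensive step (e.g. parsing all transcripts).
    result = create_fn()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

With this pattern only the first call pays the parsing cost; deleting the cache file forces the index to be rebuilt, which is useful after the transcripts or the parsing code change.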
Then run create_data_df.py to create a set of training samples
Then run compute_features.py again to create the cutset
- this is done by the compute_features_for_cuts() function in the main() function
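The table of samples that feeds the cutset can be pictured as rows of labelled segments, each assigned to a split. The sketch below is only an illustration under stated assumptions: the column names (meeting_id, start, duration, label, split), the meeting-level split rule, and all function names are hypothetical, and the meeting IDs are example values. The schema actually produced by create_data_df.py may differ.

```python
import csv
import io
from dataclasses import asdict, dataclass

# Hypothetical column order for the illustrative table.
FIELDS = ["meeting_id", "start", "duration", "label", "split"]


@dataclass
class Segment:
    meeting_id: str
    start: float     # seconds from the start of the recording
    duration: float  # seconds
    label: int       # 1 = laughter, 0 = non-laughter


def assign_split(meeting_id, dev_meetings, test_meetings):
    """Assign a whole meeting to one split (assumed rule, for illustration)."""
    if meeting_id in test_meetings:
        return "test"
    if meeting_id in dev_meetings:
        return "dev"
    return "train"


def build_rows(segments, dev_meetings, test_meetings):
    """Turn segments into table rows with an added 'split' column."""
    rows = []
    for seg in segments:
        row = asdict(seg)
        row["split"] = assign_split(seg.meeting_id, dev_meetings, test_meetings)
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    segments = [
        Segment("Bmr021", 12.5, 1.0, 1),
        Segment("Bed002", 3.0, 1.0, 0),
    ]
    print(rows_to_csv(build_rows(segments, {"Bed002"}, set())))
```

Splitting by meeting rather than by individual segment keeps audio from one recording out of both training and evaluation data, which is why the sketch uses that rule.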
