speech-utility-bioacoustics

On the utility of speech and audio foundation models for marmoset call analysis

https://github.com/idiap/speech-utility-bioacoustics

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary

Keywords

audio bio-acoustics representation-learning self-supervised-learning speech
Last synced: 6 months ago

Repository

On the utility of speech and audio foundation models for marmoset call analysis

Basic Info
  • Host: GitHub
  • Owner: idiap
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 761 KB
Statistics
  • Stars: 1
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
audio bio-acoustics representation-learning self-supervised-learning speech
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis

[Paper] [Slides]

[![python](https://img.shields.io/badge/-Python_3.9-blue?logo=python&logoColor=white)](https://github.com/pre-commit/pre-commit) [![pytorch](https://img.shields.io/badge/PyTorch_2.0+-ee4c2c?logo=pytorch&logoColor=white)](https://pytorch.org/get-started/locally/) [![lightning](https://img.shields.io/badge/-Lightning_2.0+-792ee5?logo=pytorchlightning&logoColor=white)](https://pytorchlightning.ai/) [![hydra](https://img.shields.io/badge/Config-Hydra_1.3-89b8cd)](https://hydra.cc/) [![black](https://img.shields.io/badge/Code%20Style-Black-black.svg?labelColor=gray)](https://black.readthedocs.io/en/stable/) [![isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/) [![license](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://github.com/idiap/speech-utility-bioacoustics/blob/main/LICENSE) [![license](https://img.shields.io/badge/GitHub-Open%20source-green)](https://github.com/idiap/speech-utility-bioacoustics/)


Cite

This repository contains the source code for the paper On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis by E. Sarkar and M. Magimai Doss (2024). It was accepted at the 4th International Workshop on Vocal Interactivity In-and-between Humans, Animals and Robots (VIHAR 2024), a satellite workshop of ISCA Interspeech 2024.

Please cite the original authors in any publication that uses this work:

```bib
@inproceedings{sarkar24_vihar,
  title     = {On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis},
  author    = {Eklavya Sarkar and Mathew Magimai.-Doss},
  year      = {2024},
  booktitle = {4th International Workshop on Vocal Interactivity In-and-between Humans, Animals and Robots (VIHAR2024)},
  doi       = {10.5281/zenodo.13935495},
  isbn      = {978-2-9562029-3-6},
}
```

Dataset

InfantMarmosetsVox is a dataset for multi-class call-type and caller identification. It contains audio recordings of individual marmosets and their call-types: 350 files of precisely labelled 10-minute recordings across all caller classes. The audio was recorded from five pairs of infant marmoset twins, each recorded individually in two separate sound-proofed recording rooms at a sampling rate of 44.1 kHz. The start and end time, call-type, and marmoset identity of each vocalization are provided, labelled by an experienced researcher. The dataset contains a total of 169,318 labelled audio segments, which amounts to 72,921 vocalization segments after removing the "Silence" and "Noise" classes. There are 11 call-types (excluding "Silence" and "Noise") and 10 caller identities.

The dataset is publicly available here, and ships with a ready-to-use PyTorch Dataset and DataLoader. Any publication (e.g. conference paper, journal article, technical report, book chapter) resulting from the usage of InfantMarmosetsVox must cite this paper:

```bib
@inproceedings{sarkar23_interspeech,
  title     = {Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?},
  author    = {Eklavya Sarkar and Mathew Magimai.-Doss},
  year      = {2023},
  booktitle = {INTERSPEECH 2023},
  pages     = {1189--1193},
  doi       = {10.21437/Interspeech.2023-1968},
  issn      = {2958-1796},
}
```

More information on the usage is provided in the README.txt file of the dataset.
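As a rough illustration of how such segment-level data can be consumed in training code, here is a minimal, hypothetical sketch of a PyTorch `Dataset` wrapping (waveform, call-type, caller) triples; the actual class and field names are defined by the dataset itself (see its README.txt):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MarmosetSegments(Dataset):
    """Hypothetical segment-level dataset: each item is a
    (waveform, call_type, caller_id) triple."""

    def __init__(self, segments):
        # `segments` is assumed to be a list of dicts holding a raw waveform
        # tensor and integer labels; the real dataset defines its own format.
        self.segments = segments

    def __len__(self):
        return len(self.segments)

    def __getitem__(self, idx):
        seg = self.segments[idx]
        return seg["waveform"], seg["call_type"], seg["caller_id"]

# Example with fake data: 1 s segments at 44.1 kHz, 11 call-types, 10 callers.
fake = [
    {
        "waveform": torch.randn(44100),
        "call_type": torch.randint(0, 11, (1,)).item(),
        "caller_id": torch.randint(0, 10, (1,)).item(),
    }
    for _ in range(4)
]
loader = DataLoader(MarmosetSegments(fake), batch_size=2, shuffle=True)
for waveforms, call_types, caller_ids in loader:
    print(waveforms.shape, call_types, caller_ids)  # torch.Size([2, 44100]) ...
```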

Installation

This package has very few requirements. To create a new conda/mamba environment, install conda, then mamba, and follow these steps:

```bash
# Clone project
git clone https://github.com/idiap/speech-utility-bioacoustics
cd speech-utility-bioacoustics

# Create and activate environment
mamba env create -f environment.yaml
mamba activate marmosets
```

Usage

Train a model with a chosen experiment configuration from configs/experiment/:

```bash
python src/train.py experiment=experiment_name.yaml
```

You can override any parameter from the command line like this:

```bash
python src/train.py trainer.max_epochs=20
```
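Such overrides work because `src/train.py` is a Hydra entry point. For orientation, a minimal sketch of what such an entry point looks like (not the project's actual code) follows:

```python
import hydra
from omegaconf import DictConfig, OmegaConf

# Minimal sketch of a Hydra 1.3 entry point. The real src/train.py composes
# its configuration from the repository's configs/ tree.
@hydra.main(version_base="1.3", config_path="../configs", config_name="train")
def main(cfg: DictConfig) -> None:
    # Every config field is addressable from the CLI, e.g.:
    #   python src/train.py trainer.max_epochs=20
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()
```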

Experiments

The experiments conducted in this paper can be found in the scripts folder. These contain feature extraction, pairwise distance computation, and training scripts.

Sample run:

```bash
./scripts/train/wavlm.sh
```

These scripts use gridtk for job submission, but can be reconfigured according to the user's needs.
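Since `transformers` is among the dependencies (see below), the feature-extraction and pairwise-distance steps can be approximated as follows. This is a hedged sketch: the checkpoint name (`microsoft/wavlm-base`), mean pooling, and 16 kHz input are illustrative assumptions rather than the paper's exact setup.

```python
import torch
from transformers import AutoFeatureExtractor, AutoModel

# Embed two audio segments with a pre-trained speech foundation model and
# compute their pairwise cosine distance.
name = "microsoft/wavlm-base"  # assumed checkpoint, for illustration only
extractor = AutoFeatureExtractor.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

# Two fake 1-second segments at 16 kHz; real 44.1 kHz recordings would first
# need resampling to the model's expected sampling rate.
wavs = [torch.randn(16000).numpy(), torch.randn(16000).numpy()]
inputs = extractor(wavs, sampling_rate=16000, return_tensors="pt", padding=True)

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, frames, dim)

embeddings = hidden.mean(dim=1)  # mean-pool frames into one vector per segment
distance = 1 - torch.nn.functional.cosine_similarity(
    embeddings[0], embeddings[1], dim=0
)
print(f"pairwise cosine distance: {distance.item():.3f}")
```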

Directory Structure

The directory is organized as follows:

```
.
├── CITATION.cff        # Setup
├── configs             # Experiment configs
├── environment.yaml    # Environment file
├── hydra_plugins       # Plugins
├── img                 # Images
├── LICENSE             # License
├── Makefile            # Setup
├── MANIFEST.in         # Setup
├── pyproject.toml      # Setup
├── README.md           # This file
├── requirements.txt    # Requirements
├── scripts             # Scripts
├── setup.py            # Setup
├── src                 # Python source code
└── version.txt         # Version
```

Contact

For questions or to report issues with this software package, please contact the first author.

Owner

  • Name: Idiap Research Institute
  • Login: idiap
  • Kind: organization
  • Location: Centre du Parc, Martigny, Switzerland

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Sarkar
  given-names: Eklavya
- family-names: Magimai.-Doss
  given-names: Mathew
title: On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis
doi: 
version: v0.1.0
date-released: 2023-07-23

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2

Dependencies

.github/workflows/code-quality-main.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • pre-commit/action v2.0.3 composite
.github/workflows/code-quality-pr.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • pre-commit/action v2.0.3 composite
  • trilom/file-changes-action v1.2.4 composite
.github/workflows/release-drafter.yml actions
  • release-drafter/release-drafter v5 composite
.github/workflows/test.yml actions
  • actions/checkout v3 composite
  • actions/checkout v2 composite
  • actions/setup-python v3 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v3 composite
environment.yaml pypi
  • hydra-colorlog *
  • hydra-optuna-sweeper *
  • ipdb *
  • torchlibrosa *
  • transformers *
pyproject.toml pypi
requirements.txt pypi
  • hydra-colorlog ==1.2.0
  • hydra-core ==1.3.2
  • hydra-optuna-sweeper ==1.2.0
  • lightning >=2.0.0
  • pre-commit *
  • pytest *
  • rich *
  • rootutils *
  • torch >=2.0.0
  • torchmetrics >=0.11.4
  • torchvision >=0.15.0
setup.py pypi
  • lightning *