ssl-caller-detection

Source code for the paper 'Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?' by E. Sarkar and M. Magimai Doss (2023).

https://github.com/idiap/ssl-caller-detection

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.0%) to scientific vocabulary

Keywords

bio-acoustics machine-learning representation-learning self-supervised-learning signal-processing

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: idiap
License: gpl-3.0
Language: Python
Default Branch: main
Homepage:
Size: 2.29 MB

Statistics

Stars: 6
Watchers: 5
Forks: 0
Open Issues: 0
Releases: 1

Topics

bio-acoustics machine-learning representation-learning self-supervised-learning signal-processing

Created over 2 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License Citation

Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?

[Paper] [Video] [Slides]

Cite

This repository contains the source code for the paper Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers? by E. Sarkar and M. Magimai Doss, accepted at Interspeech 2023.

Please cite the original authors in any publication that uses this work:

@inproceedings{sarkar23_interspeech,
  author={Eklavya Sarkar and Mathew Magimai.-Doss},
  title={{Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={1189--1193},
  doi={10.21437/Interspeech.2023-1968}
}

Dataset

InfantMarmosetsVox is a dataset for multi-class call-type and caller identification. It contains audio recordings of individual marmosets and their call-types: 350 precisely labeled 10-minute recordings in total across all caller classes. The audio was recorded from five pairs of infant marmoset twins, each recorded individually in two separate sound-proofed recording rooms at a sampling rate of 44.1 kHz. The start time, end time, call-type, and marmoset identity of each vocalization are provided, labeled by an experienced researcher. The dataset contains 169,318 labeled audio segments in total, of which 72,921 are vocalization segments after removing the "Silence" and "Noise" classes. There are 11 different call-types (excluding "Silence" and "Noise") and 10 different caller identities.

The dataset is publicly available here, and includes a usable PyTorch Dataset and Dataloader. Any publication (e.g. conference paper, journal article, technical report, book chapter, etc.) resulting from the usage of InfantMarmosetsVox must cite this paper.

More information on the usage is provided in the README.txt file included in the dataset.
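As a rough illustration of the segment annotations described above, the sketch below models one labeled segment, converts its times to sample indices at 44.1 kHz, and drops the "Silence" and "Noise" classes. The Segment class, its field names, and the example values are assumptions made for this example; they are not the schema used by the dataset or its PyTorch Dataset.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 44_100  # sampling rate stated in the dataset description
NON_VOCAL_LABELS = {"Silence", "Noise"}  # classes excluded from the vocalization count


@dataclass(frozen=True)
class Segment:
    """One annotated segment. Field names are illustrative, not the dataset's schema."""
    start_s: float
    end_s: float
    call_type: str
    caller_id: int


def sample_range(seg, sample_rate=SAMPLE_RATE_HZ):
    """Convert a segment's start/end times (in seconds) to sample indices."""
    return round(seg.start_s * sample_rate), round(seg.end_s * sample_rate)


def vocalizations(segments):
    """Keep only the segments whose label is an actual call-type."""
    return [s for s in segments if s.call_type not in NON_VOCAL_LABELS]


# Made-up annotations, used only to exercise the helpers above.
segments = [
    Segment(0.0, 0.5, "Silence", 1),
    Segment(0.5, 1.25, "Twitter", 1),
    Segment(1.25, 2.0, "Noise", 1),
]
calls = vocalizations(segments)
print(len(calls), sample_range(calls[0]))
```

For the real file layout and label names, the README.txt shipped with the dataset remains the reference.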

Installation

This package has very few requirements. To create a new conda/mamba environment, install conda, then mamba, and simply follow the next steps:

mamba env create -f environment.yml # Create env
mamba activate marmosets # Activate env

Experiments

The following scripts perform the stated computations:

Preprocessing:
- extract_features.py extracts SSL embeddings.
- extract_baselines.py extracts handcrafted features.
- embeddings2cid_pickles.py converts the variable-length features to fixed-length functionals.
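The conversion from variable-length features to fixed-length functionals can be sketched as follows. This example assumes the functionals are the per-dimension mean and standard deviation over time, which is a common choice but is not confirmed here as what embeddings2cid_pickles.py computes. It uses only the standard library to stay dependency-free; a real pipeline would more likely operate on NumPy arrays or PyTorch tensors.

```python
from statistics import fmean, pstdev


def functionals(frames):
    """Map a (num_frames x dim) sequence to a fixed-length [means..., stds...] vector."""
    if not frames:
        raise ValueError("need at least one frame")
    columns = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population standard deviation
    return means + stds


# Two toy "embeddings" with different numbers of frames but the same dimension.
short_clip = [[1.0, 2.0], [3.0, 2.0]]
long_clip = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]

vec_short = functionals(short_clip)
vec_long = functionals(long_clip)
print(len(vec_short), len(vec_long))  # both vectors have length 2 * dim
```

Whatever the number of frames, the output has twice the feature dimension, which is what lets a standard classifier consume clips of different durations.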

Study I - Caller Discrimination Analysis:
- functionals2distributions.py computes the KL-divergence and Bhattacharyya distance between extracted embeddings.
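To make the two measures concrete, here is a minimal sketch that computes them in closed form between two Gaussians with diagonal covariance. Modeling each caller's embeddings as a diagonal Gaussian is an assumption made for this example; how functionals2distributions.py actually estimates the distributions is not described here.

```python
import math


def kl_divergence(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def bhattacharyya_distance(mu_p, var_p, mu_q, var_q):
    """Bhattacharyya distance between diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        v = 0.5 * (vp + vq)  # average of the two variances
        total += 0.125 * (mp - mq) ** 2 / v + 0.5 * math.log(v / math.sqrt(vp * vq))
    return total


# Two one-dimensional callers with unit variance and means one unit apart.
print(kl_divergence([0.0], [1.0], [1.0], [1.0]))           # 0.5
print(bhattacharyya_distance([0.0], [1.0], [1.0], [1.0]))  # 0.125
```

The KL-divergence is asymmetric, so KL(p || q) and KL(q || p) generally differ, whereas the Bhattacharyya distance gives the same value in both directions.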

Study II - Caller Detection Study:
- classifier_caller_groups.py classifies the functionals using a ML classifier (SVM, RF, AB).
- compile_results.py compiles all the results computed from classifier_caller_groups.py.

Misc:
- utils.py contains utility functions such as loading the SSL embeddings or SSL functionals.

Note that the experimental protocols above are defined in marmoset_lists, which contains the set splits and other mappings as .pkl files.

Usage

The scripts above are independent of one another, and each takes its own parameters. To see the parameters a given script requires, run it with -h:

python src/file.py -h

Each invocation runs only the single configuration selected by the given parameters. Running all the experiments requires a grid search over all possible parameter values. Note that the experiments in the paper were run only with the task parameter -t marmosetID.
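One way to set up such a grid search is to generate one command line per parameter combination, as in the sketch below. Only -t marmosetID comes from the text above. The --classifier and --feature flags and their values are hypothetical placeholders, not the script's real options, and the src/classifier_caller_groups.py path is assumed from the directory layout; check each script's -h output for the actual parameters.

```python
import itertools
import shlex

# Parameter grid. Only "-t marmosetID" is taken from the README; the other
# flags and values are hypothetical placeholders for illustration.
grid = {
    "-t": ["marmosetID"],
    "--classifier": ["svm", "rf", "ab"],   # hypothetical flag
    "--feature": ["model_a", "model_b"],   # hypothetical flag
}


def build_commands(script, grid):
    """Return one argument list per combination of grid values."""
    flags = list(grid)
    commands = []
    for values in itertools.product(*(grid[flag] for flag in flags)):
        args = [item for pair in zip(flags, values) for item in pair]
        commands.append(["python", script, *args])
    return commands


commands = build_commands("src/classifier_caller_groups.py", grid)
for cmd in commands:
    print(shlex.join(cmd))  # print only; nothing is executed here
```

The sketch only prints the commands. To execute them, each argument list could be passed to subprocess.run(cmd, check=True) once the placeholder flags are replaced with real ones.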

Directory Structure

This directory is organized as follows:

.
├── dataset           # Dataset config files
├── environment.yml   # Environment file
├── img               # Images
├── LICENSE           # License
├── MANIFEST.in       # Setup
├── marmoset_lists    # Protocol lists and pickles
├── pkl               # Pickles
├── pyproject.toml    # Setup
├── README.md         # This file
├── src               # Python source code
└── version.txt       # Version

Contact

For questions about this software package, or to report issues with it, kindly contact the first author.

Owner

Name: Idiap Research Institute
Login: idiap
Kind: organization
Location: Centre du Parc, Martigny, Switzerland

Website: http://www.idiap.ch
Repositories: 73
Profile: https://github.com/idiap

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Sarkar
  given-names: Eklavya
- family-names: Magimai.-Doss
  given-names: Mathew
title: Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?
doi: 
version: v0.1.0
date-released: 2023-10-16

GitHub Events

Total

Watch event: 2

Last Year

Watch event: 2

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Dependencies

environment.yml pypi

bob-extension ==7.0.2
docutils ==0.19
fire ==0.4.0
greenlet ==1.1.2
lightning-lite ==1.8.0
lightning-utilities ==0.3.0
pytorch-lightning ==1.7.7
regex ==2022.10.31
s3prl ==0.4.10
termcolor ==2.1.0
torch-summary ==1.4.5

pyproject.toml pypi
