https://github.com/bagustris/s3prl-ser
S3PRL for Speech Emotion Recognition (see s3prl > downstream)
Science Score: 49.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 2 DOI reference(s) in the README
- ✓ Academic publication links: links to ieee.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.9%) to scientific vocabulary
Repository
S3PRL for Speech Emotion Recognition (see s3prl > downstream)
Basic Info
Statistics
- Stars: 15
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
S3PRL-SER
S3PRL for Speech Emotion Recognition. See s3prl > downstream for supported speech emotion datasets.
Environment compatibilities
We support the following environments. The test cases are run with tox locally and on GitHub Actions:
| Env | versions |
| --- | --- |
| os | ubuntu-18.04, ubuntu-20.04 |
| python | 3.7, 3.8, 3.9, 3.10 |
| pytorch | 1.13.1 |
Supported SER datasets (status, WA = weighted accuracy, UA = unweighted accuracy)
- CMU-MOSEI (done, 0.65, 0.24)
- IEMOCAP (in-progress, 0.73, 0.71)
- MSP-IMPROV (in-progress, 0.67, 0.64)
- MSP-Podcast (in-progress, 0.71, 0.54)
- JTES (in-progress, 0.78, 0.78)
- EmoFilm (in-progress, 0.XX, 0.XX)
- AESDD (planned)
- CaFE (planned)
- SAVEE (planned)
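For readers unfamiliar with the two metrics above: WA is plain accuracy over all utterances, while UA averages the per-class recalls so that minority emotions count equally. The sketch below is illustrative only (the function names are not taken from the s3prl-ser code base):

```python
# Illustrative definitions of WA and UA as commonly used in SER papers.
from collections import defaultdict

def weighted_accuracy(y_true, y_pred):
    """Fraction of all utterances classified correctly (overall accuracy)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each emotion class counts equally."""
    per_class = defaultdict(lambda: [0, 0])  # class -> [correct, total]
    for t, p in zip(y_true, y_pred):
        per_class[t][1] += 1
        if t == p:
            per_class[t][0] += 1
    return sum(c / n for c, n in per_class.values()) / len(per_class)

# Imbalanced toy labels: four "neutral", two "angry"
y_true = ["neu", "neu", "neu", "neu", "ang", "ang"]
y_pred = ["neu", "neu", "neu", "neu", "ang", "neu"]
print(weighted_accuracy(y_true, y_pred))    # 5/6, about 0.833
print(unweighted_accuracy(y_true, y_pred))  # (1.0 + 0.5) / 2 = 0.75
```

On imbalanced corpora such as MSP-Podcast, this is why WA and UA can diverge noticeably in the table above.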
Introduction and Usage
This is an open source toolkit called s3prl-ser, which stands for Self-Supervised Speech Pre-training and Representation Learning for Speech Emotion Recognition. Self-supervised speech pre-trained models are called upstream in this toolkit, and are utilized in various downstream tasks.
Unlike the original S3PRL, S3PRL-SER supports a single usage: Downstream.
Downstream
- Utilize upstream models in a wide range of downstream tasks
- Benchmark upstream models with SUPERB Benchmark
- Document: downstream/README.md
Please refer to the original S3PRL repository if you want to experiment with Pre-train and Upstream usages.
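To make the upstream/downstream split concrete, here is a toy, self-contained sketch. Both functions are stand-ins invented for illustration; in the real toolkit the upstream is a pre-trained self-supervised model (e.g. wav2vec 2.0 or HuBERT) loaded through s3prl, and the downstream head is trained on the extracted features:

```python
# Conceptual sketch (not the actual s3prl API): an "upstream" model turns a
# waveform into frame-level features; a "downstream" head pools those
# features over time and predicts an emotion label.
import random

EMOTIONS = ["neutral", "happy", "angry", "sad"]

def upstream(waveform, dim=8):
    # Stand-in for a self-supervised encoder: one feature vector per frame.
    frames = max(1, len(waveform) // 160)  # ~10 ms hop at 16 kHz
    return [[random.random() for _ in range(dim)] for _ in range(frames)]

def downstream(features):
    # Stand-in head: mean-pool over time, then pick a class by a dummy rule.
    dim = len(features[0])
    pooled = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    return EMOTIONS[int(sum(pooled)) % len(EMOTIONS)]

waveform = [0.0] * 16000      # one second of "audio" at 16 kHz
feats = upstream(waveform)
print(len(feats))             # 100 frames
print(downstream(feats))      # one of the four emotion labels
```

The point of the design is that any upstream can be swapped in without changing the downstream recipe, which is what makes benchmarking different self-supervised representations on SER straightforward.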
Below is an intuitive illustration of how this toolkit may help you (the illustration figures are not reproduced here).
Feel free to use or modify our toolkit in your research. Here is a list of papers using our toolkit. Questions, bug reports, and improvement suggestions are welcome; please open a new issue.
If you find this toolkit helpful to your research, please do consider citing our papers, thanks!
Installation
- Python >= 3.8
- Install sox on your OS
- Install s3prl: read the doc, or run `pip install -e ".[all]"`
- (Optional) Some upstream models require special dependencies. If you encounter an error with a specific upstream model, look into the `README.md` under each `upstream` folder, e.g. `upstream/pase/README.md`
Development pattern for contributors
- Create a personal fork of the main S3PRL repository in GitHub.
- Make your changes in a named branch different from `master`, e.g. create a branch `new-awesome-feature`.
- Contact us if you have any questions during development.
- Generate a pull request through the Web interface of GitHub.
- Please verify that your code is free of basic mistakes; we appreciate any contribution!
Reference Repositories
- PyTorch, PyTorch.
- Audio, PyTorch.
- Kaldi, Kaldi-ASR.
- Transformers, Hugging Face.
- PyTorch-Kaldi, Mirco Ravanelli.
- fairseq, Facebook AI Research.
- CPC, Facebook AI Research.
- APC, Yu-An Chung.
- VQ-APC, Yu-An Chung.
- NPC, Alexander-H-Liu.
- End-to-end-ASR-Pytorch, Alexander-H-Liu
- Mockingjay, Andy T. Liu.
- ESPnet, Shinji Watanabe
- speech-representations, aws lab
- PASE, Santiago Pascual and Mirco Ravanelli
- LibriMix, Joris Cosentino and Manuel Pariente
License
The majority of S3PRL Toolkit is licensed under the Apache License version 2.0, however all the files authored by Facebook, Inc. (which have explicit copyright statement on the top) are licensed under CC-BY-NC.
Citation
If you find this toolkit useful, please consider citing the following papers.
```
@article{Atmaja2022h,
  author  = {Atmaja, Bagus Tris and Sasou, Akira},
  doi     = {10.1109/ACCESS.2022.3225198},
  issn    = {2169-3536},
  journal = {IEEE Access},
  pages   = {124396--124407},
  title   = {{Evaluating Self-Supervised Speech Representations for Speech Emotion Recognition}},
  url     = {https://ieeexplore.ieee.org/document/9964237/},
  volume  = {10},
  year    = {2022}
}

@inproceedings{yang21c_interspeech,
  author    = {Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
  title     = {{SUPERB: Speech Processing Universal PERformance Benchmark}},
  year      = {2021},
  booktitle = {Proc. Interspeech 2021},
  pages     = {1194--1198},
  doi       = {10.21437/Interspeech.2021-1775}
}
```
Owner
- Name: Bagus Tris Atmaja
- Login: bagustris
- Kind: user
- Location: Tsukuba
- Company: AIST
- Website: http://www.bagustris.blogspot.com
- Twitter: btatmaja
- Repositories: 221
- Profile: https://github.com/bagustris
Researcher @aistairc @VibrasticLab
GitHub Events
Total
- Watch event: 3
- Push event: 2
Last Year
- Watch event: 3
- Push event: 2
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Bagus Tris Atmaja | b****s@y****m | 26 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0