https://github.com/bagustris/multimodal-speech-emotion
[IEEE SLT-18] Multimodal Speech Emotion Recognition using Audio and Text
Repository
Statistics
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of david-yoon/multimodal-speech-emotion
Created over 7 years ago
Last pushed over 7 years ago
https://github.com/bagustris/multimodal-speech-emotion/blob/master/
# multimodal-speech-emotion

## This repository contains the source code used in the following paper:

**Multimodal Speech Emotion Recognition using Audio and Text**, IEEE SLT-18, [paper]

----------

### [requirements]

- tensorflow==1.4 (tested on cuda-8.0, cudnn-6.0)
- python==2.7
- scikit-learn==0.20.0
- nltk==3.3

### [download data corpus]

- IEMOCAP [link] [paper]
- Download the IEMOCAP data from its original web page (a license agreement is required).

### [preprocessed-data schema (our approach)]

- For the preprocessing, refer to the code in "./preprocessing".
- If you want to download the "preprocessed corpus" from us directly, please send us an email after obtaining the license from the IEMOCAP team.
- We cannot publish the ASR-processed transcriptions due to a license issue (commercial API). However, we assume it is moderately easy to extract ASR transcripts from the audio signal yourself (we used google-cloud-speech-api).
- Examples

> MFCC : MFCC features of the audio signal (ex. train_audio_mfcc.npy)
> MFCC-SEQN : valid length of the sequence of the audio signal (ex. train_seqN.npy)
> PROSODY : prosody features of the audio signal (ex. train_audio_prosody.npy)
> LABEL : target label of the audio signal (ex. train_label.npy)
> TRANS : sequences of transcription (indexed) of a data sample (ex. train_nlp_trans.npy)

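
The MFCC-SEQN file stores the valid length of each padded audio sequence, so padded frames can be masked out before features are aggregated. The following is a minimal sketch of that idea, not the repository's own code. It assumes the MFCC array has shape `(num_utterances, max_frames, num_features)` and the length array has shape `(num_utterances,)`. These shapes and the helper name `mask_padded_frames` are hypothetical and should be checked against the actual `.npy` files.

```python
import numpy as np


def mask_padded_frames(mfcc, seq_len):
    """Zero out frames beyond each utterance's valid length.

    Assumed (hypothetical) shapes:
      mfcc:    (num_utterances, max_frames, num_features)
      seq_len: (num_utterances,)
    Returns the masked features and the boolean frame mask.
    """
    max_frames = mfcc.shape[1]
    # Frame indices 0..max_frames-1, broadcast against each valid length.
    frame_idx = np.arange(max_frames)[None, :]
    valid = frame_idx < seq_len[:, None]
    return mfcc * valid[:, :, None], valid


# With the preprocessed corpus on disk, the inputs might be loaded like this
# (file names are from the README; the directory layout is an assumption):
#   mfcc = np.load("train_audio_mfcc.npy")
#   seq_len = np.load("train_seqN.npy")

# Toy data standing in for the real files.
toy_mfcc = np.ones((2, 4, 3))
toy_len = np.array([2, 4])
masked, valid = mask_padded_frames(toy_mfcc, toy_len)
```

In the toy example, the first utterance keeps only its first two frames and the second keeps all four.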
### [source code]

- The repository contains code for the following models:

> Audio Recurrent Encoder (ARE)
> Text Recurrent Encoder (TRE)
> Multimodal Dual Recurrent Encoder (MDRE)
> Multimodal Dual Recurrent Encoder with Attention (MDREA)

----------

### [training]

- Refer to "reference_script.sh".
- The final result will be stored in "./TEST_run_result.txt".

----------

### [cite]

- Please cite our paper when you use our code, model, or dataset.

> @article{yoon2018multimodal,
> title={Multimodal Speech Emotion Recognition Using Audio and Text},
> author={Yoon, Seunghyun and Byun, Seokhyun and Jung, Kyomin},
> journal={arXiv preprint arXiv:1810.04635},
> year={2018}
> }

Owner
- Name: Bagus Tris Atmaja
- Login: bagustris
- Kind: user
- Location: Tsukuba
- Company: AIST
- Website: http://www.bagustris.blogspot.com
- Twitter: btatmaja
- Repositories: 221
- Profile: https://github.com/bagustris
Researcher @aistairc @VibrasticLab