https://github.com/bagustris/multimodal-speech-emotion

[IEEE SLT-18] Multimodal Speech Emotion Recognition using Audio and Text

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org, springer.com
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.6%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: bagustris
License: mit
Language: Jupyter Notebook
Default Branch: master
Homepage:
Size: 223 KB

Statistics

Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Fork of david-yoon/multimodal-speech-emotion

Created over 7 years ago · Last pushed over 7 years ago

https://github.com/bagustris/multimodal-speech-emotion/blob/master/

# multimodal-speech-emotion


## This repository contains the source code used in the following paper:

**Multimodal Speech Emotion Recognition using Audio and Text**, IEEE SLT-18, [paper]

----------

### [requirements]
	tensorflow==1.4 (tested on cuda-8.0, cudnn-6.0)
	python==2.7
	scikit-learn==0.20.0
	nltk==3.3


### [download data corpus]
- IEMOCAP [link] [paper]
- Download the IEMOCAP data from its original web page (a license agreement is required).


### [preprocessed-data schema (our approach)]
- For preprocessing, refer to the code in "./preprocessing".
- If you want to download the "preprocessed corpus" from us directly, please send us an email after obtaining the license from the IEMOCAP team.
- We cannot publish the ASR-processed transcriptions due to a license issue (commercial API). However, we assume that it is moderately easy to extract ASR transcripts from the audio signal yourself (we used google-cloud-speech-api).
- Examples
	> MFCC : MFCC features of the audio signal (ex. train_audio_mfcc.npy) 

	> MFCC-SEQN : valid length of the sequence of the audio signal (ex. train_seqN.npy)

	> PROSODY : prosody features of the audio signal (ex. train_audio_prosody.npy) 

	> LABEL : target label of the audio signal (ex. train_label.npy) 
 
	> TRANS : indexed transcription sequences of the data (ex. train_nlp_trans.npy) 
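
As a rough illustration of how the files listed above might be loaded together, here is a minimal sketch. The file suffixes follow the example names in the list; everything else (the array shapes, the helper names `write_dummy_split`, `load_split`, and `valid_frames`, and the random stand-in data) is an assumption made for illustration and is not taken from the repository's preprocessing code.

```python
from __future__ import print_function  # keeps the sketch usable on Python 2.7 as well

import os
import tempfile

import numpy as np

# Illustrative sizes only (assumptions, not the real corpus dimensions).
N_SAMPLES, MAX_FRAMES, N_MFCC, N_PROSODY, MAX_TOKENS, N_CLASSES = 4, 10, 13, 5, 8, 4

# File suffixes follow the example names in the schema list.
SUFFIXES = {
    "mfcc": "_audio_mfcc.npy",
    "seqN": "_seqN.npy",
    "prosody": "_audio_prosody.npy",
    "label": "_label.npy",
    "trans": "_nlp_trans.npy",
}


def write_dummy_split(data_dir, split="train"):
    """Write random stand-in arrays so the sketch runs without IEMOCAP."""
    rng = np.random.RandomState(0)
    arrays = {
        "mfcc": rng.randn(N_SAMPLES, MAX_FRAMES, N_MFCC).astype(np.float32),
        "seqN": rng.randint(1, MAX_FRAMES + 1, size=N_SAMPLES),
        "prosody": rng.randn(N_SAMPLES, N_PROSODY).astype(np.float32),
        "label": rng.randint(0, N_CLASSES, size=N_SAMPLES),
        "trans": rng.randint(0, 100, size=(N_SAMPLES, MAX_TOKENS)),
    }
    for key, suffix in SUFFIXES.items():
        np.save(os.path.join(data_dir, split + suffix), arrays[key])


def load_split(data_dir, split="train"):
    """Load every array of one split into a dict keyed by schema name."""
    return {
        key: np.load(os.path.join(data_dir, split + suffix))
        for key, suffix in SUFFIXES.items()
    }


def valid_frames(data, index):
    """Drop padded frames of one sample using its stored valid length."""
    return data["mfcc"][index, :data["seqN"][index]]


data_dir = tempfile.mkdtemp()
write_dummy_split(data_dir)
train = load_split(data_dir)
for key in sorted(train):
    print(key, train[key].shape)
print("valid frames in sample 0:", valid_frames(train, 0).shape[0])
```

With real data, `write_dummy_split` would be dropped and `load_split` pointed at the directory that holds the preprocessed files. The sketch assumes the MFCC array is padded to a common length and that the SEQN value marks where the valid frames end; check this against the code in "./preprocessing" before relying on it.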



### [source code]
- The repository contains code for the following models:
	 > Audio Recurrent Encoder (ARE) 

	 > Text Recurrent Encoder (TRE) 

	 > Multimodal Dual Recurrent Encoder (MDRE) 

	 > Multimodal Dual Recurrent Encoder with Attention (MDREA) 


----------

### [training]
- Refer to "reference_script.sh".
- The final result will be stored in "./TEST_run_result.txt".



----------


### [cite]
- Please cite our paper when you use our code, model, or dataset.

  > @article{yoon2018multimodal, 

  >  title={Multimodal Speech Emotion Recognition Using Audio and Text}, 

  >  author={Yoon, Seunghyun and Byun, Seokhyun and Jung, Kyomin}, 

  >  journal={arXiv preprint arXiv:1810.04635}, 

  >  year={2018} 

  > }

Owner

Name: Bagus Tris Atmaja
Login: bagustris
Kind: user
Location: Tsukuba
Company: AIST
