https://github.com/bagustris/multimodal-speech-emotion

[IEEE SLT-18] Multimodal Speech Emotion Recognition using Audio and Text

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, springer.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.6%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: bagustris
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 223 KB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of david-yoon/multimodal-speech-emotion
Created over 7 years ago · Last pushed over 7 years ago

# multimodal-speech-emotion


## This repository contains the source code used in the following paper,

**Multimodal Speech Emotion Recognition using Audio and Text**, IEEE SLT-18, [paper]

----------

### [requirements]
	tensorflow==1.4 (tested on cuda-8.0, cudnn-6.0)
	python==2.7
	scikit-learn==0.20.0
	nltk==3.3


### [download data corpus]
- IEMOCAP [link] [paper]
- download the IEMOCAP data from its original web-page (license agreement is required)


### [preprocessed-data schema (our approach)]
- For the preprocessing, refer to the code in "./preprocessing".
- If you want to download the "preprocessed corpus" from us directly, please send us an email after getting the license from the IEMOCAP team.
- We cannot publish the ASR-processed transcriptions because of a license issue (commercial API); however, we assume that it is moderately easy to extract ASR transcripts from the audio signal yourself (we used google-cloud-speech-api).
- Examples
	> MFCC : MFCC features of the audio signal (ex. train_audio_mfcc.npy)
	> MFCC-SEQN : valid length of the sequence of the audio signal (ex. train_seqN.npy)
	> PROSODY : prosody features of the audio signal (ex. train_audio_prosody.npy)
	> LABEL : target label of the audio signal (ex. train_label.npy)
	> TRANS : sequences of transcription (indexed) of the data (ex. train_nlp_trans.npy)
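
The files listed above are plain NumPy arrays, so they can be inspected before training. The following is a minimal sketch, not the repository's own loader: it writes small dummy arrays under the file names given in the examples and reads them back, then uses the valid sequence lengths to mask out padded frames. The array layout (utterances x frames x coefficients), the sizes, and the helper names `save_dummy_split`, `load_split`, and `frame_mask` are assumptions made for illustration only.

```python
import os
import tempfile

import numpy as np

# ASSUMPTION: illustrative sizes only; the real corpus shapes are not stated in this README.
N_UTT, MAX_FRAMES, N_MFCC = 4, 10, 39


def save_dummy_split(data_dir, prefix="train"):
    """Write synthetic arrays using the file names from the schema above (hypothetical helper)."""
    rng = np.random.RandomState(0)
    mfcc = rng.randn(N_UTT, MAX_FRAMES, N_MFCC).astype(np.float32)
    seq_n = np.array([10, 7, 3, 5], dtype=np.int32)  # valid frames per utterance
    label = np.array([0, 1, 2, 3], dtype=np.int32)   # one class index per utterance
    np.save(os.path.join(data_dir, prefix + "_audio_mfcc.npy"), mfcc)
    np.save(os.path.join(data_dir, prefix + "_seqN.npy"), seq_n)
    np.save(os.path.join(data_dir, prefix + "_label.npy"), label)


def load_split(data_dir, prefix="train"):
    """Load MFCC features, valid lengths, and labels for one split (hypothetical helper)."""
    mfcc = np.load(os.path.join(data_dir, prefix + "_audio_mfcc.npy"))
    seq_n = np.load(os.path.join(data_dir, prefix + "_seqN.npy"))
    label = np.load(os.path.join(data_dir, prefix + "_label.npy"))
    return mfcc, seq_n, label


def frame_mask(seq_n, max_frames):
    """Boolean mask of shape (utterances, frames): True where a frame is valid, False where padded."""
    return np.arange(max_frames)[None, :] < seq_n[:, None]


tmp_dir = tempfile.mkdtemp()
save_dummy_split(tmp_dir)
mfcc, seq_n, label = load_split(tmp_dir)

mask = frame_mask(seq_n, mfcc.shape[1])
# Average each utterance's MFCC vectors over its valid frames only.
masked_mean = (mfcc * mask[:, :, None]).sum(axis=1) / seq_n[:, None]
print(mfcc.shape, seq_n.shape, label.shape, masked_mean.shape)
```

With the real preprocessed files, only `load_split` and `frame_mask` would be needed; check the actual array shapes first, since the layout assumed here may differ.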

### [source code]
- repository contains code for the following models
	> Audio Recurrent Encoder (ARE)
	> Text Recurrent Encoder (TRE)
	> Multimodal Dual Recurrent Encoder (MDRE)
	> Multimodal Dual Recurrent Encoder with Attention (MDREA)

----------

### [training]
- refer to "reference_script.sh"
- final result will be stored in "./TEST_run_result.txt"

----------

### [cite]
- Please cite our paper when you use our code | model | dataset

	> @article{yoon2018multimodal,
	> title={Multimodal Speech Emotion Recognition Using Audio and Text},
	> author={Yoon, Seunghyun and Byun, Seokhyun and Jung, Kyomin},
	> journal={arXiv preprint arXiv:1810.04635},
	> year={2018}
	> }

Owner

  • Name: Bagus Tris Atmaja
  • Login: bagustris
  • Kind: user
  • Location: Tsukuba
  • Company: AIST

Researcher @aistairc @VibrasticLab
