Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Helda-3110
  • Language: Python
  • Default Branch: main
  • Size: 5.27 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 10 months ago
Metadata Files
Readme Citation

README.md

Speak2Feel

Introduction

  • This repository builds and trains a Speech Emotion Recognition (SER) system infused with an LLM.
  • The basic idea behind this tool is to build, train, and test suitable machine learning (as well as deep learning) algorithms that can recognize and detect human emotions from speech.

Install the required libraries with the following command: pip3 install -r requirements.txt

Dataset

This repository uses 4 datasets (including this repo's custom dataset), already downloaded and formatted in the data folder:
  • RAVDESS: The Ryerson Audio-Visual Database of Emotional Speech and Song, which contains 24 actors (12 male, 12 female) vocalizing two lexically matched statements in a neutral North American accent.
  • TESS: Toronto Emotional Speech Set, modeled on the Northwestern University Auditory Test No. 6 (NU-6; Tillman & Carhart, 1966). A set of 200 target words were spoken in the carrier phrase "Say the word ____" by two actresses (aged 26 and 64 years).
  • EMO-DB: A database of emotional utterances spoken by actors, recorded in 1997 and 1999 as part of the DFG-funded research project SE462/3-1. The recordings took place in the anechoic chamber of the Technical University Berlin, department of Technical Acoustics. The project was directed by Prof. Dr. W. Sendlmeier, Technical University of Berlin, Institute of Speech and Communication, department of communication science; its members were mainly Felix Burkhardt, Miriam Kienast, Astrid Paeschke, and Benjamin Weiss.
  • Custom: An unbalanced, noisy dataset located in data/train-custom for training and data/test-custom for testing. You can add or remove recording samples easily by converting the raw audio to a 16000 Hz sample rate, mono channel (this is provided in the `create_wavs.py` script's `convert_audio(audio_path)` method, which requires ffmpeg to be installed and in PATH) and appending the emotion to the end of the audio file name, separated with "_" (e.g. "20190616125714_happy.wav" will be parsed automatically as happy).
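The custom-dataset convention above can be sketched in a few lines. This is an illustrative sketch, not the repo's actual code: `parse_emotion` and the `target_path` parameter are hypothetical names, and the ffmpeg flags assume the 16000 Hz mono target the README describes.

```python
import os
import subprocess

def parse_emotion(filename):
    """Extract the emotion label from a custom sample name such as
    '20190616125714_happy.wav' (the label follows the last underscore)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return stem.rsplit("_", 1)[-1]

def convert_audio(audio_path, target_path):
    """Re-encode raw audio to a 16000 Hz, mono-channel WAV via ffmpeg,
    which must be installed and in PATH (as the README notes)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", audio_path, "-ac", "1", "-ar", "16000", target_path],
        check=True,
    )
```

With this convention, dropping a correctly named file into data/train-custom is enough for it to be picked up with the right label.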

Emotions available

There are 9 emotions available: "neutral", "calm", "happy", "sad", "angry", "fear", "disgust", "ps" (pleasant surprise) and "boredom".

Feature Extraction

Feature extraction is the main part of the speech emotion recognition system. It is accomplished by converting the speech waveform into a parametric representation at a relatively low data rate.

In this repository, we have used the most common features available in the librosa library, including:
  • MFCC
  • Chromagram
  • MEL Spectrogram Frequency (mel)
  • Contrast
  • Tonnetz (tonal centroid features)

Owner

  • Login: Helda-3110
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Abdeladim
    given-names: Fadheli
title: Speech Emotion Recognition
version: 1.0.0
date-released: 2019-04-28
abstract: "This repository presents a comprehensive SER framework that employs various machine learning and deep learning techniques to accurately detect and classify human emotions from speech. The framework utilizes four datasets, including RAVDESS, TESS, EMO-DB, and a custom dataset, comprising a diverse range of emotions such as neutral, calm, happy, sad, angry, fear, disgust, pleasant surprise, and boredom. Feature extraction is performed using widely adopted audio features, including MFCC, Chromagram, MEL Spectrogram Frequency, Contrast, and Tonnetz. The repository also supports grid search for hyperparameter tuning and offers a range of classifiers and regressors such as SVC, RandomForest, GradientBoosting, KNeighbors, MLP, Bagging, and Recurrent Neural Networks. The developed SER system demonstrates promising accuracy in emotion classification, making it a valuable tool for researchers and practitioners in the field of affective computing and related domains."
repository-code: https://github.com/x4nth055/emotion-recognition-using-speech
license: MIT

GitHub Events

Total
  • Push event: 8
  • Create event: 2
Last Year
  • Push event: 8
  • Create event: 2

Dependencies

Dockerfile docker
  • python 3.7-slim build
requirements.txt pypi
  • altair ==4.2.2
  • librosa ==0.6.3
  • llvmlite ==0.31.0
  • matplotlib ==3.3.4
  • numba ==0.48.0
  • numpy *
  • pandas *
  • pipwin ==0.5.2
  • protobuf ==3.19.6
  • pyaudio ==0.2.11
  • scikit-learn ==0.24.2
  • soundfile ==0.9.0
  • streamlit ==1.19.0
  • tensorflow ==2.10.1
  • torch ==1.13.1
  • tqdm ==4.28.1
  • transformers ==4.28.1
  • typing-extensions ==4.5.0
  • wave *