speech-emotion-recognition
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: chiragmiyy
- License: mit
- Language: Python
- Default Branch: main
- Size: 17.2 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
🎤 Speech Emotion Recognition
A machine learning and deep learning system for recognizing emotions from speech, using audio features such as MFCCs and spectrograms.
📌 Overview
This project implements a Speech Emotion Recognition (SER) pipeline that uses audio signal processing and classification algorithms to detect emotions from speech. It supports multiple datasets, feature extractors, classifiers, and evaluation metrics.
🧠 Supported Emotions
- Neutral
- Calm
- Happy
- Sad
- Angry
- Fearful
- Disgust
- Pleasant Surprise
- Boredom
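As an illustration (not code from this repo), the RAVDESS dataset encodes the emotion in the third hyphen-separated field of each filename; a minimal Python sketch of that label mapping is below. TESS and EMO-DB use different naming schemes, so this covers only the RAVDESS subset of the emotions listed above:

```python
# RAVDESS filenames look like "03-01-06-01-02-01-12.wav";
# the third field is the emotion code.
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(filename: str) -> str:
    """Extract the emotion label from a RAVDESS-style filename."""
    code = filename.split("-")[2]
    return RAVDESS_EMOTIONS[code]

print(emotion_from_filename("03-01-06-01-02-01-12.wav"))  # fearful
```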
🛠️ Features
- 🔉 Extracts audio features (MFCC, Chromagram, Spectrogram, etc.)
- 🤖 Classifiers: SVC, RandomForest, GradientBoosting, KNeighbors, MLP, RNN
- 🧪 Hyperparameter tuning via GridSearchCV
- 📊 Evaluation: Accuracy, Confusion Matrix
- 💾 Model saving & loading (.pkl)
- 🔍 Dataset support: RAVDESS, TESS, EMO-DB, Custom
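The hyperparameter-tuning step above can be sketched with scikit-learn's GridSearchCV. This is a minimal illustration on random placeholder features, not the repo's actual training code; the grid values and the 40-dimensional feature size are assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder feature matrix: 100 clips x 40 MFCC-style features, 4 emotion classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))
y = rng.integers(0, 4, size=100)

# Search a small SVC grid with 3-fold cross-validation.
grid = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", "auto"]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

The same pattern applies to the other listed classifiers (RandomForest, GradientBoosting, KNeighbors, MLP) by swapping the estimator and its parameter grid.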
📦 Tech Stack
| Domain | Tools |
|--------|-------|
| Programming | Python |
| Audio Processing | Librosa, OpenSMILE |
| Machine Learning | Scikit-learn |
| Deep Learning | PyTorch, HuggingFace Transformers (Wav2Vec2) |
| Deployment (Optional) | Firebase Functions, Streamlit, Gradio |
📁 Project Structure
```bash
speech-emotion-recognition/
├── data/                       # Raw and processed audio files, organized by dataset
│   ├── RAVDESS/
│   ├── TESS/
│   ├── CREMA-D/
│   └── custom/                 # Your own audio recordings
│
├── models/                     # Trained models & preprocessed data
│   ├── final_model.pkl
│   ├── scaler.pkl
│   ├── label_encoder.pkl
│   ├── tess-model.pkl
│   └── tess-label-encoder.pkl  # Any .joblib or .pt files
│
├── results/                    # Visual outputs
│   ├── confusion_matrix.png
│   └── model_accuracy_comparison.png
│
├── src/                        # Source code
│   └── features.py             # Feature extraction scripts
├── train_final_model.py        # Training & evaluation logic
├── app.py                      # Streamlit-based app to demo emotion predictions
├── .gitattributes              # Optional: Git LFS or text encoding rules
├── CITATION.cff                # Software citation metadata
├── LICENSE                     # MIT License
├── README.md                   # Main project overview and usage
├── requirements.txt            # Python dependencies
├── streamlit_app.py            # App interface for demo/testing
└── plot_benchmarks.py          # Script to generate accuracy and confusion matrix plots
```
🚀 Getting Started
- Clone the repo
```bash
git clone https://github.com/chirgamiyy/speech-emotion-recognition.git
cd speech-emotion-recognition
```
- Install dependencies
```bash
pip install -r requirements.txt
```
- Run training or prediction
```bash
python src/train.py    # Train model
python src/predict.py  # Predict emotion from audio
```
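The model save/load workflow behind the repo's `.pkl` files can be sketched as follows; the toy features, labels, and KNeighbors classifier here are illustrative stand-ins, not the repo's actual data or final model:

```python
import pickle
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D features standing in for extracted audio feature vectors.
X = [[0.0, 1.0], [1.0, 0.0], [0.1, 0.9], [0.9, 0.1]]
y = ["happy", "sad", "happy", "sad"]

model = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Serialize and restore the trained model; in the repo this blob
# would be written to and read from models/final_model.pkl.
blob = pickle.dumps(model)
loaded = pickle.loads(blob)

print(loaded.predict([[0.05, 0.95]])[0])  # happy
```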
📊 Example Results
🔹 Model Accuracy Comparison (93.96%)

![Model accuracy comparison](results/model_accuracy_comparison.png)

🔹 Confusion Matrix (on Combined Dataset)

![Confusion matrix](results/confusion_matrix.png)
📜 License
This project is licensed under the MIT License.
🙌 Acknowledgements
Audio Datasets:
Feature Extraction Libraries:
Machine Learning & Deep Learning:
If you build upon this work, please consider citing it via the CITATION.cff file.
📚 Citation
If you use this work, please cite it using the metadata in CITATION.cff.
```bibtex
@software{agrawal_2025_ser,
  author  = {Chirag Agrawal},
  title   = {Speech Emotion Recognition},
  year    = {2025},
  version = {1.0.0},
  url     = {https://github.com/chirgamiyy/speech-emotion-recognition}
}
```
Feel free to ⭐ the repo if you found it helpful!
Owner
- Name: Chirag Agrawal
- Login: chiragmiyy
- Kind: user
- Repositories: 1
- Profile: https://github.com/chiragmiyy
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Agrawal
    given-names: Chirag
    orcid: https://orcid.org/0000-0000-0000-0000
title: Speech Emotion Recognition
version: 1.0.0
date-released: 2025-07-04
keywords:
  - speech emotion recognition
  - audio analysis
  - MFCC
  - affective computing
  - machine learning
  - deep learning
  - emotion detection
abstract: >
  This repository presents a comprehensive Speech Emotion Recognition (SER)
  framework that employs various machine learning and deep learning techniques
  to accurately detect and classify human emotions from speech. The framework
  utilizes multiple datasets, including RAVDESS, TESS, CREMA-D, and a custom
  dataset, covering a diverse range of emotions such as neutral, calm, happy,
  sad, angry, fear, disgust, pleasant surprise, and boredom. Feature extraction
  is performed using widely adopted audio features such as MFCC, Chromagram,
  MEL Spectrogram, Spectral Contrast, and Tonnetz. The repository supports grid
  search for hyperparameter tuning and offers various classifiers and
  regressors such as SVC, RandomForest, GradientBoosting, KNeighbors, MLP,
  Bagging, and RNNs. The developed SER system demonstrates strong accuracy in
  emotion classification, making it a valuable tool for research and
  applications in affective computing.
repository-code: https://github.com/chirgamiyy/emotion-recognition-using-speech
license: MIT
```
GitHub Events
Total
- Watch event: 1
- Push event: 8
- Create event: 4
Last Year
- Watch event: 1
- Push event: 8
- Create event: 4
Dependencies
- librosa ==0.6.3
- matplotlib ==2.2.3
- numpy *
- pandas *
- pyaudio ==0.2.11
- scikit-learn ==0.24.2
- soundfile ==0.9.0
- tensorflow ==2.5.2
- tqdm ==4.28.1
- wave *