speech-emotion-recognition
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: chiragmiyy
- License: mit
- Language: Python
- Default Branch: main
- Size: 17.2 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
🎤 Speech Emotion Recognition
A machine learning and deep learning system for recognizing emotions from speech, using audio features such as MFCCs and spectrograms.
📌 Overview
This project implements a Speech Emotion Recognition (SER) pipeline that uses audio signal processing and classification algorithms to detect emotions from speech. It supports multiple datasets, feature extractors, classifiers, and evaluation metrics.
🧠 Supported Emotions
- Neutral
- Calm
- Happy
- Sad
- Angry
- Fearful
- Disgust
- Pleasant Surprise
- Boredom
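As an illustration (not code from this repo), the RAVDESS dataset encodes the emotion in the third hyphen-separated field of each filename; a minimal Python sketch of that label mapping is below. TESS and EMO-DB use different naming schemes, so this covers only the RAVDESS subset of the emotions listed above:

```python
# RAVDESS filenames look like "03-01-06-01-02-01-12.wav";
# the third field is the emotion code.
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(filename: str) -> str:
    """Extract the emotion label from a RAVDESS-style filename."""
    code = filename.split("-")[2]
    return RAVDESS_EMOTIONS[code]

print(emotion_from_filename("03-01-06-01-02-01-12.wav"))  # fearful
```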
🛠️ Features
- 🔉 Extracts audio features (MFCC, Chromagram, Spectrogram, etc.)
- 🤖 Classifiers: SVC, RandomForest, GradientBoosting, KNeighbors, MLP, RNN
- 🧪 Hyperparameter tuning via GridSearchCV
- 📊 Evaluation: Accuracy, Confusion Matrix
- 💾 Model saving & loading (.pkl)
- 🔍 Dataset support: RAVDESS, TESS, EMO-DB, Custom
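The hyperparameter-tuning step above can be sketched with scikit-learn's GridSearchCV. This is a minimal illustration on random placeholder features, not the repo's actual training code; the grid values and the 40-dimensional feature size are assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder feature matrix: 100 clips x 40 MFCC-style features, 4 emotion classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))
y = rng.integers(0, 4, size=100)

# Search a small SVC grid with 3-fold cross-validation.
grid = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", "auto"]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

The same pattern applies to the other listed classifiers (RandomForest, GradientBoosting, KNeighbors, MLP) by swapping the estimator and its parameter grid.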
📦 Tech Stack
| Domain | Tools |
|--------|-------|
| Programming | Python |
| Audio Processing | Librosa, OpenSMILE |
| Machine Learning | Scikit-learn |
| Deep Learning | PyTorch, HuggingFace Transformers (Wav2Vec2) |
| Deployment (Optional) | Firebase Functions, Streamlit, Gradio |
📁 Project Structure
```bash
speech-emotion-recognition/
├── data/                       # Raw and processed audio files, organized by dataset
│   ├── RAVDESS/
│   ├── TESS/
│   ├── CREMA-D/
│   └── custom/                 # Your own audio recordings
│
├── models/                     # Trained models & preprocessed data
│   ├── final_model.pkl
│   ├── scaler.pkl
│   ├── label_encoder.pkl
│   ├── tess-model.pkl
│   └── tess-label-encoder.pkl  # Any .joblib or .pt files
│
├── results/                    # Visual outputs
│   ├── confusion_matrix.png
│   └── model_accuracy_comparison.png
│
├── src/                        # Source code
│   └── features.py             # Feature extraction scripts
├── train_final_model.py        # Training & evaluation logic
├── app.py                      # Streamlit-based app to demo emotion predictions
├── .gitattributes              # Optional: Git LFS or text encoding rules
├── CITATION.cff                # Software citation metadata
├── LICENSE                     # MIT License
├── README.md                   # Main project overview and usage
├── requirements.txt            # Python dependencies
├── streamlit_app.py            # App interface for demo/testing
└── plot_benchmarks.py          # Script to generate accuracy and confusion matrix plots
```
🚀 Getting Started
- Clone the repo
```bash
git clone https://github.com/chirgamiyy/speech-emotion-recognition.git
cd speech-emotion-recognition
```
- Install dependencies
```bash
pip install -r requirements.txt
```
- Run training or prediction
```bash
python src/train.py    # Train model
python src/predict.py  # Predict emotion from audio
```
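The model save/load workflow behind the repo's `.pkl` files can be sketched as follows; the toy features, labels, and KNeighbors classifier here are illustrative stand-ins, not the repo's actual data or final model:

```python
import pickle
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D features standing in for extracted audio feature vectors.
X = [[0.0, 1.0], [1.0, 0.0], [0.1, 0.9], [0.9, 0.1]]
y = ["happy", "sad", "happy", "sad"]

model = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Serialize and restore the trained model; in the repo this blob
# would be written to and read from models/final_model.pkl.
blob = pickle.dumps(model)
loaded = pickle.loads(blob)

print(loaded.predict([[0.05, 0.95]])[0])  # happy
```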
📊 Example Results
🔹 Model Accuracy Comparison (93.96%)

![Model accuracy comparison](results/model_accuracy_comparison.png)

🔹 Confusion Matrix (on Combined Dataset)

![Confusion matrix](results/confusion_matrix.png)
📜 License
This project is licensed under the MIT License.
🙌 Acknowledgements
Audio Datasets:
Feature Extraction Libraries:
Machine Learning & Deep Learning:
If you build upon this work, please consider citing it via the CITATION.cff file.
📚 Citation
If you use this work, please cite it using the metadata in CITATION.cff.
```bibtex
@software{agrawal_2025_ser,
  author  = {Chirag Agrawal},
  title   = {Speech Emotion Recognition},
  year    = {2025},
  version = {1.0.0},
  url     = {https://github.com/chirgamiyy/speech-emotion-recognition}
}
```
Feel free to ⭐ the repo if you found it helpful!
Owner
- Name: Chirag Agrawal
- Login: chiragmiyy
- Kind: user
- Repositories: 1
- Profile: https://github.com/chiragmiyy
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Agrawal
    given-names: Chirag
    orcid: https://orcid.org/0000-0000-0000-0000
title: Speech Emotion Recognition
version: 1.0.0
date-released: 2025-07-04
keywords:
  - speech emotion recognition
  - audio analysis
  - MFCC
  - affective computing
  - machine learning
  - deep learning
  - emotion detection
abstract: >
  This repository presents a comprehensive Speech Emotion Recognition (SER)
  framework that employs various machine learning and deep learning techniques
  to accurately detect and classify human emotions from speech. The framework
  utilizes multiple datasets, including RAVDESS, TESS, CREMA-D, and a custom
  dataset, covering a diverse range of emotions such as neutral, calm, happy,
  sad, angry, fear, disgust, pleasant surprise, and boredom. Feature extraction
  is performed using widely adopted audio features such as MFCC, Chromagram,
  MEL Spectrogram, Spectral Contrast, and Tonnetz. The repository supports grid
  search for hyperparameter tuning and offers various classifiers and
  regressors such as SVC, RandomForest, GradientBoosting, KNeighbors, MLP,
  Bagging, and RNNs. The developed SER system demonstrates strong accuracy in
  emotion classification, making it a valuable tool for research and
  applications in affective computing.
repository-code: https://github.com/chirgamiyy/emotion-recognition-using-speech
license: MIT
```
GitHub Events
Total
- Watch event: 1
- Push event: 8
- Create event: 4
Last Year
- Watch event: 1
- Push event: 8
- Create event: 4
Dependencies
- librosa ==0.6.3
- matplotlib ==2.2.3
- numpy *
- pandas *
- pyaudio ==0.2.11
- scikit-learn ==0.24.2
- soundfile ==0.9.0
- tensorflow ==2.5.2
- tqdm ==4.28.1
- wave *