Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: chiragmiyy
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 17.2 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 8 months ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

🎤 Speech Emotion Recognition

A machine learning and deep learning based system for recognizing emotions from speech, using audio features such as MFCCs and spectrograms.

📌 Overview

This project implements a Speech Emotion Recognition (SER) pipeline that uses audio signal processing and classification algorithms to detect emotions from speech. It supports multiple datasets, feature extractors, classifiers, and evaluation metrics.
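To make the "audio signal processing" step concrete, here is a minimal NumPy-only sketch of the kind of feature extraction such a pipeline performs: frame the signal, window it, take a magnitude spectrogram, and summarize it as a fixed-length vector. The actual project uses librosa; the function name and parameters below are illustrative.

```python
import numpy as np

def spectrogram_features(signal, frame_len=512, hop=256):
    """Summarize a 1-D audio signal as per-bin mean/std of a magnitude spectrogram."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame_len//2 + 1)
    # Fixed-length feature vector regardless of clip duration
    return np.concatenate([spec.mean(axis=0), spec.std(axis=0)])

# Example: a 1-second 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
feats = spectrogram_features(np.sin(2 * np.pi * 440 * t))
```

A classifier then consumes `feats` the same way it would consume MFCC or chromagram vectors: one fixed-length row per audio clip.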


🧠 Supported Emotions

  • Neutral
  • Calm
  • Happy
  • Sad
  • Angry
  • Fearful
  • Disgust
  • Pleasant Surprise
  • Boredom

🛠️ Features

  • 🔉 Extracts audio features (MFCC, Chromagram, Spectrogram, etc.)
  • 🤖 Classifiers: SVC, RandomForest, GradientBoosting, KNeighbors, MLP, RNN
  • 🧪 Hyperparameter tuning via GridSearchCV
  • 📊 Evaluation: Accuracy, Confusion Matrix
  • 💾 Model saving & loading (.pkl)
  • 🔍 Dataset support: RAVDESS, TESS, EMO-DB, Custom
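The tune-then-save loop the feature list describes can be sketched with scikit-learn. Synthetic vectors stand in for real extracted features, and the grid is deliberately tiny; the actual project's grids, classifiers, and file layout may differ.

```python
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for extracted audio features (e.g. 40 MFCC means per clip)
X = rng.normal(size=(120, 40))
y = rng.choice(["happy", "sad", "angry"], size=120)

scaler = StandardScaler()
encoder = LabelEncoder()
X_scaled = scaler.fit_transform(X)
y_enc = encoder.fit_transform(y)

# Hyperparameter tuning via GridSearchCV over a small illustrative grid
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [25, 50], "max_depth": [3, None]},
    cv=3,
)
grid.fit(X_scaled, y_enc)

# Persist model, scaler, and label encoder together; the repo stores
# these as separate files under models/*.pkl
blob = pickle.dumps((grid.best_estimator_, scaler, encoder))
model, scaler, encoder = pickle.loads(blob)
```

Pickling the scaler and label encoder alongside the model matters: prediction must apply the exact same scaling and class-index mapping that training used.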

📦 Tech Stack

| Domain | Tools |
|--------|-------|
| Programming | Python |
| Audio Processing | Librosa, OpenSMILE |
| Machine Learning | Scikit-learn |
| Deep Learning | PyTorch, HuggingFace Transformers (Wav2Vec2) |
| Deployment (Optional) | Firebase Functions, Streamlit, Gradio |


📁 Project Structure

```
speech-emotion-recognition/
├── data/                         # Raw and processed audio files, organized by dataset
│   ├── RAVDESS/
│   ├── TESS/
│   ├── CREMA-D/
│   └── custom/                   # Your own audio recordings
├── models/                       # Trained models & preprocessed data
│   ├── final_model.pkl
│   ├── scaler.pkl
│   ├── label_encoder.pkl
│   ├── tess-model.pkl
│   └── tess-label-encoder.pkl    # Any .joblib or .pt files
├── results/                      # Visual outputs
│   ├── confusion_matrix.png
│   └── model_accuracy_comparison.png
├── src/                          # Source code
│   └── features.py               # Feature extraction scripts
├── train_final_model.py          # Training & evaluation logic
├── app.py                        # Streamlit-based app to demo emotion predictions
├── .gitattributes                # Optional: Git LFS or text encoding rules
├── CITATION.cff                  # Software citation metadata
├── LICENSE                       # MIT License
├── README.md                     # Main project overview and usage
├── requirements.txt              # Python dependencies
├── streamlit_app.py              # App interface for demo/testing
└── plot_benchmarks.py            # Script to generate accuracy and confusion matrix plots
```


🚀 Getting Started

  1. Clone the repo

```bash
git clone https://github.com/chiragmiyy/speech-emotion-recognition.git
cd speech-emotion-recognition
```

  2. Install dependencies

```bash
pip install -r requirements.txt
```

  3. Run training or prediction

```bash
python src/train.py    # Train a model
python src/predict.py  # Predict the emotion in an audio file
```
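The prediction step presumably follows the usual load-scale-predict-decode flow against the pickled artifacts in `models/`. Since `src/predict.py` itself is not shown, the sketch below is hypothetical: an in-memory stand-in model replaces the real `final_model.pkl`, `scaler.pkl`, and `label_encoder.pkl`.

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Stand-in training so the sketch is self-contained (real artifacts would
# be loaded from models/final_model.pkl, models/scaler.pkl, models/label_encoder.pkl)
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 40))
y = rng.choice(["happy", "sad"], size=60)
scaler = StandardScaler().fit(X)
encoder = LabelEncoder().fit(y)
model = LogisticRegression(max_iter=500).fit(scaler.transform(X),
                                             encoder.transform(y))
blobs = {name: pickle.dumps(obj)
         for name, obj in [("model", model), ("scaler", scaler),
                           ("encoder", encoder)]}

# --- the prediction path: load artifacts, scale features, predict, decode ---
model = pickle.loads(blobs["model"])
scaler = pickle.loads(blobs["scaler"])
encoder = pickle.loads(blobs["encoder"])

features = rng.normal(size=(1, 40))  # one clip's extracted feature vector
label = encoder.inverse_transform(model.predict(scaler.transform(features)))[0]
```

`inverse_transform` maps the integer class index back to a human-readable emotion name, which is the value a demo app would display.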


📊 Example Results

🔹 Model Accuracy Comparison (93.96%)

*(figure: results/model_accuracy_comparison.png)*

🔹 Confusion Matrix (on Combined Dataset)

*(figure: results/confusion_matrix.png)*


📜 License

This project is licensed under the MIT License.


🙌 Acknowledgements

If you build upon this work, please consider citing it via the CITATION.cff file.


📚 Citation

If you use this work, please cite it using the metadata in CITATION.cff.

```bibtex
@software{agrawal_2025_ser,
  author  = {Chirag Agrawal},
  title   = {Speech Emotion Recognition},
  year    = {2025},
  version = {1.0.0},
  url     = {https://github.com/chiragmiyy/speech-emotion-recognition}
}
```

Feel free to ⭐ the repo if you found it helpful!

Owner

  • Name: Chirag Agrawal
  • Login: chiragmiyy
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Agrawal
    given-names: Chirag
    orcid: https://orcid.org/0000-0000-0000-0000
title: Speech Emotion Recognition
version: 1.0.0
date-released: 2025-07-04
keywords:
  - speech emotion recognition
  - audio analysis
  - MFCC
  - affective computing
  - machine learning
  - deep learning
  - emotion detection
abstract: >
  This repository presents a comprehensive Speech Emotion Recognition (SER) framework that employs various machine learning and deep learning techniques to accurately detect and classify human emotions from speech. The framework utilizes multiple datasets, including RAVDESS, TESS, CREMA-D, and a custom dataset, covering a diverse range of emotions such as neutral, calm, happy, sad, angry, fear, disgust, pleasant surprise, and boredom. Feature extraction is performed using widely adopted audio features such as MFCC, Chromagram, MEL Spectrogram, Spectral Contrast, and Tonnetz. The repository supports grid search for hyperparameter tuning and offers various classifiers and regressors such as SVC, RandomForest, GradientBoosting, KNeighbors, MLP, Bagging, and RNNs. The developed SER system demonstrates strong accuracy in emotion classification, making it a valuable tool for research and applications in affective computing.
repository-code: https://github.com/chirgamiyy/emotion-recognition-using-speech
license: MIT

GitHub Events

Total
  • Watch event: 1
  • Push event: 8
  • Create event: 4
Last Year
  • Watch event: 1
  • Push event: 8
  • Create event: 4

Dependencies

requirements.txt pypi
  • librosa ==0.6.3
  • matplotlib ==2.2.3
  • numpy *
  • pandas *
  • pyaudio ==0.2.11
  • scikit-learn ==0.24.2
  • soundfile ==0.9.0
  • tensorflow ==2.5.2
  • tqdm ==4.28.1
  • wave *