voice-based-emotion-recognition-using-ai-and-speech-processing

A deep learning project that detects human emotions from audio using MFCC feature extraction and a neural network classifier. Built using Python and TensorFlow, this tool leverages speech signal processing to predict emotions like happy, angry, or sad, helping machines better understand human sentiment.

https://github.com/captaincodercool/voice-based-emotion-recognition-using-ai-and-speech-processing

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: CAPTAINCODERCOOL
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Size: 12.4 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 10 months ago
Metadata Files
Readme · License · Citation

README.md

🎙️ Voice-Based Emotion Recognition Using AI

This project focuses on recognizing human emotions from speech using deep learning and signal processing. It combines MFCC-based feature extraction with a neural network classifier to label voice recordings as happy, sad, angry, or neutral.


🚀 Features

  • 🎧 Detects emotion from voice recordings
  • 🔍 MFCC feature extraction for audio signal processing
  • 🧠 Deep neural network for multi-class emotion classification
  • 🗂 Real-time audio classification & dataset preprocessing
  • 📊 Accuracy visualization using confusion matrices

🛠 Tech Stack

  • Python 3
  • TensorFlow / Keras
  • NumPy, Pandas
  • librosa
  • scikit-learn
  • Matplotlib, Seaborn

📦 Installation

  1. Clone this repository:

```bash
git clone https://github.com/CAPTAINCODERCOOL/emotion-recognition-from-speech.git
cd emotion-recognition-from-speech
```

  2. Install the required dependencies:

```bash
pip install -r requirements.txt
```

  3. (Optional) Use your own audio dataset, or rely on the pre-organized folders.

🎯 How It Works

  1. Loads WAV audio files from the datasets
  2. Extracts MFCCs (Mel-frequency cepstral coefficients)
  3. Trains a neural network to classify emotions into:
     • Happy 😊
     • Angry 😠
     • Sad 😢
     • Fear 😨
     • Neutral 😐
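The extraction step above can be sketched from first principles. This is a pure NumPy/SciPy stand-in for `librosa.feature.mfcc` (which the project actually uses), shown only to make the pipeline concrete: framing → power spectrum → mel filterbank → log → DCT.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    # 1. Slice the signal into overlapping, Hann-windowed frames
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # 2. Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular mel filterbank (filter edges equally spaced on the mel scale)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 4. Log mel energies, then DCT to decorrelate -> MFCC matrix (frames x n_mfcc)
    mel_energy = np.log(power @ fbank.T + 1e-10)
    return dct(mel_energy, type=2, axis=1, norm="ortho")[:, :n_mfcc]
```

In practice the per-frame MFCC matrix is usually averaged (or stacked) into one fixed-length vector per recording before being fed to the classifier.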

📂 Project Structure

```bash
emotion-recognition/
├── audio/                # Sample audio files for testing
├── datasets/             # Training datasets (e.g., RAVDESS, SAVEE)
├── extract_features.py   # MFCC + label extraction
├── model.py              # Training and prediction model
├── predict.py            # Predict emotion from new audio
├── requirements.txt
└── README.md
```

🔬 Example Usage

```bash
python model.py
python predict.py --input audio/sample.wav
```

📊 Evaluation

  • Confusion matrix

  • Accuracy & loss plots
  • Precision, recall, F1-score
  • Test on unseen audio files
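The train-then-evaluate loop can be sketched end to end. This uses scikit-learn's `MLPClassifier` as a lightweight stand-in for the project's Keras model, and synthetic cluster data as a stand-in for real per-file MFCC vectors; the class names match the README, everything else is illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

rng = np.random.default_rng(0)
emotions = ["happy", "angry", "sad", "fear", "neutral"]

# Synthetic stand-in for per-file averaged MFCC vectors (13 coefficients each):
# each emotion class gets its own cluster centre so the task is learnable.
X = np.vstack(
    [rng.normal(loc=i, scale=0.5, size=(40, 13)) for i in range(len(emotions))]
)
y = np.repeat(np.arange(len(emotions)), 40)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A small dense network, standing in for the repository's Keras model
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)

# Evaluation artifacts mentioned in the README: confusion matrix + per-class scores
pred = clf.predict(X_te)
cm = confusion_matrix(y_te, pred)
print(classification_report(y_te, pred, target_names=emotions))
```

With real MFCC features the same `confusion_matrix` output is what gets rendered as the heatmap via Matplotlib/Seaborn.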

Future Enhancements

  • Add real-time microphone input
  • Expand to multilingual datasets
  • Convert into a Streamlit web app
  • Deploy via Flask for real-world applications
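The Flask deployment idea could look like a single `/predict` endpoint. This is a minimal sketch: `predict_emotion` is a hypothetical placeholder for the repository's real feature extraction and trained model, and the route name is an assumption, not the project's actual API.

```python
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
EMOTIONS = ["happy", "angry", "sad", "fear", "neutral"]

def predict_emotion(wav_bytes):
    # Placeholder: a real version would run extract_features() on the bytes
    # and feed the result to the trained model. Here we just pick a label
    # deterministically so the endpoint is testable.
    scores = np.random.default_rng(len(wav_bytes)).random(len(EMOTIONS))
    return EMOTIONS[int(np.argmax(scores))]

@app.route("/predict", methods=["POST"])
def predict():
    wav = request.get_data()
    if not wav:
        return jsonify(error="empty request body"), 400
    return jsonify(emotion=predict_emotion(wav))
```

Run locally with `flask --app <module> run` and POST raw WAV bytes to `/predict`.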

Dataset Link: https://drive.google.com/drive/folders/1fTN2VkAcTycXCCXmQ71k5ioQmxTMoLm7?usp=sharing

📜 License

This repository follows the original MIT License.

🌐 Connect With Me

  • GitHub: CAPTAINCODERCOOL
  • LinkedIn: chiragpatil04
  • Email: chiragpatilprofessional@gmail.com

Owner

  • Login: CAPTAINCODERCOOL
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Abdeladim
    given-names: Fadheli
title: Speech Emotion Recognition
version: 1.0.0
date-released: 2019-04-28
abstract: "This repository presents a comprehensive SER framework that employs various machine learning and deep learning techniques to accurately detect and classify human emotions from speech. The framework utilizes four datasets, including RAVDESS, TESS, EMO-DB, and a custom dataset, comprising a diverse range of emotions such as neutral, calm, happy, sad, angry, fear, disgust, pleasant surprise, and boredom. Feature extraction is performed using widely adopted audio features, including MFCC, Chromagram, MEL Spectrogram Frequency, Contrast, and Tonnetz. The repository also supports grid search for hyperparameter tuning and offers a range of classifiers and regressors such as SVC, RandomForest, GradientBoosting, KNeighbors, MLP, Bagging, and Recurrent Neural Networks. The developed SER system demonstrates promising accuracy in emotion classification, making it a valuable tool for researchers and practitioners in the field of affective computing and related domains."
repository-code: https://github.com/x4nth055/emotion-recognition-using-speech
license: MIT

GitHub Events

Total
  • Push event: 3
  • Create event: 3
Last Year
  • Push event: 3
  • Create event: 3

Dependencies

requirements.txt pypi
  • librosa ==0.6.3
  • matplotlib ==2.2.3
  • numpy *
  • pandas *
  • pyaudio ==0.2.11
  • scikit-learn ==0.24.2
  • soundfile ==0.9.0
  • tensorflow ==2.5.2
  • tqdm ==4.28.1
  • wave *