voice-based-emotion-recognition-using-ai-and-speech-processing
A deep learning project that detects human emotions from audio using MFCC feature extraction and a neural network classifier. Built using Python and TensorFlow, this tool leverages speech signal processing to predict emotions like happy, angry, or sad, helping machines better understand human sentiment.
https://github.com/captaincodercool/voice-based-emotion-recognition-using-ai-and-speech-processing
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: CAPTAINCODERCOOL
- License: apache-2.0
- Language: Python
- Default Branch: master
- Size: 12.4 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
🎙️ Voice-Based Emotion Recognition Using AI
This project focuses on recognizing human emotions from speech using deep learning techniques and signal processing. It combines MFCC-based feature extraction with a powerful neural network model to classify emotions like happy, sad, angry, and neutral from voice recordings.
🚀 Features
- 🎧 Detects emotion from voice recordings
- 🔍 MFCC feature extraction for audio signal processing
- 🧠 Deep neural network for multi-class emotion classification
- 🗂 Real-time audio classification & dataset preprocessing
- 📊 Accuracy visualization using confusion matrices
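The MFCC extraction step is not shown in this README; a minimal sketch using librosa is below. The file path, sampling rate, and `n_mfcc=40` are illustrative assumptions, not values taken from the repository. Averaging the coefficients over time yields one fixed-length vector per clip, which is a common way to feed variable-length audio to a dense classifier.

```python
import numpy as np

def extract_mfcc(path, sr=22050, n_mfcc=40):
    """Load an audio file and return a fixed-length MFCC feature vector."""
    import librosa  # local import: heavy optional dependency

    signal, sr = librosa.load(path, sr=sr)
    # mfcc has shape (n_mfcc, n_frames); average over frames -> (n_mfcc,)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc.T, axis=0)
```

Typical usage would be `features = extract_mfcc("audio/sample.wav")`, repeated over every file in the dataset to build the training matrix.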
🛠 Tech Stack
- Python 3
- TensorFlow / Keras
- NumPy, Pandas
- librosa
- scikit-learn
- Matplotlib, Seaborn
📦 Installation
- Clone this repository:
```bash
git clone https://github.com/CAPTAINCODERCOOL/emotion-recognition-from-speech.git
cd emotion-recognition-from-speech
```
- Install the required dependencies:
```bash
pip install -r requirements.txt
```
- (Optional) Use your own audio dataset, or rely on the pre-organized folders.
🎯 How It Works
- Loads WAV audio files from the datasets
- Extracts MFCCs (Mel-frequency cepstral coefficients)
- Trains a neural network to classify emotions into:
  - Happy 😊
  - Angry 😠
  - Sad 😢
  - Fear 😨
  - Neutral 😐
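The network architecture itself is not described in the README; a minimal sketch of a dense classifier over averaged MFCC vectors is shown below, assuming 40 MFCC coefficients and the five emotion labels listed above. The layer sizes and dropout rate are illustrative guesses, not the project's actual architecture.

```python
def build_model(n_mfcc=40, n_classes=5):
    # TensorFlow import kept local so the sketch reads without TF installed
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Dense(256, activation="relu", input_shape=(n_mfcc,)),
        layers.Dropout(0.3),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then be `build_model().fit(X, y, epochs=50, validation_split=0.2)`, where `X` is the matrix of MFCC vectors and `y` holds integer emotion labels.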
📂 Project Structure
```
emotion-recognition/
├── audio/               # Sample audio files for testing
├── datasets/            # Training datasets (e.g., RAVDESS, SAVEE)
├── extract_features.py  # MFCC + label extraction
├── model.py             # Training and prediction model
├── predict.py           # Predict emotion from new audio
├── requirements.txt
└── README.md
```
🔬 Example Usage
```bash
python model.py
python predict.py --input audio/sample.wav
```
📊 Evaluation
- Confusion matrix
- Accuracy & loss plots
- Precision, recall, F1-score
- Testing on unseen audio files
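In practice these metrics would come from scikit-learn and Seaborn (both in the tech stack), but the arithmetic is simple enough to sketch in plain NumPy. The function names below are illustrative, not from the repository.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def precision_recall_f1(cm):
    """Per-class precision, recall, and F1 from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # column sums = predicted counts
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # row sums = true counts
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1
```

Visualizing the matrix is then one call, e.g. `seaborn.heatmap(cm, annot=True)`.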
Future Enhancements
- Add real-time microphone input
- Expand to multilingual datasets
- Convert into a Streamlit web app
- Deploy via Flask for real-world applications
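The planned microphone input could build on PyAudio, which is already pinned in requirements.txt. Below is a hedged sketch only: the clip length, sample rate, and output file name are assumptions, and it needs a working audio device.

```python
def record_clip(path="mic_clip.wav", seconds=3, rate=22050, chunk=1024):
    """Record a short mono clip from the default microphone and save it as WAV."""
    import wave
    import pyaudio  # local import: requires a working audio device

    pa = pyaudio.PyAudio()
    width = pa.get_sample_size(pyaudio.paInt16)
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=rate,
                     input=True, frames_per_buffer=chunk)
    frames = [stream.read(chunk) for _ in range(int(rate / chunk * seconds))]
    stream.stop_stream()
    stream.close()
    pa.terminate()

    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    return path
```

The saved file could then be passed straight to the existing prediction script, e.g. `python predict.py --input mic_clip.wav`.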
Dataset Link
https://drive.google.com/drive/folders/1fTN2VkAcTycXCCXmQ71k5ioQmxTMoLm7?usp=sharing
📜 License
This repository follows the original MIT License.
🌐 Connect With Me
- GitHub: CAPTAINCODERCOOL
- LinkedIn: chiragpatil04
- Email: chiragpatilprofessional@gmail.com
Owner
- Login: CAPTAINCODERCOOL
- Kind: user
- Repositories: 1
- Profile: https://github.com/CAPTAINCODERCOOL
Citation (CITATION.cff)
cff-version: 1.2.0
message: If you use this software, please cite it as below.
authors:
- family-names: Abdeladim
given-names: Fadheli
title: Speech Emotion Recognition
version: 1.0.0
date-released: 2019-04-28
abstract: "This repository presents a comprehensive SER framework that employs various machine learning and deep learning techniques to accurately detect and classify human emotions from speech. The framework utilizes four datasets, including RAVDESS, TESS, EMO-DB, and a custom dataset, comprising a diverse range of emotions such as neutral, calm, happy, sad, angry, fear, disgust, pleasant surprise, and boredom. Feature extraction is performed using widely adopted audio features, including MFCC, Chromagram, MEL Spectrogram Frequency, Contrast, and Tonnetz. The repository also supports grid search for hyperparameter tuning and offers a range of classifiers and regressors such as SVC, RandomForest, GradientBoosting, KNeighbors, MLP, Bagging, and Recurrent Neural Networks. The developed SER system demonstrates promising accuracy in emotion classification, making it a valuable tool for researchers and practitioners in the field of affective computing and related domains."
repository-code: https://github.com/x4nth055/emotion-recognition-using-speech
license: MIT
GitHub Events
Total
- Push event: 3
- Create event: 3
Last Year
- Push event: 3
- Create event: 3
Dependencies
- librosa ==0.6.3
- matplotlib ==2.2.3
- numpy *
- pandas *
- pyaudio ==0.2.11
- scikit-learn ==0.24.2
- soundfile ==0.9.0
- tensorflow ==2.5.2
- tqdm ==4.28.1
- wave *