whisper-small-indian-accent
A fine-tuned Whisper Small model for improved speech recognition of Indian English accents.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: omsusi
- License: other
- Default Branch: master
- Size: 582 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Whisper Small Model - Indian Accent
This repository hosts a fine-tuned version of OpenAI's Whisper Small model, specifically adapted for improved speech recognition of Indian English accents. This model aims to provide higher accuracy for transcribing audio that contains the unique phonetic and linguistic characteristics of Indian English speech.
🚀 Model Details
Base Model: OpenAI Whisper Small
Fine-tuning Objective: Enhanced transcription accuracy for Indian English.
Model Weights Format: safetensors (for improved security and loading speed)
📊 Training Details
Due to resource constraints (the full Opus version of the dataset is about 100 GB, which is not feasible to train on with free Google Colab), we used a sampled subset of the NPTEL-2020 Indian English Speech Dataset.
Sample Dataset Download: https://github.com/AI4Bharat/NPTEL2020-Indian-English-Speech-Dataset/releases/download/v0.1/nptel-pure-set.tar.gz
Dataset Description: The NPTEL-2020 dataset comprises speech from lectures delivered by Indian professors, providing a valuable source of Indian English speech.
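As a rough sketch of how the sample archive could be unpacked before training, the helper below extracts a .tar.gz file and lists the audio files it finds. The function name `extract_audio_archive`, the destination directory, and the audio suffixes are assumptions made for illustration; the README does not describe the archive's internal layout or file format.

```python
import tarfile
from pathlib import Path


def extract_audio_archive(archive_path, dest_dir, suffixes=(".wav", ".mp3", ".flac")):
    """Extract a .tar.gz archive and return the audio files found inside it.

    The suffix list is a guess; adjust it to whatever the archive actually contains.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python version supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return sorted(
        p for p in dest.rglob("*") if p.is_file() and p.suffix.lower() in suffixes
    )


# Hypothetical usage, after downloading the archive linked above:
# files = extract_audio_archive("nptel-pure-set.tar.gz", "data/nptel")
# print(len(files), "audio files found")
```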
✨ Performance
Our custom-trained Whisper-small model outperforms the pre-trained Whisper-small model on a custom validation dataset: Word Error Rate (WER) drops from 32.1 to 15.6 and Character Error Rate (CER) from 12.3 to 7.8. These reductions suggest that domain-specific fine-tuning is effective for Indian English accents.
Evaluation Results on Custom Validation Dataset
| Model(s) | WER (Word Error Rate) | CER (Character Error Rate) |
| --- | --- | --- |
| Pre-trained Whisper-small | 32.1 | 12.3 |
| Custom-trained Whisper-small | 15.6 | 7.8 |
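For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, counted in words or characters respectively. The README does not say which tool or text normalization produced the figures above, so the functions `edit_distance`, `wer`, and `cer` and the sample sentences are illustrative assumptions, not the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up example pair, not taken from the validation set
reference = "the lecture starts at nine"
hypothesis = "the lecture start at nine"
print(f"WER: {100 * wer(reference, hypothesis):.1f}")
print(f"CER: {100 * cer(reference, hypothesis):.1f}")
```

In practice, scores depend heavily on normalization (casing, punctuation, number formatting), so numbers from different pipelines are not directly comparable.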
Custom Dataset Used for Testing
To validate the model's performance on realistic Indian English speech, we used a custom dataset of recordings of our own voices.
Custom Audio Data: https://drive.google.com/drive/folders/1bKVak_v3T-qtyEzdIY7AkDQ57ap18J2o?usp=sharing
Required JSON File for Testing: https://drive.google.com/file/d/1bX1sjeRVEVhqoWwfVYm-IEE7K-v-AevR/view?usp=sharing
Testing Colab Notebook
You can reproduce our testing and evaluate the model's performance yourself using the following Google Colab notebook:
Colab Notebook: https://colab.research.google.com/drive/1zcL9dbifU2ZenJIjYOeSVkOwheB_u281?usp=sharing
🛠️ Usage
This model can be easily loaded and used with the Hugging Face transformers library.
- Installation:
First, ensure you have the necessary libraries installed:
pip install transformers accelerate safetensors datasets soundfile # Or just pip install transformers[torch] accelerate
- Load the Model and Processor:
You can load the model directly from this GitHub repository (once uploaded) or from its corresponding Hugging Face Hub page.
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
import torch

model_id = "omsusi/whisper-small-indian-accent"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
# Half precision on GPU, full precision on CPU
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Make sure your audio is sampled at 16 kHz
sample_rate = 16000
duration = 5  # seconds
dummy_audio = torch.randn(1, sample_rate * duration).numpy()  # Example: 5 seconds of random noise
# For a real recording, load it at 16 kHz instead, e.g. librosa.load(path, sr=16000) (pip install librosa)

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
).to(device)

# Process dummy audio; cast the features to the model's dtype
input_features = processor(
    dummy_audio, sampling_rate=sample_rate, return_tensors="pt"
).input_features.to(device, dtype=torch_dtype)

# Generate transcription
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(f"Transcription: {transcription}")
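Real recordings are often stereo or sampled at a rate other than 16 kHz. The sketch below shows one way to downmix to mono and resample with plain NumPy. The helper name `to_mono_16k` and the assumed `(channels, samples)` layout are illustrative, and linear interpolation is a crude stand-in with no anti-aliasing filter; a dedicated loader such as `librosa.load(path, sr=16000)` is usually a better choice.

```python
import numpy as np

TARGET_SR = 16000  # sampling rate the usage example expects


def to_mono_16k(audio, orig_sr, target_sr=TARGET_SR):
    """Downmix to mono and linearly resample to target_sr (illustrative helper)."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # Assumes shape (channels, samples); average the channels to get mono.
        audio = audio.mean(axis=0)
    if orig_sr == target_sr:
        return audio
    n_out = int(round(len(audio) * target_sr / orig_sr))
    # Positions of the output samples, expressed in input-sample units.
    positions = np.linspace(0, len(audio) - 1, num=n_out)
    resampled = np.interp(positions, np.arange(len(audio)), audio)
    return resampled.astype(np.float32)


# One second of a 440 Hz tone at 44.1 kHz, duplicated into two channels
t = np.arange(44100) / 44100
stereo = np.stack([np.sin(2 * np.pi * 440 * t)] * 2)
mono_16k = to_mono_16k(stereo, orig_sr=44100)
print(mono_16k.shape, mono_16k.dtype)
```

The resulting array can be passed to the processor in place of the dummy audio, with `sampling_rate=16000`.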
🙏 Attribution
If you use this model in your research, projects, or applications, please ensure you provide appropriate credit to the original creators, as required by the CC BY 4.0 license.
The original creators are:
Omsubhra Singha
Aman Kumar
You can provide credit by:
Linking to this GitHub repository.
Mentioning our names in your project's documentation, acknowledgements, or credits section.
Citing the model using the information provided in the CITATION.cff file (if applicable for academic use).
Your adherence to these attribution requirements, as per the CC BY 4.0 license, is greatly appreciated.
📜 License
This Whisper Small Model (Indian Accent) is distributed under the Creative Commons Attribution 4.0 International Public License (CC BY 4.0).
You are free to:
Share — copy and redistribute the material in any medium or format for any purpose, even commercially.
Adapt — remix, transform, and build upon the material for any purpose, even commercially.
Under the following terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
For the full text of the license, please see the LICENSE file in this repository or visit: https://creativecommons.org/licenses/by/4.0/
📞 Contact
For questions or inquiries, please open an issue in this repository or contact https://www.linkedin.com/in/omsubhra-singha-30447a254/ and amnkmr2098@gmail.com.
Owner
- Login: omsusi
- Kind: user
- Repositories: 1
- Profile: https://github.com/omsusi
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this model, please cite it as below."
authors:
  - family-names: "Singha"
    given-names: "Omsubhra"
  - family-names: "Kumar"
    given-names: "Aman"
title: "Whisper Small Model (Indian Accent)"
version: "1.0.0"
date-released: "2025-06-16"
repository-code: "https://github.com/omsusi/whisper-small-indian-accent"
keywords:
- "whisper"
- "speech recognition"
- "indian accent"
- "fine-tuned model"
- "ai"
- "machine learning"
license: "CC-BY-4.0" # Official SPDX identifier for Creative Commons Attribution 4.0
GitHub Events
Total
- Push event: 4
- Create event: 1
Last Year
- Push event: 4
- Create event: 1