whisper-small-indian-accent

A fine-tuned Whisper Small model for improved speech recognition of Indian English accents.

https://github.com/omsusi/whisper-small-indian-accent

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.8%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: omsusi
  • License: other
  • Default Branch: master
  • Size: 582 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created 12 months ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

Whisper Small Model - Indian Accent

This repository hosts a fine-tuned version of OpenAI's Whisper Small model, specifically adapted for improved speech recognition of Indian English accents. This model aims to provide higher accuracy for transcribing audio that contains the unique phonetic and linguistic characteristics of Indian English speech.

🚀 Model Details

Base Model: OpenAI Whisper Small

Fine-tuning Objective: Enhanced transcription accuracy for Indian English.

Model Weights Format: safetensors (for improved security and loading speed)

📊 Training Details

Because the full Opus version of the dataset is roughly 100 GB, which is too large to train on with the free tier of Google Colab, we used a sampled subset of the NPTEL-2020 Indian English Speech Dataset.

Sample Dataset Download: https://github.com/AI4Bharat/NPTEL2020-Indian-English-Speech-Dataset/releases/download/v0.1/nptel-pure-set.tar.gz

Dataset Description: The NPTEL-2020 dataset comprises speech from lectures delivered by Indian professors, providing a valuable source of Indian English speech.
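The download step can be scripted. The following is a minimal sketch that fetches the archive from the URL listed above and unpacks it using only the Python standard library. The function names and local paths are illustrative assumptions, and the sketch makes no assumption about the directory layout inside the archive.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL copied from the "Sample Dataset Download" line of this README.
DATASET_URL = (
    "https://github.com/AI4Bharat/NPTEL2020-Indian-English-Speech-Dataset"
    "/releases/download/v0.1/nptel-pure-set.tar.gz"
)


def download(url: str, archive_path: Path) -> Path:
    """Download `url` to `archive_path`, skipping the download if the file exists."""
    if not archive_path.exists():
        archive_path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract(archive_path: Path, dest_dir: Path) -> list[str]:
    """Unpack a .tar.gz archive into `dest_dir` and return the member names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names


def fetch_dataset(data_dir: Path) -> list[str]:
    """Hypothetical convenience wrapper: download the sample set, then unpack it."""
    archive = download(DATASET_URL, data_dir / "nptel-pure-set.tar.gz")
    return extract(archive, data_dir / "nptel-pure-set")
```

Calling `fetch_dataset(Path("data"))` would place the archive and its contents under a local `data/` directory. Only extract archives from sources you trust.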

✨ Performance

Our custom-trained Whisper-small model outperforms the pre-trained Whisper-small model on a custom validation dataset, cutting Word Error Rate (WER) from 32.1 to 15.6 and Character Error Rate (CER) from 12.3 to 7.8. These reductions suggest that domain-specific fine-tuning is effective for Indian English accents.
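For readers unfamiliar with the two metrics, here is a minimal pure-Python sketch of how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length. It assumes the scores in this README are percentages and applies no text normalization. It is an illustration only, not necessarily how the linked Colab notebook computes its scores.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate in percent: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate in percent: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

Because insertions count as errors, both rates can exceed 100 when the hypothesis is much longer than the reference.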

Evaluation Results on Custom Validation Dataset

| Model | WER (Word Error Rate) | CER (Character Error Rate) |
|---|---|---|
| Pre-trained Whisper-small | 32.1 | 12.3 |
| Custom-trained Whisper-small | 15.6 | 7.8 |
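The size of the improvement is easier to judge as a relative reduction. The short sketch below computes it from the four figures reported above; the helper name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction in percent, measured against the baseline."""
    return 100.0 * (baseline - improved) / baseline


# (pre-trained, custom-trained) scores as reported in this README.
results = {"WER": (32.1, 15.6), "CER": (12.3, 7.8)}

for metric, (pretrained, finetuned) in results.items():
    reduction = relative_reduction(pretrained, finetuned)
    print(f"{metric}: {pretrained} -> {finetuned} ({reduction:.1f}% relative reduction)")
```

This works out to a relative reduction of about 51.4% in WER and 36.6% in CER on the custom validation set.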

Custom Dataset Used for Testing

To validate the model's performance on realistic Indian English speech, we used a custom dataset comprising our own voices.

Custom Audio Data: https://drive.google.com/drive/folders/1bKVak_v3T-qtyEzdIY7AkDQ57ap18J2o?usp=sharing

Required JSON File for Testing: https://drive.google.com/file/d/1bX1sjeRVEVhqoWwfVYm-IEE7K-v-AevR/view?usp=sharing

Testing Colab Notebook

You can reproduce our testing and evaluate the model's performance yourself using the following Google Colab notebook:

Colab Notebook: https://colab.research.google.com/drive/1zcL9dbifU2ZenJIjYOeSVkOwheB_u281?usp=sharing

🛠️ Usage

This model can be easily loaded and used with the Hugging Face transformers library.

  1. Installation:

First, ensure you have the necessary libraries installed:

pip install transformers accelerate safetensors datasets soundfile # Or just pip install transformers[torch] accelerate

  2. Load the Model and Processor:

You can load the model directly from this GitHub repository (once uploaded) or from its corresponding Hugging Face Hub page.

from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
import torch

model_id = "omsusi/whisper-small-indian-accent"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
# Use half precision on GPU and fall back to float32 on CPU
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Make sure your audio is sampled at 16 kHz
sample_rate = 16000
duration = 5  # seconds

# Example input: 5 seconds of mono random noise.
# For a real recording, load it with librosa instead (pip install librosa):
#   audio, _ = librosa.load("path/to/audio.wav", sr=sample_rate)
dummy_audio = torch.randn(sample_rate * duration).numpy()

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
).to(device)

# Process dummy audio and cast the features to the model's dtype
input_features = processor(
    dummy_audio, sampling_rate=sample_rate, return_tensors="pt"
).input_features.to(device, dtype=torch_dtype)

# Generate transcription
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(f"Transcription: {transcription}")
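Since the model expects 16 kHz input, it can help to check a recording's sample rate before transcribing it. The sketch below does this for PCM WAV files using only Python's standard `wave` module. The helper names are illustrative, and other formats (MP3, FLAC) would need a library such as soundfile or librosa instead.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # Hz, the rate used in the usage example above


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (frames per second) of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, expected: int = EXPECTED_SAMPLE_RATE) -> bool:
    """True if the file's sample rate differs from the expected rate."""
    return wav_sample_rate(path) != expected
```

If `needs_resampling` returns `True`, one option is to load the file with `librosa.load(path, sr=16000)`, which resamples to the requested rate while loading.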

🙏 Attribution

If you use this model in your research, projects, or applications, please ensure you provide appropriate credit to the original creators, as required by the CC BY 4.0 license.

The original creators are:

Omsubhra Singha

Aman Kumar

You can provide credit by:

Linking to this GitHub repository.

Mentioning our names in your project's documentation, acknowledgements, or credits section.

Citing the model using the information provided in the CITATION.cff file (if applicable for academic use).

Your adherence to these attribution requirements, as per the CC BY 4.0 license, is greatly appreciated.

📜 License

This Whisper Small Model (Indian Accent) is distributed under the Creative Commons Attribution 4.0 International Public License (CC BY 4.0).

You are free to:

Share — copy and redistribute the material in any medium or format for any purpose, even commercially.

Adapt — remix, transform, and build upon the material for any purpose, even commercially.

Under the following terms:

Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

For the full text of the license, please see the LICENSE file in this repository or visit: https://creativecommons.org/licenses/by/4.0/

📞 Contact

For questions or inquiries, please open an issue in this repository, or reach us via LinkedIn (https://www.linkedin.com/in/omsubhra-singha-30447a254/) or email (amnkmr2098@gmail.com).

Owner

  • Login: omsusi
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this model, please cite it as below."
authors:
  - family-names: "Singha"
    given-names: "Omsubhra"
  - family-names: "Kumar"
    given-names: "Aman" 
title: "Whisper Small Model (Indian Accent)"
version: "1.0.0"
date-released: "2025-06-16" 
repository-code: "https://github.com/omsusi/whisper-small-indian-accent" 
keywords:
  - "whisper"
  - "speech recognition"
  - "indian accent"
  - "fine-tuned model"
  - "ai"
  - "machine learning"
license: "CC-BY-4.0" # Official SPDX identifier for Creative Commons Attribution 4.0

GitHub Events

Total
  • Push event: 4
  • Create event: 1
Last Year
  • Push event: 4
  • Create event: 1