https://github.com/ai4bharat/sruti

Benchmarks, model checkpoints, and supplementary resources for our Interspeech 2025 paper “Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women.”

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.3%) to scientific vocabulary
Last synced: 6 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: AI4Bharat
  • Default Branch: master
  • Homepage:
  • Size: 364 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 9 months ago · Last pushed 6 months ago
Metadata Files
Readme

README.md

Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women

This repository contains the resources, dataset information, and code for the Interspeech 2025 paper: "Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women." Our work focuses on creating the SRUTI benchmark for rural Bhojpuri women and leveraging synthetic speech to improve ASR performance for this underserved demographic.

Paper Abstract: Digital inclusion remains a challenge for marginalized communities, especially rural women in low-resource language regions like Bhojpuri. Voice-based access to agricultural services, financial transactions, government schemes, and healthcare is vital for their empowerment, yet existing ASR systems for this group remain largely untested. To address this gap, we create SRUTI, a benchmark consisting of rural Bhojpuri women speakers. Evaluation of current ASR models on SRUTI shows poor performance due to data scarcity, which is difficult to overcome due to social and cultural barriers that hinder large-scale data collection. To overcome this, we propose generating synthetic speech using just 25-30 seconds of audio per speaker from approximately 100 rural women. Augmenting existing datasets with this synthetic data achieves an improvement of 4.7 WER, providing a scalable, minimally intrusive solution to enhance ASR and promote digital inclusion in low-resource language.


Resources

This section provides an overview of and links to the key resources developed and used in this work.

Datasets & Text Corpora

  • SRUTI Benchmark Dataset: For evaluating ASR for rural Bhojpuri women.
    • Link: View SRUTI Benchmark
    • Objective: Real-world speech benchmark for rural Bhojpuri women, covering key domains. 72 minutes, 444 utterances, 51 speakers, 4 target domains.
    • Language: Bhojpuri (dialect accents from Bhadohi, Jaunpur, Mirzapur districts, UP, India).
    • Domains: Agriculture, Health, Government Schemes, Finance, and ice-breaker topics.
    • Demographics: Women, 4 age groups (18-60+), varied education.
  • Seed Audio for Synthesis: 39.4 mins real speech (100 Bhojpuri women, 100 Hindi women).

    • Link: View Real Train Data
    • Objective: 25-30 s of transcribed speech per speaker from 100 unique speakers (39.4 mins in total).
  • Text Prompts for SRUTI Data Collection:

  • Text Corpora for Synthetic Data Generation:

Tools Used

  • Data Collection and Verification: Kathbath app (Open Source).
  • Data Transcription: Shoonya app (Open Source).

  • Speech Synthesis: Multilingual prompt-based model [11]: IndicF5.

  • ASR Architecture: Conformer-L [17] with hybrid CTC + RNN-T loss: IndicConformer.
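
This README does not say how the CTC and RNN-T terms are combined. One common way to write a hybrid objective, given here only as an assumption and not as the authors' exact recipe, is a weighted sum in which $\lambda$ is a hypothetical interpolation weight:

```latex
\mathcal{L}_{\text{hybrid}}
  = \lambda \, \mathcal{L}_{\text{CTC}}
  + (1 - \lambda) \, \mathcal{L}_{\text{RNN-T}},
  \qquad 0 \le \lambda \le 1
```

With $\lambda = 0$ this reduces to pure RNN-T training, and with $\lambda = 1$ to pure CTC training.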

Model Checkpoints & Code

  • Code Repository: Training, evaluation, and synthetic data generation code.
    • Training/Evaluation:
    • Synthetic Data:
  • Pre-trained Model Checkpoints (M1-M4):

    | Model ID | Description | Datasets Used | Model Checkpoint Link |
    | :------- | :---------- | :------------ | :-------------------- |
    | M1 | Monolingual Bhojpuri | Real Bhojpuri (133.4 hrs): SpeeS-IA [19], ULCA NewsOnAir [6], Vaani [1], LIMMITS [20] | M1 |
    | M2 | Bilingual Bhojpuri + Hindi | M1 Data + Real Hindi (376 hrs): IndicVoices [2] | M2 |
    | M3 | Bilingual + Synthetic Bhojpuri | M2 Data + Synthetic Bhojpuri (100 hrs) (Seed: 39.4 mins Bhojpuri women) | M3 |
    | M4 | Bilingual + Synthetic Bhojpuri + Synthetic Hindi | M3 Data + Synthetic Hindi (100 hrs) (Seed: ~40 mins Hindi women) | M4 |


Data Collection Methodology

  1. Community Engagement: Collaboration with ASHA/ANMs for trust and informed consent.
    • Link to Brochure used for engagement: Link
  2. On-Field Collection: Using the Kathbath app in government facilities (Primary Health Centres, PHCs).

Verification and Transcription

Ensured data quality through rigorous verification and transcription.

Verification

Recordings were verified by in-house experts for clarity and relevance, even when background noise was present.

  • Link to Verification Guidelines Document: [Link]

Transcription

  • Link to Transcription Guidelines Document: [Link]

Model Building

Developed and evaluated ASR models using the SRUTI benchmark and synthetic data.
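
The paper reports its gains in word error rate (WER). Until the official evaluation scripts are released, the metric can be sketched as below. This is a minimal illustration: the function names `edit_distance` and `corpus_wer` are hypothetical, whitespace tokenization is an assumption, and any text normalization used in the paper is not reproduced here.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level WER in percent: total word edits / total reference words."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_words = sum(len(r.split()) for r in refs)
    if total_words == 0:
        raise ValueError("references contain no words")
    total_edits = sum(
        edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps)
    )
    return 100.0 * total_edits / total_words


if __name__ == "__main__":
    # Toy placeholder strings, not SRUTI data:
    # 2 edits (1 substitution, 1 deletion) over 4 reference words.
    print(corpus_wer(["a b c d"], ["a x c"]))
```

Pooling edits over the whole corpus, as done here, weights long utterances more heavily than averaging per-utterance WER would; which convention the paper uses is not stated in this README.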

Models Trained

(Refer to the table under "Resources > Model Checkpoints & Code" for dataset details.)

  • M1: Monolingual Bhojpuri (133.4 hrs real Bhojpuri).
  • M2: M1 + Real Hindi (376 hrs).
  • M3: M2 + Synthetic Bhojpuri (100 hrs).
  • M4: M3 + Synthetic Hindi (100 hrs).
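
Because each model adds data on top of the previous one, the total training audio per model follows from a running sum of the hours listed above. The sketch below only does that arithmetic; the names `ADDED_HOURS` and `cumulative_hours` are hypothetical, and the hour values are copied from this README.

```python
from itertools import accumulate

# Hours of audio added at each stage, as listed for M1 to M4.
ADDED_HOURS = [
    ("M1", "Real Bhojpuri", 133.4),
    ("M2", "Real Hindi", 376.0),
    ("M3", "Synthetic Bhojpuri", 100.0),
    ("M4", "Synthetic Hindi", 100.0),
]


def cumulative_hours(stages):
    """Map each model ID to its total training hours (running sum of stages)."""
    totals = accumulate(hours for _, _, hours in stages)
    return {
        model: round(total, 1)
        for (model, _, _), total in zip(stages, totals)
    }


if __name__ == "__main__":
    for model, total in cumulative_hours(ADDED_HOURS).items():
        print(f"{model}: {total} hrs")
```

Under these numbers M4 trains on 709.4 hours in total, of which 200 hours are synthetic.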


Training and Evaluation Scripts [Coming Soon]

Scripts and configurations for model training and SRUTI benchmark evaluation.

  • Training & Evaluation Scripts:
  • Environment Setup:

Running Experiments


Citation

If you use the SRUTI dataset, code, or findings from our paper in your research, please cite our work:

```bibtex
@misc{joshi2025recognizingvoiceinclusiveasr,
      title={Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women},
      author={Sakshi Joshi and Eldho Ittan George and Tahir Javed and Kaushal Bhogale and Nikhil Narasimhan and Mitesh M. Khapra},
      year={2025},
      eprint={2506.09653},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2506.09653},
}
```

Owner

  • Name: AI4Bhārat
  • Login: AI4Bharat
  • Kind: organization
  • Email: opensource@ai4bharat.org
  • Location: India

Artificial-Intelligence-For-Bhārat : Building open-source AI solutions for India!

GitHub Events

Total
  • Delete event: 1
  • Push event: 6
  • Pull request event: 1
  • Create event: 1
Last Year
  • Delete event: 1
  • Push event: 6
  • Pull request event: 1
  • Create event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • sakshi-joshi-30 (2)
Top Labels
Issue Labels
Pull Request Labels