https://github.com/ai4bharat/sruti

Benchmarks, model checkpoints, and supplementary resources for our Interspeech 2025 paper “Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women.”

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (links to arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low: 9.3%)

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: AI4Bharat
Default Branch: master
Homepage:
Size: 364 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed 10 months ago

Metadata Files

Readme

Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women

This repository contains the resources, dataset information, and code for the Interspeech 2025 paper: "Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women." Our work focuses on creating the SRUTI benchmark for rural Bhojpuri women and leveraging synthetic speech to improve ASR performance for this underserved demographic.

Paper Abstract: Digital inclusion remains a challenge for marginalized communities, especially rural women in low-resource language regions like Bhojpuri. Voice-based access to agricultural services, financial transactions, government schemes, and healthcare is vital for their empowerment, yet existing ASR systems for this group remain largely untested. To address this gap, we create SRUTI, a benchmark consisting of rural Bhojpuri women speakers. Evaluation of current ASR models on SRUTI shows poor performance due to data scarcity, which is difficult to overcome due to social and cultural barriers that hinder large-scale data collection. To overcome this, we propose generating synthetic speech using just 25-30 seconds of audio per speaker from approximately 100 rural women. Augmenting existing datasets with this synthetic data achieves an improvement of 4.7 WER, providing a scalable, minimally intrusive solution to enhance ASR and promote digital inclusion in low-resource languages.
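The synthesis recipe in the abstract conditions a prompt-based TTS model on a short reference clip (25-30 seconds) from each speaker. The sketch below covers only the bookkeeping side of that idea: pairing each seed voice (reference audio plus its transcript) with sentences from a text corpus. The class names, the field names, and the round-robin assignment policy are illustrative assumptions, not the authors' pipeline, and the actual TTS call is deliberately left out.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List


@dataclass(frozen=True)
class SeedVoice:
    """A speaker's reference prompt (hypothetical structure)."""
    speaker_id: str
    ref_audio_path: str  # roughly 25-30 s of this speaker's real speech
    ref_text: str        # transcript of the reference audio


@dataclass(frozen=True)
class SynthesisJob:
    """One utterance to synthesize in a seed speaker's voice (hypothetical)."""
    speaker_id: str
    ref_audio_path: str
    ref_text: str
    gen_text: str        # corpus sentence to be spoken in the seed voice


def build_jobs(seeds: List[SeedVoice], sentences: List[str]) -> List[SynthesisJob]:
    """Pair corpus sentences with seed voices round-robin, so each voice
    receives an almost equal share of the synthetic utterances."""
    if not seeds:
        raise ValueError("need at least one seed voice")
    return [
        SynthesisJob(seed.speaker_id, seed.ref_audio_path, seed.ref_text, sentence)
        # cycle() repeats the voices; zip() stops when the sentences run out
        for seed, sentence in zip(cycle(seeds), sentences)
    ]
```

Each job would then be handed to the TTS model together with its reference clip and transcript; that call is omitted because its interface is not described in this README. Round-robin is just one simple way to keep speaker coverage balanced.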

Resources

This section gives an overview of the key resources developed and used in this work, with links to each.

Datasets & Text Corpora

SRUTI Benchmark Dataset: For evaluating ASR for rural Bhojpuri women.
- Link: View SRUTI Benchmark
- Objective: Real-world speech benchmark for rural Bhojpuri women, covering key domains.
- Size: 72 minutes, 444 utterances, 51 speakers, 4 target domains.
- Language: Bhojpuri (dialect accents from Bhadohi, Jaunpur, Mirzapur districts, UP, India).
- Domains: Agriculture, Health, Government Schemes, and Finance (the 4 target domains), plus ice-breaker topics.
- Demographics: Women, 4 age groups (18-60+), varied education levels.
Seed Audio for Synthesis: Real transcribed speech from 100 Bhojpuri women (39.4 mins) and 100 Hindi women (~40 mins).
- Link: View Real Train Data
- Objective: 25-30 s of transcribed speech from each of 100 unique speakers (39.4 mins in total).
Text Prompts for SRUTI Data Collection:
- Access Link
Text Corpora for Synthetic Data Generation:
- Bhojpuri: Access Link
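The SRUTI size figures listed above (minutes, utterances, speakers) are the kind of numbers one would recompute after downloading the data. The release format is not documented in this README, so the sketch below assumes a JSON-lines manifest: `audio_filepath`, `duration` (seconds), and `text` follow the NeMo ASR manifest convention, while `speaker_id` is a hypothetical extra field.

```python
import json
from typing import Iterable


def summarize_manifest(lines: Iterable[str]) -> dict:
    """Summarize a JSON-lines ASR manifest.

    Assumed line format (speaker_id is a hypothetical field):
    {"audio_filepath": "...", "duration": 9.5, "text": "...", "speaker_id": "..."}
    """
    n_utts = 0
    total_seconds = 0.0
    speakers = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        n_utts += 1
        total_seconds += float(entry["duration"])
        speakers.add(entry["speaker_id"])
    return {
        "utterances": n_utts,
        "minutes": total_seconds / 60.0,
        "speakers": len(speakers),
        "mean_utterance_seconds": total_seconds / n_utts if n_utts else 0.0,
    }
```

An open file handle can be passed directly, since iterating over it yields lines. For the published figures (72 minutes over 444 utterances), the mean utterance length works out to roughly 9.7 seconds.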

Tools Used

Data Collection and Verification: Kathbath app (Open Source).
Data Transcription: Shoonya app (Open Source).
Speech Synthesis: Multilingual prompt-based model [11] (IndicF5).
ASR Architecture: Conformer-L [17] with hybrid CTC + RNN-T loss (IndicConformer).
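A hybrid CTC + RNN-T model shares one Conformer encoder between a CTC decoder and an RNN-T decoder, and trains both at once. A common way to combine the two objectives (used, for example, in NeMo's hybrid RNN-T/CTC models) is a weighted interpolation of their negative log-likelihoods; the weight $\lambda$ below is an assumption, since this README does not state the value used for these models.

```math
\mathcal{L}_{\text{hybrid}} = (1 - \lambda)\,\mathcal{L}_{\text{RNN-T}} + \lambda\,\mathcal{L}_{\text{CTC}},
\qquad
\mathcal{L}_{\text{RNN-T}} = -\log P_{\text{RNN-T}}(y \mid x),
\qquad
\mathcal{L}_{\text{CTC}} = -\log \sum_{a \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} P(a_t \mid x),
\qquad 0 \le \lambda \le 1
```

Here $x$ is the input audio, $y$ the reference transcript, and $\mathcal{B}$ the CTC collapsing map (merge repeats, then drop blanks). Either decoder can be used at inference time.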

Model Checkpoints & Code

Code Repository: Training, evaluation, and synthetic data generation code.
- Training/Evaluation:
- Synthetic Data:
Pre-trained Model Checkpoints (M1-M4):

| Model ID | Description | Datasets Used | Model Checkpoint Link |
| :------- | :---------- | :------------ | :-------------------- |
| M1 | Monolingual Bhojpuri | Real Bhojpuri (133.4 hrs): SpeeS-IA [19], ULCA NewsOnAir [6], Vaani [1], LIMMITS [20] | M1 |
| M2 | Bilingual Bhojpuri + Hindi | M1 data + Real Hindi (376 hrs): IndicVoices [2] | M2 |
| M3 | Bilingual + Synthetic Bhojpuri | M2 data + Synthetic Bhojpuri (100 hrs; seed: 39.4 mins of Bhojpuri women's speech) | M3 |
| M4 | Bilingual + Synthetic Bhojpuri + Synthetic Hindi | M3 data + Synthetic Hindi (100 hrs; seed: ~40 mins of Hindi women's speech) | M4 |
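Because each model's training mix extends the previous one, the total hours and the synthetic share per model follow from the checkpoint table by simple addition. The sketch below does that bookkeeping; it assumes the listed hours are used as-is (no filtering before training), and the `Source` class and `summarize` helper are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Source:
    name: str
    hours: float
    synthetic: bool = False


# Data added at each step, with hours taken from the M1-M4 checkpoint table.
M1 = [Source("Real Bhojpuri (SpeeS-IA, NewsOnAir, Vaani, LIMMITS)", 133.4)]
M2 = M1 + [Source("Real Hindi (IndicVoices)", 376.0)]
M3 = M2 + [Source("Synthetic Bhojpuri", 100.0, synthetic=True)]
M4 = M3 + [Source("Synthetic Hindi", 100.0, synthetic=True)]


def summarize(mix: List[Source]) -> Tuple[float, float]:
    """Return (total hours, percent synthetic), both rounded to one decimal."""
    total = sum(s.hours for s in mix)
    synthetic = sum(s.hours for s in mix if s.synthetic)
    return round(total, 1), round(100.0 * synthetic / total, 1)


for name, mix in [("M1", M1), ("M2", M2), ("M3", M3), ("M4", M4)]:
    total, pct = summarize(mix)
    print(f"{name}: {total} h total, {pct}% synthetic")
# M1: 133.4 h total, 0.0% synthetic
# M2: 509.4 h total, 0.0% synthetic
# M3: 609.4 h total, 16.4% synthetic
# M4: 709.4 h total, 28.2% synthetic
```

Even in M4, synthetic speech is less than a third of the training data under these assumptions.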

Data Collection Methodology

Community Engagement: Collaboration with ASHAs (Accredited Social Health Activists) and ANMs (Auxiliary Nurse Midwives) to build trust and obtain informed consent.
- Link to Brochure used for engagement: Link
On-Field Collection: Using the Kathbath app in government facilities (Primary Health Centres, PHCs).

Verification and Transcription

Ensured data quality through rigorous verification and transcription.

Verification

Verified by in-house experts for clarity and relevance, even with background noise.
- Link to Verification Guidelines Document: [Link]

Transcription

Link to Transcription Guidelines Document: [Link]

Model Building

Developed and evaluated ASR models using the SRUTI benchmark and synthetic data.

Models Trained

(Refer to the table in "Resources > Model Checkpoints & Code" for dataset details.)
- M1: Monolingual Bhojpuri (133.4 hrs real Bhojpuri).
- M2: M1 + Real Hindi (376 hrs).
- M3: M2 + Synthetic Bhojpuri (100 hrs).
- M4: M3 + Synthetic Hindi (100 hrs).

Training and Evaluation Scripts [Coming Soon]

Scripts and configurations for model training and SRUTI benchmark evaluation.

Training & Evaluation Scripts:
Environment Setup:
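Until the evaluation scripts are released, the sketch below shows how word error rate (WER), the metric behind the 4.7 WER improvement reported in the abstract, is conventionally computed: the word-level edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the number of reference words. It splits on whitespace and applies no text normalization; the normalization and aggregation actually used for SRUTI scoring are not specified here and could change the numbers.

```python
from typing import Iterable, Tuple


def word_edit_distance(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level Levenshtein distance, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the reference prefix processed so far
    # and hyp[:j]; with an empty reference prefix that is j insertions.
    prev = list(range(len(hyp) + 1))
    for i, r_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r_word != h_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER in percent over (reference, hypothesis) pairs."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_edit_distance(reference, hypothesis)
        errors += e
        words += n
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words
```

Pooling errors over the whole benchmark weights longer utterances more heavily than averaging per-utterance WER would; which convention the paper uses is not stated in this README.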

Running Experiments

Citation

If you use the SRUTI dataset, code, or findings from our paper in your research, please cite our work:

```bibtex
@misc{joshi2025recognizingvoiceinclusiveasr,
  title={Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women},
  author={Sakshi Joshi and Eldho Ittan George and Tahir Javed and Kaushal Bhogale and Nikhil Narasimhan and Mitesh M. Khapra},
  year={2025},
  eprint={2506.09653},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2506.09653},
}
```

Owner

Name: AI4Bhārat
Login: AI4Bharat
Kind: organization
Email: opensource@ai4bharat.org
Location: India

Website: https://ai4bharat.org
Twitter: AI4Bharat
Repositories: 37
Profile: https://github.com/AI4Bharat

Artificial-Intelligence-For-Bhārat: Building open-source AI solutions for India!

GitHub Events

Total

Delete event: 1
Push event: 6
Pull request event: 1
Create event: 1

Last Year

Delete event: 1
Push event: 6
Pull request event: 1
Create event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 0
Total pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0
