multilingual-asr

Multilingual Speech Recognition for Indonesian Languages

https://github.com/indonesian-nlp/multilingual-asr

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (8.6%) to scientific vocabulary

Keywords

asr indonesian machine-learning nlp speech-recognition

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: indonesian-nlp
License: mit
Language: Python
Default Branch: main
Homepage:
Size: 411 KB

Statistics

Stars: 62
Watchers: 3
Forks: 5
Open Issues: 0
Releases: 0

Created over 4 years ago · Last pushed almost 4 years ago

Metadata Files

Readme License Citation

Multilingual Speech Recognition for Indonesian Languages

Speech Recognition Live Demo

Introduction

Automatic Speech Recognition (ASR) enables the recognition and transcription of spoken language into text. Typically, an ASR model is trained and used for a single language. However, Indonesia has more than 700 spoken languages, so it is impractical to provide a separate speech recognition model for each one.

Therefore, we want to develop a multilingual speech recognition model that can at least support some of the main Indonesian languages without sacrificing model performance for each language.

Objectives

We want to build a multilingual speech recognition model using Indonesian, Javanese, and Sundanese datasets. The model should perform well in all three languages. We also train monolingual models for comparison.

Methods

We used the following speech datasets for training/fine-tuning:
- Indonesian Common Voice
- High-quality TTS data for Javanese
- High-quality TTS data for Sundanese

We used Wav2vec 2.0, a framework for self-supervised learning of speech representations that is now state of the art on the Librispeech benchmark for noisy speech, for the Indonesian, Javanese, and Sundanese languages.

We trained a multilingual Wav2vec 2.0 model on the three languages combined for 200 epochs. We also trained three monolingual Wav2vec 2.0 models, one each for Indonesian, Javanese, and Sundanese, each for 200 epochs.
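Training one model on three languages combined implies a single output vocabulary shared by all of them. The README does not describe how the vocabulary was built, so the following is only a minimal sketch, assuming character-level CTC fine-tuning as is common for Wav2vec 2.0. The function name `build_joint_vocab`, the sample transcripts, and the special tokens (`<pad>`, `<unk>`, and `|` as word delimiter) are assumptions for illustration, not taken from this repository.

```python
def build_joint_vocab(transcripts_by_language):
    """Build one character vocabulary shared by all languages (hypothetical helper)."""
    chars = set()
    for transcripts in transcripts_by_language.values():
        for text in transcripts:
            chars.update(text.lower())
    # Spaces are represented by the word delimiter token instead of a literal space.
    chars.discard(" ")

    # Assumed special tokens: padding (also used as the CTC blank), unknown, word delimiter.
    vocab = {"<pad>": 0, "<unk>": 1, "|": 2}
    for ch in sorted(chars):
        if ch not in vocab:
            vocab[ch] = len(vocab)
    return vocab


# Illustrative transcripts only; the real training data comes from the datasets listed above.
sample_transcripts = {
    "indonesian": ["selamat pagi"],
    "javanese": ["sugeng enjing"],
    "sundanese": ["wilujeng enjing"],
}
vocab = build_joint_vocab(sample_transcripts)
print(len(vocab), "tokens in the shared vocabulary")
```

Because Indonesian, Javanese, and Sundanese are all written in the Latin alphabet here, a joint vocabulary of this kind stays small, which is one reason a single multilingual output layer is feasible.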

Results and Comparison

We built a multilingual speech recognition model and published it as an open-source model. We also provide a live demo to test the model.

The following sections compare the models and list their performance evaluations.

The Models Comparison

The following figures compare the models by Word Error Rate (WER) on the test split of Indonesian Common Voice 6.1 (lower is better).

Without Language Model

[Figure: ASR-Comparison, model WER without a language model]
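For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition; it is not the evaluation script used in this repository, and the function name `word_error_rate` and the example sentences are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))  # → 0.25
```

Corpus-level WER is usually computed by summing the edit distances over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance rates.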

With Language Model

Lastly, we integrated a language model into our speech recognition pipeline, which reduced the WER from 11.57% to 4.27% on the test split of Indonesian Common Voice 6.1. For comparison, we also evaluated Google Speech-to-Text, whose WER on the same test split is 9.22%.

[Figure: ASR-Comparison, model WER with a language model]
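To put the language model result in perspective, the reported numbers can be expressed as an absolute reduction (in percentage points) and a relative reduction (in percent of the original WER). The snippet below is just a worked calculation using the figures quoted above; the helper name `wer_reduction` is hypothetical.

```python
from typing import Tuple


def wer_reduction(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute reduction in points, relative reduction in percent)."""
    absolute = before - after
    relative = absolute / before * 100
    return absolute, relative


# WER values (in percent) reported above for Indonesian Common Voice 6.1.
absolute, relative = wer_reduction(before=11.57, after=4.27)
print(f"{absolute:.2f} points absolute, {relative:.1f}% relative")
# → 7.30 points absolute, 63.1% relative
```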

Details of the performance evaluation

The performance evaluation can be found here

Conclusion

The experiment shows that the multilingual model can perform on par with a model trained on a single language; the Word Error Rate (WER) difference is at most 0.6 percentage points. We also trained the multilingual model for more epochs, and it then outperforms the monolingual model.
The monolingual model performs very well in the language it was trained on but poorly in other languages.
The multilingual speech recognition model removes the need for a separate model for each language in Indonesia. It therefore significantly reduces hardware requirements and simplifies model deployment.

Future Works

We plan the following for the future:
- Training the model with more data and more Indonesian languages.
- ~~Integrating a language model to reduce the WER~~
- Compressing the model to speed up inference and reduce hardware requirements.
- Developing real-time speech recognition based on this multilingual model.

Owner

Name: indonesian-nlp
Login: indonesian-nlp
Kind: organization

Repositories: 6
Profile: https://github.com/indonesian-nlp

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Wirawan
    given-names: Cahya
    orcid: https://orcid.org/0000-0002-0263-8273
title: "Multilingual Speech Recognition for Indonesian Languages"
version: 1.0.0
date-released: 2021-10-29
url: "https://github.com/indonesian-nlp/multilingual-asr"
