https://github.com/agp-internship/conformer

Nemo Conformer model for Farsi

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links: links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (10.5%) to scientific vocabulary

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: agp-internship
Language: Jupyter Notebook
Default Branch: main
Size: 3.44 MB

Statistics

Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Releases: 0

Created almost 4 years ago · Last pushed almost 4 years ago

https://github.com/agp-internship/Conformer/blob/main/

# Conformer
This work aims to create a pipeline for training the Conformer model for Persian ASR using NVIDIA NeMo. NeMo is an open-source conversational AI toolkit, and many ASR models, including the Conformer, are built into it. NeMo models are available either pre-trained on certain languages or as untrained architectures.
In this project we use the toolkit's CTC-Conformer model and train it on Farsi. The code can be adapted to other NeMo ASR models with minor changes.

## Requirements and Starting Out

Run the first cell of the notebook to download and install the libraries used throughout the code.
The notebook trains on Farsi using the Mozilla Common Voice dataset, version 5.1. Run the second cell to download the dataset, then run the third cell to convert the MP3 files to WAV and to create JSON manifest files for the training, validation, and test data. Each manifest entry contains the path and transcription of one audio file.
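
A minimal sketch of what the manifest-writing step might produce, assuming the JSON-lines layout that NeMo ASR datasets read (one JSON object per line with `audio_filepath`, `duration`, and `text` keys). The helpers `wav_duration`, `manifest_line`, and `write_manifest` are hypothetical and not taken from the notebook; the actual cell may build its entries differently.

```python
import json
import wave
from typing import Iterable, Tuple


def wav_duration(wav_path: str) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def manifest_line(audio_filepath: str, duration: float, text: str) -> str:
    """Serialize one utterance as a single JSON line (hypothetical helper).

    ensure_ascii=False keeps the Persian transcription readable in the file.
    """
    record = {
        "audio_filepath": audio_filepath,
        "duration": float(duration),
        "text": text,
    }
    return json.dumps(record, ensure_ascii=False)


def write_manifest(entries: Iterable[Tuple[str, str]], manifest_path: str) -> int:
    """Write (wav_path, transcription) pairs to a manifest; return the count."""
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav_path, text in entries:
            out.write(manifest_line(wav_path, wav_duration(wav_path), text) + "\n")
            count += 1
    return count
```

Writing one object per line, rather than a single JSON array, lets the loader stream large manifests line by line.
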
The third cell uses a script named manifest_cv.py, which normalizes the characters in each transcription. Edit this script to change which characters are kept during normalization.
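
The exact rules in manifest_cv.py are not reproduced here. As a hypothetical illustration of this kind of normalization, the sketch below maps common Arabic code points to their Persian equivalents, keeps only an assumed set of valid characters (Persian letters, the zero-width non-joiner, and spaces), and collapses whitespace. `VALID_CHARS` and `ARABIC_TO_PERSIAN` are assumptions; editing them plays the same role as editing the valid characters in the script.

```python
# Hypothetical normalizer; not the contents of manifest_cv.py.

# The 32 Persian letters plus alef madda, written as escapes to avoid
# confusing visually identical Arabic and Persian code points.
PERSIAN_LETTERS = (
    "\u0622\u0627\u0628\u067E\u062A\u062B\u062C\u0686\u062D\u062E"
    "\u062F\u0630\u0631\u0632\u0698\u0633\u0634\u0635\u0636\u0637"
    "\u0638\u0639\u063A\u0641\u0642\u06A9\u06AF\u0644\u0645\u0646"
    "\u0648\u0647\u06CC"
)
ZWNJ = "\u200c"  # zero-width non-joiner, used inside many Persian words

# Assumed set of characters to keep; edit this to change the normalization.
VALID_CHARS = set(PERSIAN_LETTERS) | {ZWNJ, " "}

# Map Arabic variants to the Persian forms used in VALID_CHARS.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Farsi yeh
    "\u0649": "\u06CC",  # alef maksura -> Farsi yeh
    "\u0643": "\u06A9",  # Arabic kaf -> keheh
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
})


def normalize(text: str) -> str:
    """Map Arabic variants, drop characters outside VALID_CHARS, tidy spaces."""
    text = text.translate(ARABIC_TO_PERSIAN)
    # Keep any whitespace for now so word boundaries are not lost.
    kept = "".join(ch for ch in text if ch in VALID_CHARS or ch.isspace())
    # split()/join collapses runs of whitespace into single spaces.
    return " ".join(kept.split())
```

Invalid characters are dropped rather than replaced with spaces, so diacritics inside a word do not split it in two.
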

## Tokenizer
The tokenizer cell uses a script from the NeMo library to create a SentencePiece (SPE) unigram tokenizer. To change the tokenizer's vocabulary size, assign the desired value to the variable "VOCAB_SIZE".
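
Before choosing `VOCAB_SIZE`, it may help to compare it with the training transcripts. The sketch below is a rough sanity check, not part of the notebook. It reads the `text` field from a JSON-lines manifest (assumed layout) and compares `VOCAB_SIZE` with the number of distinct non-space characters, which a unigram SentencePiece vocabulary trained with full character coverage has to include. SentencePiece also reserves pieces for special tokens and its word-boundary marker, so the count is only a lower bound. All function names are hypothetical.

```python
import json
from typing import Iterable, List, Set


def read_transcripts(manifest_path: str) -> List[str]:
    """Collect the "text" field of every line in a JSON-lines manifest."""
    texts = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                texts.append(json.loads(line)["text"])
    return texts


def distinct_chars(texts: Iterable[str]) -> Set[str]:
    """Distinct characters across all transcripts, ignoring whitespace."""
    return {ch for text in texts for ch in text if not ch.isspace()}


def vocab_size_is_plausible(texts: Iterable[str], vocab_size: int) -> bool:
    """Rough lower-bound check: the vocabulary must at least fit every character.

    Passing this check does not guarantee that tokenizer training with this
    vocab_size will succeed, because SentencePiece adds its own special pieces.
    """
    return vocab_size >= len(distinct_chars(texts))
```

For example, `vocab_size_is_plausible(read_transcripts("train_manifest.json"), VOCAB_SIZE)` could be run before the tokenizer cell, where the manifest path is a placeholder.
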

## Parameters
The parameters cell sets the config of the ASR model. The default config is first loaded with NeMo's "from_pretrained" function. To use an ASR model other than the default small CTC Conformer, change the argument passed to "from_pretrained". The tokenizer and manifest file paths are then set in the config.
Next, the model size parameters and training parameters, such as batch size, can be set to the desired values.
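
The order of config edits described above can be pictured on a plain nested dictionary. This sketch is hypothetical. In the notebook, the config comes from the model loaded with "from_pretrained" (an OmegaConf object in NeMo). The key names used here (`tokenizer.dir`, `train_ds.manifest_filepath`, `train_ds.batch_size`, `encoder.n_layers`, and so on) are assumptions modeled on NeMo's Conformer-CTC example configs and should be checked against the loaded config. All paths and numbers are placeholders.

```python
from typing import Any, Dict


def set_by_path(cfg: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set a nested config value such as "train_ds.batch_size" in place."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})  # create missing sections
    node[leaf] = value


# Placeholder stand-in for the loaded default config (assumed structure,
# placeholder values not taken from any real NeMo config).
cfg: Dict[str, Any] = {
    "tokenizer": {"dir": None},
    "train_ds": {"manifest_filepath": None, "batch_size": 16},
    "validation_ds": {"manifest_filepath": None},
    "test_ds": {"manifest_filepath": None},
    "encoder": {"d_model": 256, "n_layers": 16},
}

# Hypothetical overrides: tokenizer and manifest paths first, then sizes.
overrides = {
    "tokenizer.dir": "tokenizers/spe_unigram",
    "train_ds.manifest_filepath": "manifests/train_manifest.json",
    "validation_ds.manifest_filepath": "manifests/dev_manifest.json",
    "test_ds.manifest_filepath": "manifests/test_manifest.json",
    "train_ds.batch_size": 32,
    "encoder.n_layers": 12,
}
for key, value in overrides.items():
    set_by_path(cfg, key, value)
```

On an actual OmegaConf config, `OmegaConf.update(cfg, key, value)` performs the same kind of dotted-key update.
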

## References
- [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
- [NeMo ASR](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html)

Owner

Name: AGP Internship
Login: agp-internship
Kind: organization

Repositories: 3
Profile: https://github.com/agp-internship

ASR Gooyesh Pardaz Internship Projects Repository
