https://github.com/agp-internship/conformer

Nemo Conformer model for Farsi

Repository

Basic Info
  • Host: GitHub
  • Owner: agp-internship
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 3.44 MB
Statistics
  • Stars: 0
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 4 years ago · Last pushed almost 4 years ago

# Conformer
This work aims to create a pipeline for training the Conformer model for Persian ASR using NVIDIA NeMo tools. NeMo is an open-source conversational AI toolkit with many built-in ASR models, including the Conformer. NeMo models are available either pre-trained on certain languages or untrained.
In this project we use the toolkit's Conformer-CTC model and train it on Farsi. The code can be slightly modified to use other ASR models as well.



## Requirements and Starting Out

Run the first cell of the notebook to download and install the libraries used throughout the code.
The dataset used in the notebook for training models on Farsi is Mozilla Common Voice version 5.1. Run the second cell to download the dataset. Then run the third cell to convert the mp3 files to wav and to create JSON manifest files for the training, validation, and test data. The JSON files contain a file path and a transcription for each audio file.
The third cell uses a script named manifest_cv.py, which normalizes the characters in the transcription of each audio file. This script can be modified to change the set of valid characters used in normalization.
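
As a rough illustration of the manifest step, the sketch below normalizes a transcription and emits one manifest line. It is not the code in manifest_cv.py: the character map, the set of valid characters, and the function names are hypothetical. The "duration" field follows the usual NeMo manifest layout (one JSON object per line with "audio_filepath", "duration", and "text") and is an assumption about what the notebook writes.

```python
import json
import re

# Hypothetical normalization table: Arabic code points -> Persian equivalents.
CHAR_MAP = {"\u064a": "\u06cc", "\u0643": "\u06a9"}  # Arabic yeh -> Persian yeh, Arabic kaf -> Persian kaf

# Hypothetical set of valid characters: Arabic-script letters, the extra
# Persian letters, the zero-width non-joiner, and the space.
INVALID = re.compile(
    r"[^\u0621-\u063a\u0641-\u064a\u067e\u0686\u0698\u06a9\u06af\u06cc\u200c ]"
)


def normalize_text(text: str) -> str:
    """Map characters, replace anything outside the valid set with a space,
    then collapse runs of whitespace."""
    text = "".join(CHAR_MAP.get(ch, ch) for ch in text)
    text = INVALID.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def manifest_line(audio_filepath: str, duration: float, text: str) -> str:
    """Build one line of a manifest file as a JSON string."""
    entry = {
        "audio_filepath": audio_filepath,
        "duration": round(duration, 3),
        "text": normalize_text(text),
    }
    # ensure_ascii=False keeps the Persian text readable in the file.
    return json.dumps(entry, ensure_ascii=False)


print(manifest_line("clips/sample.wav", 1.5, "\u0643\u062a\u0627\u0628!"))
```

Changing the valid characters then amounts to editing the character class in `INVALID` and the entries in `CHAR_MAP`.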

## Tokenizer
The tokenizer cell uses a script from the NeMo library to create a SentencePiece (spe) unigram tokenizer. The vocabulary size of the tokenizer can be changed by assigning the desired value to the variable "VOCAB_SIZE".
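
For orientation, the following sketch assembles the kind of command the tokenizer cell is likely to run. It assumes the script is NeMo's `scripts/tokenizers/process_asr_text_tokenizer.py` from a local NeMo checkout; the manifest name, output directory, default vocabulary size, and the helper `tokenizer_command` are placeholders, not values taken from the notebook.

```python
import shlex

VOCAB_SIZE = 128  # assumed value; the notebook's default may differ


def tokenizer_command(manifest, out_dir, vocab_size=VOCAB_SIZE,
                      script="scripts/tokenizers/process_asr_text_tokenizer.py"):
    """Return the argument list for building an spe unigram tokenizer."""
    return [
        "python", script,
        f"--manifest={manifest}",      # training manifest to read transcripts from
        f"--data_root={out_dir}",      # directory where the tokenizer is written
        f"--vocab_size={vocab_size}",
        "--tokenizer=spe",             # SentencePiece
        "--spe_type=unigram",
    ]


cmd = tokenizer_command("train_manifest.json", "tokenizers")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains the NeMo scripts.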

## Parameters
The parameters cell sets the configuration of the ASR model. The default config is first loaded with NeMo's "from_pretrained" function. To switch from the default small Conformer-CTC model to another ASR model, change the argument of "from_pretrained". The tokenizer and manifest file paths are then set in the config.
Model size parameters and training parameters such as the batch size can then be set to the desired values.
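
A minimal sketch of this step is shown below, under several assumptions: the checkpoint name `stt_en_conformer_ctc_small`, the helper names, and the default batch size are illustrative, and the config keys follow the usual layout of NeMo Conformer-CTC configs rather than being copied from the notebook. The NeMo import is kept inside a function so that the override logic can be read and tried without NeMo installed.

```python
def load_default_config(model_name="stt_en_conformer_ctc_small"):
    """Fetch only the config of a pretrained checkpoint (assumed model name)."""
    import nemo.collections.asr as nemo_asr  # requires NeMo to be installed
    return nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
        model_name, return_config=True
    )


def apply_overrides(cfg, tokenizer_dir, train_manifest, val_manifest,
                    test_manifest, batch_size=16):
    """Point the config at the new tokenizer and manifests, then set the batch size.

    Works on any nested mapping with item access, such as a plain dict or an
    OmegaConf DictConfig.
    """
    cfg["tokenizer"]["dir"] = tokenizer_dir
    cfg["train_ds"]["manifest_filepath"] = train_manifest
    cfg["validation_ds"]["manifest_filepath"] = val_manifest
    cfg["test_ds"]["manifest_filepath"] = test_manifest
    cfg["train_ds"]["batch_size"] = batch_size
    return cfg


# Stand-in config with the assumed keys, used here instead of a real download.
demo_cfg = {
    "tokenizer": {"dir": None},
    "train_ds": {"manifest_filepath": None, "batch_size": 1},
    "validation_ds": {"manifest_filepath": None},
    "test_ds": {"manifest_filepath": None},
}
apply_overrides(demo_cfg, "tokenizers/", "train.json", "dev.json", "test.json")
print(demo_cfg["train_ds"])
```

With NeMo installed, `apply_overrides(load_default_config(), ...)` would apply the same changes to the real config before the model is built from it.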
  
## References
[Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)  
[NeMo ASR](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html)

Owner

  • Name: AGP Internship
  • Login: agp-internship
  • Kind: organization

ASR Gooyesh Pardaz Internship Projects Repository
