zh_trainer
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: DuyTa506
- Language: Python
- Default Branch: main
- Size: 5.52 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
:zap: FINETUNE WAV2VEC 2.0 FOR PINYIN STYLE SPEECH RECOGNITION
Documentation
If you need a simple way to fine-tune the Wav2vec 2.0 model for speech recognition on your own datasets, you have come to the right place. Documentation related to this repo:
- Wav2vec2ForCTC
- Tutorial
Available Features
- [x] Multi-GPU training
- [x] Automatic Mixed Precision
- [ ] Push to Huggingface Hub (Offline)
Installation
pip install -r requirements.txt
Train
- Prepare your dataset
    - Prepare a data folder in which each wav file is paired with a txt file of the same name.
    - The path and transcript columns are compulsory. The path column contains the paths to your stored audio files; depending on your dataset location, these can be either absolute or relative paths. The transcript column contains the transcript for each audio path.
    - Check out our example folders for more information.
    - Important: ignoring the following notes is OK but can hurt performance.
        - Special characters such as `r'[,?.!\-;:"“%\'�]'` are removed by default, but you can change them in base_dataset.py if your transcripts are not clean enough.
        - If your transcripts contain special tokens such as bos_token, eos_token, unk_token (e.g. <unk>, [unk], ...) or pad_token (e.g. <pad>, [pad], ...), specify them in config.toml; otherwise the tokenizer cannot recognize them.
- Configure the config.toml file: pay attention to the pretrained_path argument. By default it loads the "facebook/wav2vec2-base" pre-trained model from Facebook. Change it to your pre-trained model from phase 1 if needed.
- Run
    - Prepare the datasets: `python create_data.py -c config.toml`
    - Start training from scratch: `python train.py -c config.toml`
    - Resume: `python train.py -c config.toml -r`
    - Load a specific model and start training: `python train.py -c config.toml -p path/to/your/model.tar`
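As a rough illustration of the dataset layout described above, the sketch below pairs each wav file with the txt file of the same name, strips special characters, and collects the compulsory path and transcript columns. It is not the repository's `create_data.py` or `base_dataset.py`: the helper names `clean_transcript` and `build_manifest` are hypothetical, and the character class is a subset of the default pattern with the mis-encoded character left out.

```python
import re
from pathlib import Path
from typing import Dict, List

# Assumed character class, modeled on the default pattern quoted above.
# The real pattern is defined in base_dataset.py and may differ.
CHARS_TO_IGNORE = re.compile(r'[,?.!\-;:"“%\']')


def clean_transcript(text: str) -> str:
    """Remove special characters and collapse runs of whitespace."""
    text = CHARS_TO_IGNORE.sub("", text)
    return " ".join(text.split())


def build_manifest(data_dir: str) -> List[Dict[str, str]]:
    """Pair every .wav file with the .txt file of the same name (hypothetical helper)."""
    rows = []
    for wav in sorted(Path(data_dir).glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if not txt.exists():
            continue  # skip audio files that have no transcript
        rows.append({
            "path": str(wav),
            "transcript": clean_transcript(txt.read_text(encoding="utf-8")),
        })
    return rows
```

Each returned row carries one audio path and its cleaned transcript, so the list can be written out with `csv.DictWriter` or loaded into a pandas DataFrame, whichever the rest of the pipeline expects.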
Inference
We provide an inference script that can transcribe a single audio file or a list of audio files. Take a look at the arguments below, especially -f TEST_FILEPATH and -s HUGGINGFACE_FOLDER:
```cmd
usage: inference.py [-h] -f TEST_FILEPATH [-s HUGGINGFACE_FOLDER] [-m MODEL_PATH] [-d DEVICE_ID]

ASR INFERENCE ARGS

optional arguments:
  -h, --help            show this help message and exit
  -f TEST_FILEPATH, --testfilepath TEST_FILEPATH
                        It can be either the path to your audio file (.wav, .mp3) or a text file (.txt) containing a list of audio file paths.
  -s HUGGINGFACE_FOLDER, --huggingfacefolder HUGGINGFACE_FOLDER
                        The folder where you stored the huggingface files. Check the
```
Transcribe an audio file:
```cmd
python inference.py \
    -f path/to/your/audio/file.wav(.mp3) \
    -s huggingface-hub

# output example:
transcript: Hello World
```
Transcribe a list of audio files. Check the input file test.txt and the output file transcript_test.txt (which will be stored in the same folder as the input file):
```cmd
python inference.py \
    -f path/to/your/test.txt \
    -s huggingface-hub
```
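The list-mode convention described above (a txt file of audio paths in, a `transcript_`-prefixed file out, stored in the same folder) could be sketched roughly as follows. This is not the repository's `inference.py`: `output_path_for`, `read_audio_list`, `transcribe_list`, and the `transcribe` callable are hypothetical names, and the tab-separated output format is an assumption.

```python
from pathlib import Path
from typing import Callable, List


def output_path_for(list_file: str) -> Path:
    """Return <folder>/transcript_<name>, next to the input list file."""
    src = Path(list_file)
    return src.with_name("transcript_" + src.name)


def read_audio_list(list_file: str) -> List[str]:
    """Read one audio path per line, ignoring blank lines."""
    lines = Path(list_file).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def transcribe_list(list_file: str, transcribe: Callable[[str], str]) -> Path:
    """Run `transcribe` (a stand-in for the model) on every listed file."""
    out = output_path_for(list_file)
    with out.open("w", encoding="utf-8") as f:
        for audio_path in read_audio_list(list_file):
            # Assumed output format: "<audio path>\t<transcript>" per line.
            f.write(f"{audio_path}\t{transcribe(audio_path)}\n")
    return out
```

Passing the model as a plain callable keeps the file handling separate from the model code, so the path logic can be checked without loading any weights.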
Logs and Visualization
The training logs are stored, and you can visualize them with TensorBoard by running:
```
# log location as specified in config.json
tensorboard --logdir ~/saved/

# specify a port, e.g. 8080
tensorboard --logdir ~/saved/ --port 8080
```

Owner
- Login: DuyTa506
- Kind: user
- Repositories: 1
- Profile: https://github.com/DuyTa506
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  Finetune Wav2vec 2.0 For Speech
  Recognition
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Ta
    family-names: Khanh
    name-particle: Duy
    email: duyfaker01@gmail.com
repository-code: 'https://github.com/DuyTa506/Wav2Vec2.0_From_Scratch'
url: >-
  https://github.com/DuyTa506/Wav2Vec2.0_From_Scratch
keywords:
  - asr
date-released: 2024-09-01
Dependencies
- accelerate *
- dask *
- datasets *
- huggingface_hub *
- librosa *
- numpy *
- pandarallel *
- pandas *
- scikit_learn *
- soundfile *
- tensorflow *
- toml *
- torch *
- torchaudio *
- tqdm *
- transformers *