asr-wav2vec-finetune

⚡ Finetune Wav2vec 2.0 For Speech Recognition

https://github.com/mahshid1378/asr-wav2vec-finetune

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.3%) to scientific vocabulary

Keywords

asr finetuning huggingface-transformers pytorch speech-recognition speech-to-text vietnames-speech-recoging vietnamese wav2vec2

Last synced: 11 months ago

Repository

⚡ Finetune Wav2vec 2.0 For Speech Recognition

Basic Info

Host: GitHub
Owner: mahshid1378
Language: Python
Default Branch: main
Homepage:
Size: 5.01 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Topics

asr finetuning huggingface-transformers pytorch speech-recognition speech-to-text vietnames-speech-recoging vietnamese wav2vec2

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme Citation

⚡ FINETUNE WAV2VEC 2.0 FOR SPEECH RECOGNITION

Documentation
Available Features
Installation
Train
Inference
Logs and Visualization
Citation
Vietnamese

Documentation

If you need a simple way to fine-tune the Wav2vec 2.0 model for speech recognition on your own datasets, you have come to the right place.
All documents related to this repo can be found here:
- Wav2vec2ForCTC
- Tutorial

Available Features

[x] Multi-GPU training
[x] Automatic Mixed Precision
[ ] Push to Huggingface Hub

Installation

pip install -r requirements.txt

Train

Prepare your dataset
- Your dataset can be in .txt or .csv format.
- The path and transcript columns are compulsory. The path column contains the paths to your stored audio files; depending on where your dataset is located, these can be either absolute or relative paths. The transcript column contains the transcript for each audio path.
- Check out our example files for more information.
- Important: Ignoring the following notes is still OK, but it can hurt performance.
  - Make sure that your transcripts contain words only. Numbers should be converted into words. Special characters matching r'[,?.!\-;:"“%\'�]' are removed by default, but you can change this pattern in base_dataset.py if your transcripts are not clean enough.
  - If your transcripts contain special tokens such as bos_token, eos_token, unk_token (e.g. <unk>, [unk], ...) or pad_token (e.g. <pad>, [pad], ...), specify them in config.toml; otherwise the tokenizer cannot recognize them.
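The dataset requirements above can be checked before training with a small script. The following is a minimal sketch, not the code in base_dataset.py: the helper names `clean_transcript` and `load_rows` and the `delimiter` parameter are hypothetical, and only the column names and the character pattern are taken from the notes above.

```python
import csv
import io
import re

# Pattern copied from the note above; these characters are stripped from transcripts.
CHARS_TO_IGNORE = re.compile(r'[,?.!\-;:"“%\'�]')


def clean_transcript(text: str) -> str:
    """Remove the ignored characters and collapse repeated whitespace."""
    text = CHARS_TO_IGNORE.sub("", text)
    return " ".join(text.split())


def load_rows(handle, delimiter=","):
    """Read a delimited file and verify the compulsory columns exist.

    The delimiter is an assumption; adjust it to match your own file.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = {"path", "transcript"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing compulsory column(s): {sorted(missing)}")
    return [
        {"path": row["path"], "transcript": clean_transcript(row["transcript"])}
        for row in reader
    ]


# A tiny in-memory file standing in for a real dataset file.
sample = io.StringIO('path,transcript\naudio/0001.wav,"Hello, World!"\n')
rows = load_rows(sample)
print(rows)
```

Running a check like this on your own file surfaces missing columns early, before a training run is started.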
Configure the config.toml file: Pay attention to the pretrained_path argument. By default, it loads the "facebook/wav2vec2-base" pre-trained model from Facebook. If you wish to pre-train wav2vec2 on your own dataset, check out this REPO.
Run
- Start training from scratch: python train.py -c config.toml
- Resume: python train.py -c config.toml -r
- Load specific model and start training: python train.py -c config.toml -p path/to/your/model.tar

Inference

We provide an inference script that can transcribe a single audio file or a list of audio files. Please take a look at the arguments below, especially -f TEST_FILEPATH and -s HUGGINGFACE_FOLDER:
```cmd
usage: inference.py [-h] -f TEST_FILEPATH [-s HUGGINGFACE_FOLDER] [-m MODEL_PATH] [-d DEVICE_ID]

ASR INFERENCE ARGS

optional arguments:
  -h, --help            show this help message and exit
  -f TEST_FILEPATH, --test_filepath TEST_FILEPATH
                        It can be either the path to your audio file (.wav, .mp3) or a text file (.txt) containing a list of audio file paths.
  -s HUGGINGFACE_FOLDER, --huggingface_folder HUGGINGFACE_FOLDER
                        The folder where you stored the huggingface files. Check the argument of [huggingface.args] in config.toml. Default value: "huggingface-hub".
  -m MODEL_PATH, --model_path MODEL_PATH
                        Path to the model (.tar file) in saved/<project_name>/checkpoints. If not provided, default uses the pytorch_model.bin in the <HUGGINGFACE_FOLDER>
  -d DEVICE_ID, --device_id DEVICE_ID
                        The device you want to test your model on if CUDA is available. Otherwise, CPU is used. Default value: 0
```
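For readers who want to see how such a command-line interface is typically declared, here is a minimal `argparse` sketch that mirrors the short flags and defaults in the help text above. It is an illustration only, not the repository's inference.py: the `dest` names and help strings are assumptions, and only the short flags are declared.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the four short flags listed in the help text (dest names are hypothetical)."""
    parser = argparse.ArgumentParser(description="ASR INFERENCE ARGS")
    parser.add_argument("-f", dest="test_filepath", required=True,
                        help="audio file (.wav, .mp3) or .txt list of audio paths")
    parser.add_argument("-s", dest="huggingface_folder", default="huggingface-hub",
                        help="folder holding the huggingface files")
    parser.add_argument("-m", dest="model_path", default=None,
                        help="optional path to a .tar checkpoint")
    parser.add_argument("-d", dest="device_id", type=int, default=0,
                        help="CUDA device index; CPU is used if CUDA is unavailable")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-f", "path/to/your/audio/file.wav"])
print(args.test_filepath, args.huggingface_folder, args.model_path, args.device_id)
```

Only -f is required; the other arguments fall back to the defaults stated in the help text.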

Transcribe an audio file:
```cmd
python inference.py \
    -f path/to/your/audio/file.wav(.mp3) \
    -s huggingface-hub

# output example:
transcript: Hello World
```

Transcribe a list of audio files. Check the input file test.txt and the output file transcript_test.txt (which will be stored in the same folder as the input file):
python inference.py \
    -f path/to/your/test.txt \
    -s huggingface-hub
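The two input modes described above (a single audio file, or a .txt file listing audio paths) can be sketched as follows. This is a hedged illustration of the documented behavior rather than the script's actual logic: the function names are hypothetical, and the `transcript_` prefix rule is inferred from the test.txt / transcript_test.txt example.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".mp3"}


def resolve_audio_paths(test_filepath: str) -> list:
    """Return the audio paths to transcribe for either input mode."""
    path = Path(test_filepath)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        # One audio path per line; blank lines are skipped.
        lines = path.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    if suffix in AUDIO_SUFFIXES:
        return [test_filepath]
    raise ValueError(f"unsupported input file type: {suffix!r}")


def transcript_output_path(test_filepath: str) -> Path:
    """Place the output next to the input file, prefixed with 'transcript_'."""
    path = Path(test_filepath)
    return path.with_name(f"transcript_{path.name}")


print(resolve_audio_paths("path/to/your/audio/file.wav"))
print(transcript_output_path("path/to/your/test.txt").name)
```

Keeping the output beside the input file means a batch run needs no extra output argument.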

Logs and Visualization

The training logs are stored, and you can visualize them with TensorBoard by running these commands:
```
# specify the <project_name> in config.json
tensorboard --logdir ~/saved/

# specify a port 8080
tensorboard --logdir ~/saved/ --port 8080
```

Citation

@software{Duy_Khanh_Finetune_Wav2vec_2_0_2022,
  author = {Duy Khanh, Le},
  doi = {10.5281/zenodo.6540979},
  month = {5},
  title = {{Finetune Wav2vec 2.0 For Speech Recognition}},
  url = {https://github.com/khanld/ASR-Wa2vec-Finetune},
  year = {2022}
}

Vietnamese

If you are a Vietnamese speaker who wants to train on public datasets such as VIOS, COMMON VOICE, FOSD, and VLSP, please take a look here.

Owner

Name: mahshidmoolaei
Login: mahshid1378
Kind: user
Location: Iran

Repositories: 2
Profile: https://github.com/mahshid1378

Exercises I did inside the master

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  Finetune Wav2vec 2.0 For Speech
  Recognition
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Le
    family-names: Khanh
    name-particle: Duy
    email: khanhld218@uef.edu.vn
identifiers:
  - type: doi
    value: 10.5281/zenodo.6540979
repository-code: 'https://github.com/khanld/ASR-Wa2vec-Finetune'
url: >-
  https://github.com/khanld/ASR-Wa2vec-Finetune
keywords:
  - asr
date-released: 2022-05-12
doi: 10.5281/zenodo.6540979

GitHub Events

Total

Push event: 14
Create event: 2

Last Year

Push event: 14
Create event: 2

Committers

Last synced: about 1 year ago

All Time

Total Commits: 15
Total Committers: 1
Avg Commits per committer: 15.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 15
Committers: 1
Avg Commits per committer: 15.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
mahshidmoulaei	m**i@y**m	15

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
