asr-dysarthria

Research on Automatic Speech Recognition for dysarthric speech

https://github.com/jmaczan/asr-dysarthria

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org, sciencedirect.com, ieee.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.4%) to scientific vocabulary

Keywords

asr automatic-speech-recognition deep-learning dysarthria dysarthric-speech self-supervised-learning wav2vec2

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: jmaczan
Language: Jupyter Notebook
Default Branch: main
Homepage: https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria
Size: 2.64 MB

Statistics

Stars: 11
Watchers: 2
Forks: 2
Open Issues: 4
Releases: 0

Created over 2 years ago · Last pushed almost 2 years ago

Metadata Files

Readme Citation

ASR Dysarthria

Automatic speech recognition for people with dysarthria

This repo is under heavy research and development, so this README.md is outdated. Sorry!

I deployed a web page so you can try the model in your browser: https://asr-dysarthria-preliminary.pages.dev/

Training

Use the Jupyter notebook wav2vec2-large-xls-r-300m-dysarthria-big-dataset.ipynb to train your own model.

Installation

Prerequisites:

Python >= 3.10
Anaconda

Steps:

conda install --file requirements.txt

Inference

In the cli-app directory:

Run model.safetensors: python -m run

Run ONNX: python -m onnx_run

Adjust these scripts if needed (by default they transcribe the file.wav file in the cli-app folder).
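
For readers who want to try the model outside the cli-app scripts, here is a minimal sketch using the Hugging Face `transformers` ASR pipeline. It is not the code in run.py. It assumes that `transformers`, a PyTorch backend, and ffmpeg are installed, and that the model repository ships tokenizer and feature-extractor files next to the weights. The names `MODEL_ID` and `transcribe` are made up for this example.

```python
from pathlib import Path

# Model id taken from the "Pretrained models" list in this README.
MODEL_ID = "jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset"


def transcribe(wav_path, model_id=MODEL_ID):
    """Return the transcription of one audio file (hypothetical helper)."""
    path = Path(wav_path)
    if not path.is_file():
        raise FileNotFoundError(path)

    # Imported lazily so the module loads even without transformers installed.
    from transformers import pipeline

    # Downloads the model on first use, then runs it on the file.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(str(path))["text"]
```

Calling `transcribe("file.wav")` from the cli-app folder should print-ready text for the default sample file, provided the assumptions above hold.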

Deploying

Download and convert the trained model (model.safetensors file):

mkdir models
python scripts/convert_model.py --url https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset/resolve/main/model.safetensors --output models

Serve it

cd web-app
python -m http.server

Pretrained models

[Recommended] Loss: 0.0864, WER: 0.182 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset
Loss: 0.0615, WER: 0.1764 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria
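
The WER values above are word error rates: the word-level edit distance between the reference and the model output, divided by the number of reference words. A WER of 0.182 therefore corresponds to roughly 18 errors per 100 reference words. The sketch below is an illustrative pure-Python implementation, not the code that produced these numbers. The function name `word_error_rate` is made up for this example, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.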

Datasets

UASpeech https://huggingface.co/datasets/Vinotha/uaspeechall
TORGO https://huggingface.co/datasets/jmaczan/TORGO

Description

The code here is based on Patrick von Platen's article and notebook https://huggingface.co/blog/fine-tune-xlsr-wav2vec2
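
The linked article fine-tunes wav2vec2 with a CTC head, so the model emits one token id per audio frame and the text is recovered by collapsing repeated ids and dropping the blank token. The sketch below illustrates that greedy decoding step in plain Python. The vocabulary, the blank id of 0, and the function name `ctc_greedy_decode` are assumptions made up for this example and do not come from this repository.

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse consecutive repeats, then drop blanks (greedy CTC decoding)."""
    decoded = []
    previous = None
    for token_id in frame_ids:
        # Emit a character only when the id changes and is not the blank.
        if token_id != previous and token_id != blank_id:
            decoded.append(id_to_char[token_id])
        previous = token_id
    return "".join(decoded)


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
vocab = {0: "", 1: "h", 2: "e", 3: "l", 4: "o"}

# The blank between the two runs of id 3 is what allows a double "l".
frames = [1, 1, 0, 2, 3, 3, 0, 3, 4, 4]
text = ctc_greedy_decode(frames, vocab)
```

In practice the per-frame ids come from taking the argmax over the model's output logits, and the tokenizer performs this collapsing internally.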

Resources

Papers

https://ar5iv.labs.arxiv.org/html/2204.00770 (https://arxiv.org/abs/2204.00770)

https://www.isca-speech.org/archive/pdfs/interspeech_2022/baskar22b_interspeech.pdf

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10225595

https://www.sciencedirect.com/science/article/pii/S2405959521000874

https://www.isca-speech.org/archive/pdfs/interspeech_2021/green21_interspeech.pdf

https://arxiv.org/pdf/2006.11477.pdf

https://arxiv.org/pdf/2211.00089.pdf

https://www.sciencedirect.com/science/article/abs/pii/S0957417423002981

Code

https://huggingface.co/blog/fine-tune-wav2vec2-english

Data

http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html

Dataset

Big

https://huggingface.co/datasets/jmaczan/TORGO

Small

https://huggingface.co/datasets/jmaczan/TORGO-very-small

Others

https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/

https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html

https://huggingface.co/docs/datasets/v2.16.1/audio_dataset

https://distill.pub/2017/ctc/

https://ai.meta.com/blog/self-supervision-and-building-more-robust-speech-recognition-systems/

Cite

If you use this repository in your research, please use the following citation:

@misc{Maczan_ASR_Dysarthria_2024,
  title = "Research on Automatic Speech Recognition for dysarthric speech",
  author = "{Maczan, Jędrzej Paweł}",
  howpublished = "\url{https://github.com/jmaczan/asr-dysarthria}",
  year = 2024,
  publisher = {GitHub}
}

License

MIT License

Author

Jędrzej Paweł Maczan

https://huggingface.co/jmaczan | jed@maczan.pl | https://github.com/jmaczan

Owner

Name: Jędrzej Maczan
Login: jmaczan
Kind: user

Website: https://maczan.pl
Twitter: jedmaczan
Repositories: 30
Profile: https://github.com/jmaczan

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software in your research, please cite it as below."
authors:
- family-names: "Maczan"
  given-names: "Jędrzej Paweł"
  orcid: "https://orcid.org/0000-0003-1741-6064"
title: "asr-dysarthria"
date-released: 2024-01-12
url: "https://github.com/jmaczan/asr-dysarthria"

GitHub Events

Total

Watch event: 6
Fork event: 2

Last Year

Watch event: 6
Fork event: 2

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 1
Total pull requests: 9
Average time to close issues: 21 days
Average time to close pull requests: 18 days
Total issue authors: 1
Total pull request authors: 2
Average comments per issue: 14.0
Average comments per pull request: 0.44
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 8

Past Year

Issues: 1
Pull requests: 1
Average time to close issues: 21 days
Average time to close pull requests: less than a minute
Issue authors: 1
Pull request authors: 1
Average comments per issue: 14.0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0