asr-dysarthria
Research on Automatic Speech Recognition for dysarthric speech
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org, sciencedirect.com, ieee.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: jmaczan
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria
- Size: 2.64 MB
Statistics
- Stars: 11
- Watchers: 2
- Forks: 2
- Open Issues: 4
- Releases: 0
Metadata Files
README.md
ASR Dysarthria
Automatic speech recognition for people with dysarthria
This repo is under heavy research and development and so the README.md is outdated. Sorry!
I deployed a web page so you can try the model in your browser: https://asr-dysarthria-preliminary.pages.dev/
Training
Use the Jupyter notebook wav2vec2-large-xls-r-300m-dysarthria-big-dataset.ipynb to train your own model.
Installation
Prerequisites:
- Python >= 3.10
- Anaconda
Steps:
conda install --file requirements.txt
Inference
In directory cli-app:
Run model.safetensors: python -m run
Run ONNX: python -m onnx_run
Adjust these scripts if needed. By default they transcribe the file.wav file in the cli-app folder.
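For readers new to wav2vec2-style models: a model fine-tuned as in the article this repo is based on emits one token prediction per audio frame, and a CTC decoding step turns those frames into text. Below is a minimal sketch of greedy CTC decoding. It is not taken from the cli-app scripts. The function name `ctc_greedy_decode`, the toy vocabulary, and the frame sequence are assumptions for illustration; the `[PAD]` blank and `|` word delimiter follow the convention used in the linked fine-tuning article.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id, word_delimiter="|"):
    """Turn per-frame token ids into text using greedy CTC rules."""
    # 1. Merge consecutive duplicate predictions: [h, h, e] -> [h, e].
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop the blank token. A blank between two identical tokens is what
    #    lets a genuine double letter survive step 1 (l, blank, l -> "ll").
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # 3. Replace the word delimiter with a space.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary and frame-level predictions (assumptions).
vocab = {0: "[PAD]", 1: "|", 2: "h", 3: "e", 4: "l", 5: "o"}
frames = [2, 2, 0, 3, 4, 4, 0, 4, 5, 5, 1, 1, 2, 3]

decoded = ctc_greedy_decode(frames, vocab, blank_id=0)
print(decoded)  # hello he
```

In a real pipeline the frame ids would come from taking the argmax over the model's output logits, and the vocabulary from the model's tokenizer.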
Deploying
Download and convert trained model (model.safetensors file)
```sh
mkdir models
python scripts/convert_model.py --url https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset/resolve/main/model.safetensors --output models
```
Serve it
cd web-app
python -m http.server
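`python -m http.server` serves the current directory on port 8000 and, by default, listens on all network interfaces. If you would rather keep the demo reachable only from your own machine, a minimal sketch using the same standard-library server could look like this. The function name `make_server` and the localhost default are assumptions for illustration, not part of this repository.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory, port=8000, host="127.0.0.1"):
    """Build a static file server for `directory`, bound to `host` only."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+),
    # so there is no need to change the working directory first.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    return ThreadingHTTPServer((host, port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   server = make_server("web-app")
#   server.serve_forever()
```

Binding to 127.0.0.1 is a design choice for local testing; pass `host="0.0.0.0"` if other devices on your network should reach the page.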
Pretrained models
- [Recommended] Loss: 0.0864, WER: 0.182 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset
- Loss: 0.0615, WER: 0.1764 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria
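To put the WER (word error rate) figures above in context: WER is the word-level edit distance between the reference transcript and the model's output, divided by the number of reference words, so a WER of 0.182 corresponds to roughly 18 word errors per 100 reference words. The sketch below shows the standard calculation. The function name `word_error_rate` and the sample sentences are illustrative assumptions, and it does not reproduce whatever text normalization or evaluation split was used for the reported numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one deleted word ("on") and one substitution
# ("light" -> "lights") against a four-word reference.
example_wer = word_error_rate("turn on the light", "turn the lights")
print(example_wer)  # 0.5
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.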
Datasets
- UASpeech https://huggingface.co/datasets/Vinotha/uaspeechall
- TORGO https://huggingface.co/datasets/jmaczan/TORGO
Description
The code here is based on Patrick von Platen's article and notebook: https://huggingface.co/blog/fine-tune-xlsr-wav2vec2
Resources
Papers
https://ar5iv.labs.arxiv.org/html/2204.00770 (https://arxiv.org/abs/2204.00770)
https://www.isca-speech.org/archive/pdfs/interspeech_2022/baskar22b_interspeech.pdf
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10225595
https://www.sciencedirect.com/science/article/pii/S2405959521000874
https://www.isca-speech.org/archive/pdfs/interspeech_2021/green21_interspeech.pdf
https://arxiv.org/pdf/2006.11477.pdf
https://arxiv.org/pdf/2211.00089.pdf
https://www.sciencedirect.com/science/article/abs/pii/S0957417423002981
Code
https://huggingface.co/blog/fine-tune-wav2vec2-english
Data
http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html
Dataset
Big
https://huggingface.co/datasets/jmaczan/TORGO
Small
https://huggingface.co/datasets/jmaczan/TORGO-very-small
Others
https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/
https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html
https://huggingface.co/docs/datasets/v2.16.1/audio_dataset
https://distill.pub/2017/ctc/
https://ai.meta.com/blog/self-supervision-and-building-more-robust-speech-recognition-systems/
Cite
If you use this repository in your research, please use the following citation:
```bibtex
@misc{Maczan_ASR_Dysarthria_2024,
  title = "Research on Automatic Speech Recognition for dysarthric speech",
  author = "{Maczan, Jędrzej Paweł}",
  howpublished = "\url{https://github.com/jmaczan/asr-dysarthria}",
  year = 2024,
  publisher = {GitHub}
}
```
License
MIT License
Author
Jędrzej Paweł Maczan
https://huggingface.co/jmaczan | jed@maczan.pl | https://github.com/jmaczan
Owner
- Name: Jędrzej Maczan
- Login: jmaczan
- Kind: user
- Website: https://maczan.pl
- Twitter: jedmaczan
- Repositories: 30
- Profile: https://github.com/jmaczan
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software in your research, please cite it as below."
authors:
  - family-names: "Maczan"
    given-names: "Jędrzej Paweł"
    orcid: "https://orcid.org/0000-0003-1741-6064"
title: "asr-dysarthria"
date-released: 2024-01-12
url: "https://github.com/jmaczan/asr-dysarthria"
GitHub Events
Total
- Watch event: 6
- Fork event: 2
Last Year
- Watch event: 6
- Fork event: 2
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 1
- Total pull requests: 9
- Average time to close issues: 21 days
- Average time to close pull requests: 18 days
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 14.0
- Average comments per pull request: 0.44
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 8
Past Year
- Issues: 1
- Pull requests: 1
- Average time to close issues: 21 days
- Average time to close pull requests: less than a minute
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 14.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Polarnight77 (1)
Pull Request Authors
- dependabot[bot] (15)
- jmaczan (2)