asr-dysarthria

Research on Automatic Speech Recognition for dysarthric speech

https://github.com/jmaczan/asr-dysarthria

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, sciencedirect.com, ieee.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.4%) to scientific vocabulary

Keywords

asr automatic-speech-recognition deep-learning dysarthria dysarthric-speech self-supervised-learning wav2vec2

Repository

Statistics
  • Stars: 11
  • Watchers: 2
  • Forks: 2
  • Open Issues: 4
  • Releases: 0
Topics
asr automatic-speech-recognition deep-learning dysarthria dysarthric-speech self-supervised-learning wav2vec2
Created about 2 years ago · Last pushed over 1 year ago

README.md

ASR Dysarthria

Automatic speech recognition for people with dysarthria

This repo is under heavy research and development and so the README.md is outdated. Sorry!

I deployed a web page so you can use a model in your browser: https://asr-dysarthria-preliminary.pages.dev/

Training

Use the Jupyter notebook wav2vec2-large-xls-r-300m-dysarthria-big-dataset.ipynb to train your own model.

Installation

Prerequisites:

  • Python >= 3.10
  • Anaconda

Steps:

  • conda install --file requirements.txt

Inference

In directory cli-app:

Run model.safetensors: python -m run

Run ONNX: python -m onnx_run

Adjust these scripts if needed. By default they transcribe the file.wav file in the cli-app folder.
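For readers who want to see what such an inference script might look like, here is a minimal sketch using the Hugging Face `transformers` CTC classes. It is not the content of cli-app/run.py. The function name `transcribe`, the use of `librosa` for audio loading, and the assumption that the Hub repository ships processor and tokenizer files next to model.safetensors are all illustrative assumptions.

```python
MODEL_ID = "jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset"
TARGET_SAMPLE_RATE = 16_000  # wav2vec 2.0 / XLS-R checkpoints expect 16 kHz mono audio


def transcribe(wav_path: str, model_id: str = MODEL_ID) -> str:
    """Hypothetical helper: greedy CTC transcription of a single audio file."""
    # Heavy dependencies are imported lazily so importing this module stays cheap.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Assumption: the Hub repo contains the processor/tokenizer files as well.
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Load the audio as mono and resample it to the rate the model expects.
    speech, _ = librosa.load(wav_path, sr=TARGET_SAMPLE_RATE, mono=True)
    inputs = processor(speech, sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: take the most likely token per frame, then let the
    # processor collapse repeats and remove CTC blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("file.wav")` from the cli-app folder would download the checkpoint on first use and return the predicted text, provided `torch`, `transformers`, and `librosa` are installed.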

Deploying

Download and convert trained model (model.safetensors file)

    mkdir models
    python scripts/convert_model.py --url https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset/resolve/main/model.safetensors --output models

Serve it

    cd web-app
    python -m http.server

Pretrained models

  • [Recommended] Loss: 0.0864, WER: 0.182 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset
  • Loss: 0.0615, WER: 0.1764 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria
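The WER (word error rate) figures above are easier to interpret with the definition at hand: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The README does not say how the reported values were computed, so the sketch below only illustrates the standard definition. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 0.182 corresponds to roughly 18 word errors per 100 reference words. The value can exceed 1.0 when the hypothesis contains many inserted words.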

Datasets

  • UASpeech https://huggingface.co/datasets/Vinotha/uaspeechall
  • TORGO https://huggingface.co/datasets/jmaczan/TORGO

Description

The code here is based on Patrick von Platen's article and notebook https://huggingface.co/blog/fine-tune-xlsr-wav2vec2

Resources

Papers

https://ar5iv.labs.arxiv.org/html/2204.00770 (https://arxiv.org/abs/2204.00770)

https://www.isca-speech.org/archive/pdfs/interspeech2022/baskar22binterspeech.pdf

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10225595

https://www.sciencedirect.com/science/article/pii/S2405959521000874

https://www.isca-speech.org/archive/pdfs/interspeech2021/green21interspeech.pdf

https://arxiv.org/pdf/2006.11477.pdf

https://arxiv.org/pdf/2211.00089.pdf

https://www.sciencedirect.com/science/article/abs/pii/S0957417423002981

Code

https://huggingface.co/blog/fine-tune-wav2vec2-english

Data

http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html

Dataset

Big

https://huggingface.co/datasets/jmaczan/TORGO

Small

https://huggingface.co/datasets/jmaczan/TORGO-very-small

Others

https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/

https://pytorch.org/audio/stable/tutorials/speechrecognitionpipeline_tutorial.html

https://huggingface.co/docs/datasets/v2.16.1/audio_dataset

https://distill.pub/2017/ctc/

https://ai.meta.com/blog/self-supervision-and-building-more-robust-speech-recognition-systems/

Cite

If you use this repository in your research, please use the following citation:

@misc{Maczan_ASR_Dysarthria_2024,
  title = "Research on Automatic Speech Recognition for dysarthric speech",
  author = "{Maczan, Jędrzej Paweł}",
  howpublished = "\url{https://github.com/jmaczan/asr-dysarthria}",
  year = 2024,
  publisher = {GitHub}
}

License

MIT License

Author

Jędrzej Paweł Maczan

https://huggingface.co/jmaczan | jed@maczan.pl | https://github.com/jmaczan

Owner

  • Name: Jędrzej Maczan
  • Login: jmaczan
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software in your research, please cite it as below."
authors:
- family-names: "Maczan"
  given-names: "Jędrzej Paweł"
  orcid: "https://orcid.org/0000-0003-1741-6064"
title: "asr-dysarthria"
date-released: 2024-01-12
url: "https://github.com/jmaczan/asr-dysarthria"

GitHub Events

Total
  • Watch event: 6
  • Fork event: 2
Last Year
  • Watch event: 6
  • Fork event: 2

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 1
  • Total pull requests: 9
  • Average time to close issues: 21 days
  • Average time to close pull requests: 18 days
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 14.0
  • Average comments per pull request: 0.44
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 8
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: 21 days
  • Average time to close pull requests: less than a minute
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 14.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Polarnight77 (1)
Pull Request Authors
  • dependabot[bot] (15)
  • jmaczan (2)
Top Labels
Issue Labels
Pull Request Labels
dependencies (15)