huggingsound
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ○ Academic publication links: not found
- ○ Committers with academic emails: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (12.5%) to scientific vocabulary
Keywords
Repository
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
Basic Info
Statistics
- Stars: 462
- Watchers: 16
- Forks: 45
- Open Issues: 39
- Releases: 8
Topics
Metadata Files
README.md
HuggingSound
HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools.
I have no intention of building a very complex tool here. I just wanna have an easy-to-use toolkit for my speech-related experiments. I hope this library could be helpful for someone else too :)
Requirements
- Python 3.8+
Installation
```console
$ pip install huggingsound
```
How to use it?
I'll try to summarize the usage of this toolkit.
But many things will be missing from the documentation below. I promise to make it better soon.
For now, you can open an issue if you have some questions or look at the source code to see how it works.
You can check more usage examples in the repository examples folder.
Speech recognition
For speech recognition you can use any CTC model hosted on the Hugging Face Hub. You can find some available models here.
Inference
```python
from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")
audio_paths = ["/path/to/sagan.mp3", "/path/to/asimov.wav"]

transcriptions = model.transcribe(audio_paths)

print(transcriptions)

# transcriptions format (a list of dicts, one for each audio file):
# [
#     {
#         "transcription": "extraordinary claims require extraordinary evidence",
#         "start_timestamps": [100, 120, 140, 180, ...],
#         "end_timestamps": [120, 140, 180, 200, ...],
#         "probabilities": [0.95, 0.88, 0.9, 0.97, ...]
#     },
#     ...
# ]
#
# As you can see, not only the transcription is returned but also the timestamps
# (in milliseconds) and probabilities of each character of the transcription.
```
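Since `transcribe` returns per-character timestamps, a common follow-up is grouping them into word-level timestamps. The sketch below is plain Python based only on the output format documented above; `group_word_timestamps` is a hypothetical helper, not part of huggingsound.

```python
# Illustrative sketch (not part of huggingsound): derive word-level timestamps
# from the per-character transcription output documented above.

def group_word_timestamps(result):
    """Group per-character timestamps into (word, start_ms, end_ms) tuples."""
    words = []
    current = None  # [word, start_ms, end_ms]
    for char, start, end in zip(result["transcription"],
                                result["start_timestamps"],
                                result["end_timestamps"]):
        if char == " ":
            if current:
                words.append(tuple(current))
                current = None
        elif current is None:
            current = [char, start, end]
        else:
            current[0] += char
            current[2] = end
    if current:
        words.append(tuple(current))
    return words

# Example with the documented output shape (timestamps in milliseconds):
result = {
    "transcription": "claims require",
    "start_timestamps": [100 + 20 * i for i in range(14)],
    "end_timestamps": [120 + 20 * i for i in range(14)],
}
print(group_word_timestamps(result))
# [('claims', 100, 220), ('require', 240, 380)]
```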
Inference (boosted by a language model)
```python
from huggingsound import SpeechRecognitionModel, KenshoLMDecoder

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")
audio_paths = ["/path/to/sagan.mp3", "/path/to/asimov.wav"]

# The LM format used by the LM decoders is the KenLM format (arpa or binary file).
#
# You can download some example LM files from here:
# https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english/tree/main/language_model
lm_path = "path/to/your/lm_files/lm.binary"
unigrams_path = "path/to/your/lm_files/unigrams.txt"

# We implemented three different decoders for LM boosted decoding:
# KenshoLMDecoder, ParlanceLMDecoder, and FlashlightLMDecoder.
#
# In this example, we'll use the KenshoLMDecoder.
# To use this decoder you'll need to install Kensho's pyctcdecode first
# (https://github.com/kensho-technologies/pyctcdecode)
decoder = KenshoLMDecoder(model.token_set, lm_path=lm_path, unigrams_path=unigrams_path)

transcriptions = model.transcribe(audio_paths, decoder=decoder)

print(transcriptions)
```
Evaluation
```python
from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")

references = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]

evaluation = model.evaluate(references)

print(evaluation)

# evaluation format: {"wer": 0.08, "cer": 0.02}
```
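For reference, the `wer` and `cer` values are the standard word and character error rates: edit distance between hypothesis and reference, normalized by reference length. huggingsound computes them via the jiwer library (it appears in the dependency list below); the pure-Python sketch here is only to show how the metrics are defined.

```python
# Illustrative definition of WER/CER (huggingsound itself uses jiwer for this).

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = dp[0]
        dp[0] = i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,         # deletion
                        dp[j - 1] + 1,     # insertion
                        prev + (r != h))   # substitution (free on match)
            prev = cur
    return dp[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: char-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

# One substituted word out of eight reference words -> WER of 1/8:
print(wer("violence is the last refuge of the incompetent",
          "violence is the least refuge of the incompetent"))
# 0.125
```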
Fine-tuning
```python
from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet

model = SpeechRecognitionModel("facebook/wav2vec2-large-xlsr-53")
output_dir = "my/finetuned/model/output/dir"

# First of all, you need to define your model's token set.
# However, the token set is only needed for non-finetuned models.
# If you pass a new token set for an already finetuned model, it'll be ignored during training.
tokens = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
          "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'"]
token_set = TokenSet(tokens)

# Define your train/eval data.
train_data = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]
eval_data = [
    {"path": "/path/to/sagan2.mp3", "transcription": "absence of evidence is not evidence of absence"},
    {"path": "/path/to/asimov2.wav", "transcription": "the true delight is in the finding out rather than in the knowing"},
]

# And finally, fine-tune your model.
model.finetune(
    output_dir,
    train_data=train_data,
    eval_data=eval_data,  # the eval_data is optional
    token_set=token_set,
)
```
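If your training transcriptions don't match a hard-coded alphabet, one option is to derive the token list from the data itself. This is a plain-Python sketch (the `build_tokens` helper is hypothetical, not part of huggingsound); like the hard-coded list above, it omits the space character, assuming the tokenizer treats it as the word delimiter.

```python
# Illustrative sketch: derive a character token list from training transcriptions.
# `build_tokens` is a hypothetical helper name, not a huggingsound API.

def build_tokens(train_data):
    """Collect the sorted set of characters used in the transcriptions."""
    chars = set()
    for sample in train_data:
        chars.update(sample["transcription"])
    # Assumption: the space character is the word delimiter and is handled
    # separately (the hard-coded token list above also omits it).
    chars.discard(" ")
    return sorted(chars)

train_data = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]
print(build_tokens(train_data))
```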
Troubleshooting
- If you are having trouble when loading MP3 files:
  ```console
  $ sudo apt-get install ffmpeg
  ```
Want to help?
See the contribution guidelines if you'd like to contribute to the HuggingSound project.
You don't even need to know how to code to contribute to the project. Even the improvement of our documentation is an outstanding contribution.
If this project has been useful for you, please share it with your friends. This project could be helpful for them too.
If you like this project and want to motivate the maintainers, give us a :star:. This kind of recognition will make us very happy with the work that we've done with :heart:
You can also sponsor me :heart_eyes:
Citation
If you want to cite the tool you can use this:
```bibtex
@misc{grosman2022huggingsound,
  title={{HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools}},
  author={Grosman, Jonatas},
  howpublished={\url{https://github.com/jonatasgrosman/huggingsound}},
  year={2022}
}
```
Owner
- Name: Jonatas Grosman
- Login: jonatasgrosman
- Kind: user
- Location: Brazil
- Company: Pontifical Catholic University of Rio de Janeiro
- Website: jonatasgrosman.com
- Twitter: jonatasgrosman
- Repositories: 18
- Profile: https://github.com/jonatasgrosman
PhD in Computer Science
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Grosman
    given-names: Jonatas
title: "HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools"
date-released: 2022
url: "https://github.com/jonatasgrosman/huggingsound"
preferred-citation:
  type: generic
  authors:
    - family-names: Grosman
      given-names: Jonatas
  title: "HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools"
  year: 2022
  url: "https://github.com/jonatasgrosman/huggingsound"
```
GitHub Events
Total
- Issues event: 1
- Watch event: 31
- Issue comment event: 5
- Pull request event: 1
- Fork event: 4
Last Year
- Issues event: 1
- Watch event: 31
- Issue comment event: 5
- Pull request event: 1
- Fork event: 4
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 31
- Total Committers: 3
- Avg Commits per committer: 10.333
- Development Distribution Score (DDS): 0.065
Top Committers
| Name | Email | Commits |
|---|---|---|
| Jonatas Grosman | j****n@g****m | 29 |
| Nicolas Kaenzig | n****s@a****i | 1 |
| Ubuntu | n****g@g****m | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 49
- Total pull requests: 54
- Average time to close issues: about 1 month
- Average time to close pull requests: about 1 month
- Total issue authors: 43
- Total pull request authors: 3
- Average comments per issue: 1.78
- Average comments per pull request: 0.93
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 51
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- arikhalperin (3)
- its-ogawa (2)
- danijel3 (2)
- AntonioBuccola (2)
- FredHaa (1)
- maloadjav (1)
- rovr (1)
- qinyuenlp (1)
- nkaenzig-aifund (1)
- egorsmkv (1)
- utnasun (1)
- Technerder (1)
- ogarciasierra (1)
- tiennguyen12g (1)
- mbelcen (1)
Pull Request Authors
- dependabot[bot] (51)
- nkaenzig (2)
- jcsilva (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 2
- Total downloads: 374 last-month (pypi)
- Total docker downloads: 10
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 7 (may contain duplicates)
- Total versions: 16
- Total maintainers: 1
pypi.org: huggingsound
HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools.
- Homepage: https://github.com/jonatasgrosman/huggingsound
- Documentation: https://github.com/jonatasgrosman/huggingsound
- License: MIT
- Latest release: 0.1.6 (published over 3 years ago)
Rankings
Maintainers (1)
proxy.golang.org: github.com/jonatasgrosman/huggingsound
- Documentation: https://pkg.go.dev/github.com/jonatasgrosman/huggingsound#section-documentation
- License: mit
- Latest release: v0.1.6 (published over 3 years ago)
Rankings
Dependencies
- actions/cache v1 composite
- actions/checkout v2 composite
- actions/setup-python v1 composite
- actions/cache v1 composite
- actions/checkout v2 composite
- actions/setup-python v1 composite
- codecov/codecov-action v1 composite
- accelerate 0.22.0
- aiohttp 3.8.5
- aiosignal 1.3.1
- async-timeout 4.0.3
- attrs 23.1.0
- audioread 3.0.0
- certifi 2023.7.22
- cffi 1.15.1
- charset-normalizer 3.2.0
- click 8.1.7
- cmake 3.27.2
- colorama 0.4.6
- coverage 7.3.0
- datasets 2.14.4
- decorator 5.1.1
- dill 0.3.7
- exceptiongroup 1.1.3
- filelock 3.12.3
- frozenlist 1.4.0
- fsspec 2023.6.0
- huggingface-hub 0.16.4
- idna 3.4
- importlib-metadata 6.8.0
- iniconfig 2.0.0
- jinja2 3.1.2
- jiwer 3.0.2
- joblib 1.3.2
- lazy-loader 0.3
- librosa 0.10.1
- lit 16.0.6
- llvmlite 0.40.1
- markupsafe 2.1.3
- mpmath 1.3.0
- msgpack 1.0.5
- multidict 6.0.4
- multiprocess 0.70.15
- networkx 3.1
- numba 0.57.1
- numpy 1.24.4
- nvidia-cublas-cu11 11.10.3.66
- nvidia-cuda-cupti-cu11 11.7.101
- nvidia-cuda-nvrtc-cu11 11.7.99
- nvidia-cuda-runtime-cu11 11.7.99
- nvidia-cudnn-cu11 8.5.0.96
- nvidia-cufft-cu11 10.9.0.58
- nvidia-curand-cu11 10.2.10.91
- nvidia-cusolver-cu11 11.4.0.1
- nvidia-cusparse-cu11 11.7.4.91
- nvidia-nccl-cu11 2.14.3
- nvidia-nvtx-cu11 11.7.91
- packaging 23.1
- pandas 2.0.3
- platformdirs 3.10.0
- pluggy 1.3.0
- pooch 1.7.0
- psutil 5.9.5
- pyarrow 13.0.0
- pycparser 2.21
- pytest 7.4.0
- pytest-cov 4.1.0
- pytest-randomly 3.15.0
- python-dateutil 2.8.2
- pytz 2023.3
- pyyaml 6.0.1
- rapidfuzz 2.13.7
- regex 2023.8.8
- requests 2.31.0
- scikit-learn 1.3.0
- scipy 1.9.3
- setuptools 68.1.2
- six 1.16.0
- soundfile 0.12.1
- soxr 0.3.6
- sympy 1.12
- threadpoolctl 3.2.0
- tokenizers 0.13.3
- tomli 2.0.1
- torch 2.0.0
- tqdm 4.66.1
- transformers 4.29.2
- triton 2.0.0
- typing-extensions 4.7.1
- tzdata 2023.3
- urllib3 2.0.4
- wheel 0.41.2
- xxhash 3.3.0
- yarl 1.9.2
- zipp 3.16.2
- coverage ^7.3.0 develop
- pytest ^7.4.0 develop
- pytest-cov ^4.1.0 develop
- pytest-randomly ^3.15.0 develop
- accelerate ^0.22.0
- datasets ^2.14.4
- jiwer ^3.0.2
- librosa ^0.10.1
- python >=3.8.0,<4
- torch >=2.0.0, !=2.0.1
- transformers >=4.23.0,<=4.29.2