speechbrain

A PyTorch-based Speech Toolkit

https://github.com/speechbrain/speechbrain

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org, sciencedirect.com, ieee.org
✓ Committers with academic emails: 16 of 243 committers (6.6%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.2%) to scientific vocabulary

Keywords

asr audio audio-processing deep-learning huggingface language-model pytorch speaker-diarization speaker-recognition speaker-verification speech-enhancement speech-processing speech-recognition speech-separation speech-to-text speech-toolkit speechrecognition spoken-language-understanding transformers voice-recognition

Keywords from Contributors

transformer cryptocurrency jax cryptography voice-conversion text-to-speech speech-translation speech-synthesis singing-voice-synthesis kaldi

Last synced: 6 months ago

Repository

A PyTorch-based Speech Toolkit

Basic Info

Host: GitHub
Owner: speechbrain
License: apache-2.0
Language: Python
Default Branch: develop
Homepage: http://speechbrain.github.io
Size: 98.3 MB

Statistics

Stars: 10,272
Watchers: 133
Forks: 1,539
Open Issues: 174
Releases: 14

Topics

Created almost 6 years ago · Last pushed 6 months ago

Metadata Files

Readme Contributing License Citation Security

README.md

Please, help our community project. Star on GitHub!

Exciting News (January, 2024): Discover what is new in SpeechBrain 1.0 here!

🗣️💬 What SpeechBrain Offers

SpeechBrain is an open-source PyTorch toolkit that accelerates Conversational AI development, i.e., the technology behind speech assistants, chatbots, and large language models.
It is crafted for fast and easy creation of advanced technologies for Speech and Text Processing.

🌐 Vision

With the rise of deep learning, once-distant domains like speech processing and NLP are now very close. A well-designed neural network and large datasets are all you need.
We think it is now time for a holistic toolkit that, mimicking the human brain, jointly supports diverse technologies for complex Conversational AI systems.
This spans speech recognition, speaker recognition, speech enhancement, speech separation, language modeling, dialogue, and beyond.
Aligned with our long-term goal of natural human-machine conversation, including for non-verbal individuals, we have recently added support for the EEG modality.

📚 Training Recipes

We share over 200 competitive training recipes on more than 40 datasets supporting 20 speech and text processing tasks (see below).
We support both training from scratch and fine-tuning pretrained models such as Whisper, Wav2Vec2, WavLM, Hubert, GPT2, Llama2, and beyond. The models on HuggingFace can be easily plugged in and fine-tuned.
For any task, you train the model with the same command: `python train.py hparams/train.yaml`
The hyperparameters are encapsulated in a YAML file, while the training process is orchestrated through a Python script.
We maintain a consistent code structure across different tasks.
For better replicability, training logs and checkpoints are hosted on Dropbox.
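
The split between a declarative hyperparameter file and a training script can be illustrated with a toy, standard-library-only sketch. It is not how SpeechBrain or its YAML loader works internally: the `build` helper, the `"new"`/`"kwargs"` keys, and the use of a plain dict instead of a YAML file are all assumptions made for illustration. It only shows the idea that a configuration can hold both plain values and descriptions of objects that the script instantiates.

```python
import importlib


def build(spec):
    """Recursively turn a nested spec into Python objects (hypothetical helper)."""
    if isinstance(spec, dict) and "new" in spec:
        # "new" names a class by its dotted path; "kwargs" holds its arguments.
        module_name, _, class_name = spec["new"].rpartition(".")
        cls = getattr(importlib.import_module(module_name), class_name)
        kwargs = {key: build(value) for key, value in spec.get("kwargs", {}).items()}
        return cls(**kwargs)
    if isinstance(spec, dict):
        return {key: build(value) for key, value in spec.items()}
    return spec


# Stand-in for the contents of a hyperparameter file (illustrative values).
hparams_spec = {
    "learning_rate": 0.001,
    "number_of_epochs": 3,
    # A "complete object" declared in the config; a stdlib class is used here
    # so the sketch runs without extra dependencies.
    "time_limit": {"new": "datetime.timedelta", "kwargs": {"minutes": 30}},
}

hparams = build(hparams_spec)
print(hparams["learning_rate"], hparams["time_limit"])
```

The training script then reads values such as `hparams["learning_rate"]` without knowing how they were declared, which is what keeps the script short.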

Pretrained Models and Inference

Access over 100 pretrained models hosted on HuggingFace.
Each model comes with a user-friendly interface for seamless inference. For example, transcribing speech using a pretrained model requires just three lines of code:

```python
from speechbrain.inference import EncoderDecoderASR

# Download (or reuse from savedir) the pretrained model hosted on HuggingFace.
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-conformer-transformerlm-librispeech",
    savedir="pretrained_models/asr-transformer-transformerlm-librispeech",
)
# Transcribe a single audio file.
asr_model.transcribe_file("speechbrain/asr-conformer-transformerlm-librispeech/example.wav")
```
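
When several files need transcribing, the same `transcribe_file` call can be wrapped in a small loop. The sketch below is a minimal illustration, not part of SpeechBrain: `transcribe_many` is a hypothetical helper, and `EchoModel` is a stand-in object that only mimics the `transcribe_file` method so the example runs without downloading a model.

```python
from pathlib import Path


def transcribe_many(asr_model, audio_paths):
    """Return {path: transcript} by calling transcribe_file on each path."""
    results = {}
    for path in audio_paths:
        results[str(path)] = asr_model.transcribe_file(str(path))
    return results


class EchoModel:
    """Stand-in used only to exercise the helper (not a real ASR model)."""

    def transcribe_file(self, path):
        return f"<transcript of {Path(path).name}>"


demo = transcribe_many(EchoModel(), ["a.wav", "clips/b.wav"])
print(demo)
```

In real use, the pretrained model loaded in the snippet above would presumably be passed in place of `EchoModel()`.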

Documentation

We are deeply dedicated to promoting inclusivity and education.
We have authored over 30 tutorials that not only describe how SpeechBrain works but also help users familiarize themselves with Conversational AI.
Every class or function has clear explanations and examples that you can run. Check out the documentation for more details 📚.

🎯 Use Cases

🚀 Research Acceleration: Speeding up academic and industrial research. You can develop and integrate new models effortlessly, comparing their performance against our baselines.
⚡️ Rapid Prototyping: Ideal for quick prototyping in time-sensitive projects.
🎓 Educational Tool: SpeechBrain's simplicity makes it a valuable educational resource. It is used by institutions like Mila, Concordia University, Avignon University, and many others for student training.

🚀 Quick Start

To get started with SpeechBrain, follow these simple steps:

🛠️ Installation

Install via PyPI

Install SpeechBrain using PyPI:

    pip install speechbrain

Access SpeechBrain in your Python code:

    import speechbrain as sb

Install from GitHub

This installation is recommended for users who wish to conduct experiments and customize the toolkit according to their needs.

Clone the GitHub repository and install the requirements:

    git clone https://github.com/speechbrain/speechbrain.git
    cd speechbrain
    pip install -r requirements.txt
    pip install --editable .

Access SpeechBrain in your Python code:

    import speechbrain as sb

Any modifications made to the speechbrain package will be automatically reflected, thanks to the --editable flag.

✔️ Test Installation

Ensure your installation is correct by running the following commands:

    pytest tests
    pytest --doctest-modules speechbrain

🏃‍♂️ Running an Experiment

In SpeechBrain, you can train a model for any task using the following steps:

    cd recipes/<dataset>/<task>/
    python experiment.py params.yaml

The results will be saved in the output_folder specified in the YAML file.

📘 Learning SpeechBrain

Website: Explore general information on the official website.
Tutorials: Start with basic tutorials covering fundamental functionalities. Find advanced tutorials and topics in the Tutorial notebooks category in the SpeechBrain documentation.
Documentation: Detailed information on the SpeechBrain API, contribution guidelines, and code is available in the documentation.

🔧 Supported Technologies

SpeechBrain is a versatile framework designed for implementing a wide range of technologies within the field of Conversational AI.
It excels not only in individual task implementations but also in combining various technologies into complex pipelines.

🎙️ Speech/Audio Processing

| Tasks | Datasets | Technologies/Models |
| ------------- |-------------| -----|
| Speech Recognition | AISHELL-1, CommonVoice, DVoice, KsponSpeech, LibriSpeech, MEDIA, RescueSpeech, Switchboard, TIMIT, Tedlium2, Voicebank | CTC, Transducers, Transformers, Seq2Seq, Beamsearch techniques (for CTC, seq2seq, transducers), Rescoring, Conformer, Branchformer, Hyperconformer, Kaldi2-FST |
| Speaker Recognition | VoxCeleb | ECAPA-TDNN, ResNET, Xvectors, PLDA, Score Normalization |
| Speech Separation | WSJ0Mix, LibriMix, WHAM!, WHAMR!, Aishell1Mix, BinauralWSJ0Mix | SepFormer, RESepFormer, SkiM, DualPath RNN, ConvTasNET |
| Speech Enhancement | DNS, Voicebank | SepFormer, MetricGAN, MetricGAN-U, SEGAN, spectral masking, time masking |
| Interpretability | ESC50 | Listenable Maps for Audio Classifiers (L-MAC), Learning-to-Interpret (L2I), Non-Negative Matrix Factorization (NMF), PIQ |
| Speech Generation | AudioMNIST | Diffusion, Latent Diffusion |
| Text-to-Speech | LJSpeech, LibriTTS | Tacotron2, Zero-Shot Multi-Speaker Tacotron2, FastSpeech2 |
| Vocoding | LJSpeech, LibriTTS | HiFiGAN, DiffWave |
| Spoken Language Understanding | MEDIA, SLURP, Fluent Speech Commands, Timers-and-Such | Direct SLU, Decoupled SLU, Multistage SLU |
| Speech-to-Speech Translation | CVSS | Discrete Hubert, HiFiGAN, wav2vec2 |
| Speech Translation | Fisher CallHome (Spanish), IWSLT22(lowresource) | wav2vec2 |
| Emotion Classification | IEMOCAP, ZaionEmotionDataset | ECAPA-TDNN, wav2vec2, Emotion Diarization |
| Language Identification | VoxLingua107, CommonLanguage | ECAPA-TDNN |
| Voice Activity Detection | LibriParty | CRDNN |
| Sound Classification | ESC50, UrbanSound | CNN14, ECAPA-TDNN |
| Self-Supervised Learning | CommonVoice, LibriSpeech | wav2vec2 |
| Metric Learning | REAL-M, Voicebank | Blind SNR-Estimation, PESQ Learning |
| Alignment | TIMIT | CTC, Viterbi, Forward Forward |
| Diarization | AMI | ECAPA-TDNN, X-vectors, Spectral Clustering |

📝 Text Processing

| Tasks | Datasets | Technologies/Models |
| ------------- |-------------| -----|
| Language Modeling | CommonVoice, LibriSpeech | n-grams, RNNLM, TransformerLM |
| Response Generation | MultiWOZ | GPT2, Llama2 |
| Grapheme-to-Phoneme | LibriSpeech | RNN, Transformer, Curriculum Learning, Homograph loss |

🧠 EEG Processing

| Tasks | Datasets | Technologies/Models |
| ------------- |-------------| -----|
| Motor Imagery | BNCI2014001, BNCI2014004, BNCI2015001, Lee2019_MI, Zhou201 | EEGNet, ShallowConvNet, EEGConformer |
| P300 | BNCI2014009, EPFLP300, bi2015a | EEGNet |
| SSVEP | Lee2019_SSVEP | EEGNet |

🔍 Additional Features

SpeechBrain includes a range of native functionalities that enhance the development of Conversational AI technologies. Here are some examples:

Training Orchestration: The Brain class serves as a fully customizable tool for managing training and evaluation loops over data. It simplifies training loops while providing the flexibility to override any part of the process.
Hyperparameter Management: A YAML-based hyperparameter file specifies all hyperparameters, from individual numbers (e.g., learning rate) to complete objects (e.g., custom models). This elegant solution drastically simplifies the training script.
Dynamic Dataloader: Enables flexible and efficient data reading.
GPU Training: Supports single and multi-GPU training, including distributed training.
Dynamic Batching: On-the-fly dynamic batching enhances the efficient processing of variable-length signals.
Mixed-Precision Training: Accelerates training through mixed-precision techniques.
Efficient Data Reading: Reads large datasets efficiently from a shared Network File System (NFS) via WebDataset.
Hugging Face Integration: Interfaces seamlessly with HuggingFace for popular models such as wav2vec2 and Hubert.
Orion Integration: Interfaces with Orion for hyperparameter tuning.
Speech Augmentation Techniques: Includes SpecAugment, Noise, Reverberation, and more.
Data Preparation Scripts: Includes scripts for preparing data for supported datasets.
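
The training-orchestration idea in the list above (a loop that is written once, with overridable steps) can be sketched in plain Python. This is a toy illustration, not SpeechBrain's `Brain` class: `MiniBrain` and `LoggingBrain` are hypothetical, the method names are chosen to echo the kind of hooks such a class exposes, and the signatures, the single-weight "model", and the hand-written gradient are simplifying assumptions.

```python
class MiniBrain:
    """Toy training orchestrator fitting y = weight * x (illustration only)."""

    def __init__(self, lr=0.1):
        self.weight = 0.0
        self.lr = lr

    def compute_forward(self, batch):
        xs, _ = batch
        return [self.weight * x for x in xs]

    def compute_objectives(self, predictions, batch):
        _, ys = batch
        # Mean squared error between predictions and targets.
        return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)

    def fit_batch(self, batch):
        xs, ys = batch
        predictions = self.compute_forward(batch)
        loss = self.compute_objectives(predictions, batch)
        # Hand-written gradient of the mean squared error w.r.t. the weight.
        grad = sum(2 * (p - y) * x for p, y, x in zip(predictions, ys, xs)) / len(ys)
        self.weight -= self.lr * grad
        return loss

    def on_stage_end(self, epoch, avg_loss):
        """Hook that subclasses may override; does nothing by default."""

    def fit(self, epochs, train_set):
        for epoch in range(1, epochs + 1):
            losses = [self.fit_batch(batch) for batch in train_set]
            self.on_stage_end(epoch, sum(losses) / len(losses))


class LoggingBrain(MiniBrain):
    """Overrides a single hook; the loop itself is inherited unchanged."""

    def __init__(self, lr=0.1):
        super().__init__(lr)
        self.history = []

    def on_stage_end(self, epoch, avg_loss):
        self.history.append((epoch, avg_loss))


train_set = [([1.0, 2.0], [2.0, 4.0])]  # one batch of (inputs, targets), y = 2x
brain = LoggingBrain(lr=0.1)
brain.fit(epochs=20, train_set=train_set)
print(round(brain.weight, 3), brain.history[0], brain.history[-1])
```

The point of the design is that the subclass changes only what it cares about (here, logging at the end of each epoch) while batching, the forward pass, and the update step stay in the base class.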

SpeechBrain is rapidly evolving, with ongoing efforts to support a growing array of technologies in the future.
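
To make the dynamic batching feature listed above more concrete, here is a simplified, standard-library-only sketch of the general technique: utterances are sorted by length and grouped so that the padded size of each batch stays under a budget. It is not SpeechBrain's sampler; `dynamic_batches`, its parameters, and the example durations are assumptions for illustration.

```python
def dynamic_batches(durations, max_batch_seconds):
    """Group utterance indices so each batch's padded length fits a budget."""
    # Sorting by duration keeps similar lengths together and limits padding.
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    batches, current = [], []
    for idx in order:
        # After sorting, the newest item is the longest one, so the padded
        # cost of the batch is (number of items) * (its duration).
        padded_cost = (len(current) + 1) * durations[idx]
        if current and padded_cost > max_batch_seconds:
            batches.append(current)
            current = []
        # An utterance longer than the budget still gets a batch of its own.
        current.append(idx)
    if current:
        batches.append(current)
    return batches


durations = [2.0, 9.5, 3.0, 1.5, 8.0, 2.5]  # seconds, made-up values
batches = dynamic_batches(durations, max_batch_seconds=10.0)
print(batches)  # [[3, 0, 5], [2], [4], [1]]
```

Short utterances end up sharing a batch while long ones are processed alone, so the batch size varies but the amount of padded audio per batch stays bounded.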

📊 Performance

SpeechBrain integrates a variety of technologies, including those that achieve competitive or state-of-the-art performance.
For a comprehensive overview of the achieved performance across different tasks, datasets, and technologies, please visit here.

📜 License

SpeechBrain is released under the Apache License, version 2.0, a popular BSD-like license.
You are free to redistribute SpeechBrain for both free and commercial purposes, with the condition of retaining license headers. Unlike the GPL, the Apache License is not viral, meaning you are not obligated to release modifications to the source code.

🔮 Future Plans

We have ambitious plans for the future, with a focus on the following priorities:

Scale Up: We aim to provide comprehensive recipes and technologies for training massive models on extensive datasets.
Scale Down: While scaling up delivers unprecedented performance, we recognize the challenges of deploying large models in production scenarios. We are focusing on real-time, streamable, and small-footprint Conversational AI.
Multimodal Large Language Models: We envision a future where a single foundation model can handle a wide range of text, speech, and audio tasks. Our core team is focused on enabling the training of advanced multimodal LLMs.

🤝 Contributing

SpeechBrain is a community-driven project, led by a core team with the support of numerous international collaborators.
We welcome contributions and ideas from the community. For more information, check here.

🙏 Sponsors

SpeechBrain is an academically driven project and relies on the passion and enthusiasm of its contributors.
As we cannot rely on the resources of a large company, we deeply appreciate any form of support, including donations or collaboration with the core team.
If you're interested in sponsoring SpeechBrain, please reach out to us at speechbrainproject@gmail.com.
A heartfelt thank you to all our sponsors, including the current ones:

📖 Citing SpeechBrain

If you use SpeechBrain in your research or business, please cite it using the following BibTeX entry:

```bibtex
@article{speechbrain_v1,
  author  = {Mirco Ravanelli and Titouan Parcollet and Adel Moumen and Sylvain de Langen and Cem Subakan and Peter Plantinga and Yingzhi Wang and Pooneh Mousavi and Luca Della Libera and Artem Ploujnikov and Francesco Paissan and Davide Borra and Salah Zaiem and Zeyu Zhao and Shucong Zhang and Georgios Karakasidis and Sung-Lin Yeh and Pierre Champion and Aku Rouhe and Rudolf Braun and Florian Mai and Juan Zuluaga-Gomez and Seyed Mahed Mousavi and Andreas Nautsch and Ha Nguyen and Xuechen Liu and Sangeet Sagar and Jarod Duret and Salima Mdhaffar and Ga{{\"e}}lle Laperri{{\`e}}re and Mickael Rouvier and Renato De Mori and Yannick Est{{\`e}}ve},
  title   = {Open-Source Conversational AI with SpeechBrain 1.0},
  journal = {Journal of Machine Learning Research},
  year    = {2024},
  volume  = {25},
  number  = {333},
  url     = {http://jmlr.org/papers/v25/24-0991.html}
}

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}
```

Owner

Name: SpeechBrain
Login: speechbrain
Kind: organization

Website: https://speechbrain.github.io/
Twitter: SpeechBrain1
Repositories: 3
Profile: https://github.com/speechbrain

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: SpeechBrain
message: A PyTorch-based Speech Toolkit
type: software
authors:
  - given-names: Mirco
    family-names: Ravanelli
    affiliation: 'Mila - Quebec AI Institute, Université de Montréal'
  - given-names: Titouan
    family-names: Parcollet
    affiliation: >-
      LIA - Avignon Université, CaMLSys - University of
      Cambridge
  - given-names: Peter
    family-names: Plantinga
    affiliation: Ohio State University
  - given-names: Aku
    family-names: Rouhe
    affiliation: Aalto University
  - given-names: Samuele
    family-names: Cornell
    affiliation: Università Politecnica delle Marche
  - given-names: Loren
    family-names: Lugosch
    affiliation: 'Mila - Quebec AI Institute, McGill University'
  - given-names: Cem
    family-names: Subakan
    affiliation: Mila - Quebec AI Institute
  - given-names: Nauman
    family-names: Dawalatabad
    affiliation: Indian Institute of Technology Madras
  - given-names: Abdelwahab
    family-names: Heba
    affiliation: IRIT - Université Paul Sabatier
  - given-names: Jianyuan
    family-names: Zhong
    affiliation: Mila - Quebec AI Institute
  - given-names: Ju-Chieh
    family-names: Chou
    affiliation: Toyota Technological Institute at Chicago
  - given-names: Sung-Lin
    family-names: Yeh
    affiliation: University of Edinburgh
  - given-names: Szu-Wei
    family-names: Fu
    affiliation: 'Academia Sinica, Taiwan'
  - given-names: Chien-Feng
    family-names: Liao
    affiliation: 'Academia Sinica, Taiwan'
  - given-names: Elena
    family-names: Rastorgueva
    affiliation: NVIDIA
  - given-names: François
    family-names: Grondin
    affiliation: Université de Sherbrooke
  - given-names: William
    family-names: Aris
    affiliation: Université de Sherbrooke
  - given-names: Hwidong
    family-names: Na
    affiliation: Samsung-SAIT
  - given-names: Yan
    family-names: Gao
    affiliation: CaMLSys - University of Cambridge
  - given-names: Renato
    name-particle: De
    family-names: Mori
    affiliation: 'LIA - Avignon Université, McGill University'
  - given-names: Yoshua
    family-names: Bengio
    affiliation: 'Mila - Quebec AI Institute, Université de Montréal'
identifiers:
  - type: doi
    value: 10.48550/arXiv.2106.04624
    description: 'SpeechBrain: A General-Purpose Speech Toolkit'
repository-code: 'https://github.com/speechbrain/speechbrain/'
url: 'https://speechbrain.github.io/'
abstract: >-
  SpeechBrain is an open-source and all-in-one speech
  toolkit. It is designed to facilitate the research and
  development of neural speech processing technologies by
  being simple, flexible, user-friendly, and
  well-documented. This paper describes the core
  architecture designed to support several tasks of common
  interest, allowing users to naturally conceive, compare
  and share novel speech processing pipelines. SpeechBrain
  achieves competitive or state-of-the-art performance in a
  wide range of speech benchmarks. It also provides training
  recipes, pretrained models, and inference scripts for
  popular speech datasets, as well as tutorials which allow
  anyone with basic Python proficiency to familiarize
  themselves with speech technologies.
keywords:
  - speech toolkit
  - audio
  - deep learning
  - PyTorch
  - transformers
  - voice recognition
  - speech recognition
  - speech-to-text
  - language model
  - speaker recognition
  - speaker verification
  - speech processing
  - audio processing
  - ASR
  - speaker diarization
  - speech separation
  - speech enhancement
  - spoken language understanding
  - HuggingFace
license: Apache-2.0

GitHub Events

Total

Create event: 12
Release event: 3
Issues event: 103
Watch event: 1,387
Delete event: 9
Issue comment event: 410
Push event: 105
Pull request review comment event: 468
Pull request review event: 406
Pull request event: 192
Fork event: 173

Last Year

Create event: 12
Release event: 3
Issues event: 103
Watch event: 1,387
Delete event: 9
Issue comment event: 410
Push event: 105
Pull request review comment event: 468
Pull request review event: 406
Pull request event: 192
Fork event: 173

Committers

Last synced: 9 months ago

All Time

Total Commits: 8,753
Total Committers: 243
Avg Commits per committer: 36.021
Development Distribution Score (DDS): 0.88

Past Year

Commits: 535
Committers: 31
Avg Commits per committer: 17.258
Development Distribution Score (DDS): 0.606

Top Committers

Name	Email	Commits
Titouan	p**n@g**m	1,052
Mirco Ravanelli	m**i@g**m	919
Peter Plantinga	p**r@p**m	807
Adel Moumen	a**o@g**m	445
Aku Rouhe	a**e@a**i	398
Gaëlle Laperrière	8****e	373
cem	c**n@g**m	364
Nauman Dawalatabad	n**w@g**m	353
flexthink	f****k	325
asu	s**g@s**r	321
popcornell	c**e@g**m	320
mirco	m**i@g**m	251
BenoitWang	w**6@g**m	219
30stomercury	f**w@g**m	161
anautsch	2****h	150
Abdelwahab Heba	a**a@i**r	143
Jianyuan Zhong	j**9@u**u	140
Pooneh	m**h@g**m	130
aheba	a**a@l**m	129
Jerry Chou	j**2@g**m	117
Jim Hays	j**s@j**m	106
prometheus	4****b	101
Loren Lugosch	l**h@g**m	95
fpaissan	me@f****t	76
williamaris	w**s@u**a	68
Sangeet Sagar	1**3@l**n	55
elenaras	e**s@y**k	41
pradnya-git-dev	k**p@g**m	41
JasonSWFu	w**9@g**m	41
jerrygood0703	j**3@g**m	40
and 213 more...
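
The all-time and past-year figures above can be reproduced with a few lines of arithmetic. The averages are simply commits divided by committers. For the Development Distribution Score, the sketch assumes the common definition of one minus the top committer's share of all commits; the page does not state its formula, so this is an assumption, although it does reproduce the all-time value shown.

```python
# All-time figures taken from the statistics above.
total_commits = 8753
total_committers = 243
top_committer_commits = 1052  # first row of the Top Committers table

avg_all_time = total_commits / total_committers
# Assumed DDS definition: 1 - (top committer's commits / total commits).
dds_all_time = 1 - top_committer_commits / total_commits

# Past-year figures.
avg_past_year = 535 / 31

print(round(avg_all_time, 3))   # 36.021
print(round(dds_all_time, 2))   # 0.88
print(round(avg_past_year, 3))  # 17.258
```

The past-year score (0.606) cannot be recomputed the same way, because the page does not list per-committer commit counts for that window.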

Committer Domains (Top 20 + Academic)

server.mila.quebec: 4 163.com: 3 mila.quebec: 3 cedar1.cedar.computecanada.ca: 3 qq.com: 3 samsung.com: 2 ugent.be: 2 mila02.server.mila.quebec: 2 cedar5.cedar.computecanada.ca: 2 apollor06.server.mila.quebec: 2 beluga5.int.ets1.calculquebec.ca: 2 usherbrooke.ca: 2 alumni.univ-avignon.fr: 2 beluga2.int.ets1.calculquebec.ca: 1 eos14.server.mila.quebec: 1 login02.cm.cluster: 1 agh.edu.pl: 1 iki.fi: 1 idiap.ch: 1 lalilo.com: 1 tum.de: 1 u.rochester.edu: 1 lnmiit.ac.in: 1 orthanc.ece.mcgill.ca: 1 ed.ac.uk: 1 bessemer-node030.shef.ac.uk: 1 csie.ntu.edu.tw: 1 zju.edu.cn: 1 bessemer-node001.shef.ac.uk: 1 nii.ac.jp: 1 serv-9205.kl.dfki.de: 1 u-mc016.univ-avignon.fr: 1 serv-3329.kl.dfki.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 175
Total pull requests: 371
Average time to close issues: 3 months
Average time to close pull requests: 2 months
Total issue authors: 90
Total pull request authors: 70
Average comments per issue: 2.15
Average comments per pull request: 2.61
Merged pull requests: 242
Bot issues: 0
Bot pull requests: 2

Past Year

Issues: 84
Pull requests: 194
Average time to close issues: 23 days
Average time to close pull requests: 14 days
Issue authors: 45
Pull request authors: 34
Average comments per issue: 1.57
Average comments per pull request: 2.02
Merged pull requests: 131
Bot issues: 0
Bot pull requests: 1


Top Authors

Issue Authors

pplantinga (36)
asumagic (18)
TParcollet (15)
mravanelli (14)
jjery2243542 (6)
lucadellalib (5)
Adel-Moumen (5)
Craya (5)
cyberso (2)
kevin00616 (2)
Gastron (2)
underdogliu (2)
tomaz-suller (2)
egaznep (2)
GasserElbanna (2)

Pull Request Authors

pplantinga (82)
asumagic (69)
Adel-Moumen (66)
rogiervd (51)
TParcollet (34)
mravanelli (20)
poonehmousavi (14)
gfdb (13)
flexthink (12)
lucadellalib (10)
Chaanks (9)
shucongzhang (8)
Gastron (7)
BenoitWang (4)
matthewkperez (4)

Top Labels

Issue Labels

bug (167) enhancement (27) important (10) documentation (9) confirmed (8) refactor (8) correctness (4) regression (4) stale (4) performance (3) good first issue (1) recipes (1) question (1) invalid (1) help wanted (1) meta (1) ready to review (1)

Pull Request Labels

enhancement (96) bug (41) ready to review (33) recipes (15) refactor (14) work in progress (13) documentation (7) correctness (6) on hold (6) important (4) dependencies (4) performance (2) help wanted (2) python (1) regression (1)

Packages

Total packages: 1
Total downloads: unknown

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 12

proxy.golang.org: github.com/speechbrain/speechbrain

Documentation: https://pkg.go.dev/github.com/speechbrain/speechbrain#section-documentation
License: apache-2.0
Latest release: v1.0.3
published 11 months ago

Versions: 12
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 6.5%

Average: 6.7%

Dependent repos count: 7.0%

Last synced: 6 months ago

Dependencies

docs/docs-requirements.txt pypi

Sphinx >=3.4.3
better-apidoc >=0.3.1
ctc-segmentation >=1.7.0
fairseq *
numba >=0.54.1
recommonmark >=0.7.1
six *
sklearn *
sphinx-rtd-theme >=0.4.3
transformers ==4.13

lint-requirements.txt pypi

black ==19.10b0
click ==8.0.4
flake8 ==3.7.9
pycodestyle ==2.5.0
pytest ==5.4.1
yamllint ==1.23.0

recipes/AMI/Diarization/extra_requirements.txt pypi

sklearn *

recipes/Fisher-Callhome-Spanish/extra_requirements.txt pypi

sacrebleu *
sacremoses *

recipes/LibriSpeech/ASR/CTC/extra_requirements.txt pypi

transformers *

recipes/LibriSpeech/G2P/extra_requirements.txt pypi

datasets *

recipes/LibriSpeech/LM/extra_requirements.txt pypi

datasets ==1.6.2

recipes/TIMIT/ASR/transducer/extra_requirements.txt pypi

numba *
transformers ==4.4.0

recipes/Voicebank/MTL/ASR_enhance/extra_requirements.txt pypi

librosa *
pesq *
pystoi *

recipes/Voicebank/enhance/MetricGAN/extra_requirements.txt pypi

pesq *

recipes/timers-and-such/extra_requirements.txt pypi

inflect *
pandas *

requirements.txt pypi

SoundFile *
huggingface_hub >=0.7.0
hyperpyyaml >=0.0.1
joblib >=0.14.1
numpy >=1.17.0
packaging *
pre-commit >=2.3.0
scipy >=1.4.1
sentencepiece >=0.1.91
torch >=1.9.0
torchaudio >=0.9.0
tqdm >=4.42.0

setup.py pypi

huggingface_hub *
hyperpyyaml *
joblib *
numpy *
packaging *
scipy *
sentencepiece *
torch >=1.9
torchaudio *
tqdm *

templates/speech_recognition/LM/extra_requirements.txt pypi

datasets *

.github/workflows/newtag.yml actions

marvinpinto/action-automatic-releases v1.2.0 composite

.github/workflows/pre-commit.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite
pre-commit/action v2.0.3 composite

.github/workflows/pythonapp.yml actions

actions/checkout v2 composite
actions/setup-python v1 composite

.github/workflows/release.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite
pypa/gh-action-pypi-publish master composite

.github/workflows/verify-docs-gen.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

tests/samples/lang-shards/meta.json cpan

pyproject.toml pypi

recipes/Aishell1Mix/extra_requirements.txt pypi

matplotlib >=3.1.3
mir-eval ==0.6
pyloudnorm *
pyloudnorm >=0.1.0
pysndfx >=0.3.6

recipes/BinauralWSJ0Mix/extra_requirements.txt pypi

gitpython ==3.1.35
mir-eval ==0.6
pyroomacoustics >=0.7.3

recipes/DNS/enhancement/extra_requirements.txt pypi

librosa *
mir_eval *
onnxruntime *
pesq *
pyroomacoustics ==0.3.1
pystoi *
tensorboard *
webdataset *

recipes/ESC50/classification/extra_requirements.txt pypi

matplotlib *
scikit-learn *

recipes/ESC50/interpret/extra_requirements.txt pypi

matplotlib *
scikit-learn *

recipes/LJSpeech/TTS/extra_requirements.txt pypi

tensorboard *
tgt *
torchvision *
unidecode *

recipes/LibriMix/extra_requirements.txt pypi

mir-eval ==0.6
pyloudnorm *

recipes/LibriSpeech/ASR/transducer/extra_requirements.txt pypi

numba *

recipes/LibriTTS/vocoder/hifigan/extra_requirements.txt pypi

tensorboard *
torchvision *

recipes/REAL-M/sisnr-estimation/extra_requirements.txt pypi

pyroomacoustics *

recipes/RescueSpeech/extra_requirements.txt pypi

mir_eval *
pesq *
pystoi *

recipes/SLURP/extra_requirements.txt pypi

jsonlines *

recipes/UrbanSound8k/SoundClassification/extra_requirements.txt pypi

matplotlib *
tensorboard *

recipes/Voicebank/dereverb/MetricGAN-U/extra_requirements.txt pypi

recipes/Voicebank/enhance/MetricGAN-U/extra_requirements.txt pypi

recipes/VoxLingua107/lang_id/extra_requirements.txt pypi

webdataset *

recipes/WHAMandWHAMR/extra_requirements.txt pypi

mir-eval ==0.6
pyroomacoustics >=0.7.3

recipes/WSJ0Mix/extra_requirements.txt pypi

mir-eval ==0.6

recipes/ZaionEmotionDataset/emotion_diarization/extra_requirements.txt pypi

pathlib >=1.0.1
pydub >=0.25.1
webrtcvad >=2.0.10

recipes/CVSS/S2ST/extra_requirements.txt pypi

sacrebleu *

recipes/IWSLT22_lowresource/AST/transformer/extra_requirements.txt pypi

protobuf *
sacremoses *

recipes/LibriSpeech/ASR/transformer/extra_requirements.txt pypi

bayestorch >=0.0.3

docs/readthedocs-requirements.txt pypi

k2 ==1.24.4.dev20240223
torch ==2.2.1

recipes/GigaSpeech/ASR/CTC/extra_requirements.txt pypi

datasets *
kenlm *
soundfile *
speechcolab *
transformers *

recipes/GigaSpeech/ASR/transducer/extra_requirements.txt pypi

datasets *
numba *
soundfile *
speechcolab *

recipes/LibriTTS/vocoder/hifigan_discrete/extra_requirements.txt pypi

scikit-learn *

recipes/LJSpeech/quantization/extra_requirements.txt pypi

scikit-learn *
tgt *
unidecode *

recipes/LargeScaleASR/ASR/transformer/extra_requirements.txt pypi

datasets >=3.1

recipes/LibriSpeech/quantization/extra_requirements.txt pypi

scikit-learn *

recipes/PeoplesSpeech/ASR/transformer/extra_requirements.txt pypi

datasets ==3.1.0
librosa *
soundfile *

recipes/SEP-28k/stuttering-detection/extra_requirements.txt pypi

scikit-learn *
tensorboard *