Recent Releases of speechbrain

speechbrain - v1.0.3

Major Changes

  • Add People's Speech (30,000 hours) Conformer ASR (Code from Samsung AI Center Cambridge) by @TParcollet in https://github.com/speechbrain/speechbrain/pull/2767
  • Audio and Music SSL by @poonehmousavi in https://github.com/speechbrain/speechbrain/pull/2755
  • Add new Audio Tokenizers by @poonehmousavi in https://github.com/speechbrain/speechbrain/pull/2751
  • Libriheavy (Code from SAIC-Cambridge) by @shucongzhang in https://github.com/speechbrain/speechbrain/pull/2781
  • Conformer recipe for LargeScaleASR (code from Samsung AI Center Cambridge) by @TParcollet in https://github.com/speechbrain/speechbrain/pull/2806
  • Rotary Position Embedding (RoPE) for ASR (code from Samsung Cambridge) by @shucongzhang in https://github.com/speechbrain/speechbrain/pull/2799
  • Voice analysis functions by @pplantinga in https://github.com/speechbrain/speechbrain/pull/2689
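
For readers new to Rotary Position Embedding (RoPE), the core idea fits in a few lines of plain Python. Consecutive feature pairs are rotated by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative offset. This is a minimal sketch that assumes the widely used pairwise-rotation formulation with `base=10000`. The function names `rope` and `dot` are hypothetical, and this is not the SpeechBrain implementation from PR #2799.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) of `vec` by position * theta_i.

    theta_i = base ** (-2i / d), where d is the (even) feature dimension.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("feature dimension must be even")
    out = []
    for i in range(d // 2):
        angle = position * base ** (-2 * i / d)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        # Standard 2-D rotation of the pair.
        out.append(x0 * cos_a - x1 * sin_a)
        out.append(x0 * sin_a + x1 * cos_a)
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.1, 0.4, -0.3, 0.8]
k = [0.5, -0.2, 0.7, 0.1]
# Query at position 7 and key at position 3 (offset 4) ...
s1 = dot(rope(q, 7), rope(k, 3))
# ... get the same attention score as positions 104 and 100 (also offset 4).
s2 = dot(rope(q, 104), rope(k, 100))
print(math.isclose(s1, s2, abs_tol=1e-9))  # True
```

Because each pair is only rotated, vector norms are preserved. All positional information ends up in the relative angle between query and key.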

New Contributors

  • @rogiervd made their first contribution in https://github.com/speechbrain/speechbrain/pull/2734
  • @benniekiss made their first contribution in https://github.com/speechbrain/speechbrain/pull/2746
  • @mirofedurco made their first contribution in https://github.com/speechbrain/speechbrain/pull/2762
  • @kit1980 made their first contribution in https://github.com/speechbrain/speechbrain/pull/2797
  • @IliasMAOUDJ made their first contribution in https://github.com/speechbrain/speechbrain/pull/2574

Full Changelog: https://github.com/speechbrain/speechbrain/compare/v1.0.2...v1.0.3

- Python
Published by github-actions[bot] 11 months ago

speechbrain - v1.0.2

This is a minor update which includes some new features and recipes, internal improvements, bugfixes and improved tutorials.

Here follows a changelog of the main changes (omitting some minor bugfixes):

Notable changes

New features

  • Added layer dropout support for TransformerASR (#2309)
  • Added the sign-flip augmentation for ASR, EEG, and potentially other tasks (#2636)
  • Improved reproducibility by adding seed_everything, improved DDP handling for seeding (#2654)
  • Added "quirks" to centralize overridden PyTorch defaults and workarounds (among other things) in an easy-to-find fashion, with proper logging (#2558)
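
Sign-flip augmentation relies on the fact that negating a waveform (or an EEG signal) leaves what matters for many tasks unchanged, so the model sees a "new" example for free. Below is a minimal sketch that assumes a per-example flip probability `flip_prob` and uses a hypothetical function name `sign_flip`. It is not the SpeechBrain implementation from #2636.

```python
import random


def sign_flip(batch, flip_prob=0.5, rng=None):
    """Return a new batch where each signal is negated with probability flip_prob.

    `batch` is a list of signals, each a list of float samples.
    The input batch is never modified in place.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for signal in batch:
        if rng.random() < flip_prob:
            out.append([-x for x in signal])  # flipped copy
        else:
            out.append(list(signal))  # unchanged copy
    return out


batch = [[0.1, -0.2, 0.3], [0.5, 0.0, -0.5]]
augmented = sign_flip(batch, flip_prob=0.5, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which is in the spirit of the seeding improvements listed above.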

Bugfixes

  • Improved performance for VAD inference (#2683)
  • Fixed various issues with DDP handling (#2682)
  • Fixed broken augmentation integration tests (#2628)
  • Fixed error when processing newer CommonVoice (#2647)
  • Fixed concat bug in augmentation (#2717)
  • Removed EOS in G2P inference which was incorrectly introduced (#2718)
  • ... and some more

New fetching semantics

We have made a number of changes to how fetch works. They affect several parts of the toolkit, as described below.

  • In various fetching-related code, such as inference interfaces' from_hparams, the savedir refers to the directory where files should be collected. It is now optional and defaults to None.
    • When fetching files (models, audio, etc.) from local paths or HuggingFace repositories, you no longer need to specify a target directory.
    • For local file fetching, the path is returned directly.
    • For HF fetching, HF cache is used directly.
    • For URL fetching, you still need to specify the savedir.
    • Inference interfaces will no longer pollute your working directory with symlinks by default when loading audio files.
    • Not creating symlinks by default is much friendlier to Windows. We also added some warnings to help.
    • If you do specify a savedir, behavior should be largely unchanged (although more robust).
  • In various fetching-related code, you can now specify a fetch_strategy.
    • The fetching strategy dictates what to do when a file is found locally, but not in the desired savedir. For instance, if some model file is available in HuggingFace cache, you can now choose between copying, symlinking or not linking the file in the savedir.
  • fetch now has an allow_network parameter, which defaults to True. When set to False, fetch fails if the URI is not available locally or if the file is not found in the local HF cache.
  • fetch also now has an allow_updates parameter, which defaults to True. When enabled, fetch queries HF even if a local path is present, so that model files can be updated (or revisions switched) if required.
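
The `fetch_strategy` options come down to one decision: what to do when a file already exists locally (for example in the HF cache) but not in the requested `savedir`. The sketch below illustrates that copy / symlink / no-link choice. The enum and function names (`Strategy`, `place_file`) are hypothetical and are not SpeechBrain's `fetch` API.

```python
import shutil
from enum import Enum
from pathlib import Path
from typing import Optional


class Strategy(Enum):
    COPY = "copy"        # duplicate the file into savedir
    SYMLINK = "symlink"  # create a link in savedir pointing at the cached file
    NO_LINK = "no_link"  # leave savedir alone and use the cached path directly


def place_file(cached: Path, savedir: Optional[Path], strategy: Strategy) -> Path:
    """Return the path the caller should load, applying `strategy`."""
    if savedir is None or strategy is Strategy.NO_LINK:
        # No target directory (now the default): use the cached file in place.
        return cached
    savedir.mkdir(parents=True, exist_ok=True)
    target = savedir / cached.name
    if strategy is Strategy.COPY:
        shutil.copy2(cached, target)
    else:
        # Replace any stale file or link before creating the new symlink.
        if target.is_symlink() or target.exists():
            target.unlink()
        target.symlink_to(cached.resolve())
    return target
```

Returning the cached path when no `savedir` is given mirrors the behavior described above, where local and HF files no longer require a target directory.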

- Python
Published by asumagic over 1 year ago

speechbrain - v1.0.1

This is a minor update which includes some new features and recipes, internal improvements, bugfixes, compatibility improvements, and wider Python backwards compatibility.

NOTE: both v1.0.0 and v1.0.1 were released earlier than this date on GitHub. These releases were accidentally marked as drafts.

Notable changes

  • We now advertise support for Python 3.8 to 3.12 (instead of 3.9-3.11) and have improved our testing accordingly.
  • Major improvements to Whisper integration with various tasks supported, fine-tuning fixes, performance improvements, and more (#2450)
  • Improved model parameter info printing (#2470)
  • Added new metrics mostly targeted at speech recognition (#2451)
  • Added backwards compatibility for old speechbrain.pretrained imports that were broken following a v1.0 refactor (#2485)
  • Updated BibTeX citation. You may find the latest one here at all times.

Recipes and other features

  • Added a VoxPopuli transducer recipe (#2421)
  • Upgraded CommonVoice transducer and transformer recipes (#2433, #2465, #2560) with various improvements
  • Refactored ESC50 recipes, added FocalNets (#2499), and added single-sample inference for interpretation (#2616)
  • Speechtokenizer integration (#2497)
  • Added support for HiFi-GAN to work with new SSL Discrete Tokens, with support for bitrate-scalable training on LJSpeech and LibriTTS (#2571)
  • Added recipe for Listenable Maps for Audio Classifiers (#2538)

Fixes

  • Fixed errors with ctc_segmentation (#2505)
  • Fixes and refactors for RelPosEncXL (#2498)
  • Fixed input normalization incorrectly applying in-place to the user inputs in some cases (#2504)
  • DDP fixes (#2506, #2633)
  • Fixed backwards compatibility with older torchaudio versions when streaming features are not used (#2532)
  • Fixed Separation and Enhancement recipes behavior when NaN is encountered (#2524)
  • Fix for too aggressive SpecAugment in LibriSpeech transducer recipe (#2548)
  • Fix for double file conversion in CommonVoice data preparation (#2557)
  • Fixed SpectrogramDrop errors in some cases (#2564)
  • Potential fix for Windows install issues when using the --editable flag (#2541)
  • Improvements to SSL Discrete Tokens and refactoring (#2509)
  • Fixes and improvements for quaternion networks (#2464)
  • Fixed AISHELL models and added backwards compatibility warning for causal in TransformerASR (#2606)
  • ... and a few more

Internal changes

  • Improved code quality with the inclusion of spellchecking, include sorting, and stronger documentation linting in the CI pipeline.
  • Significantly improved CI performance for a better PR development experience.
  • Refactored the module structure of SpeechBrain to use lazy-loading when possible, reducing import times and greatly reducing circular import headaches.
  • Introduced infrastructure to enable preprocessing when importing state dicts (https://gist.github.com/asumagic/1289f391acf849ca01b8ca7f9c5dd069)
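
The general idea of preprocessing a state dict at import time can be pictured as a chain of hooks that rewrite checkpoint keys before loading. One use is mapping parameter names from an older module layout onto renamed modules. This is a hypothetical illustration: `rename_prefix`, `preprocess_state_dict`, and the key names are invented, so see the linked gist for what SpeechBrain actually introduced.

```python
from typing import Callable, Dict, List

StateDict = Dict[str, object]
Hook = Callable[[StateDict], StateDict]


def rename_prefix(old: str, new: str) -> Hook:
    """Build a hook that renames every key starting with `old` to start with `new`."""
    def hook(state: StateDict) -> StateDict:
        return {
            (new + key[len(old):] if key.startswith(old) else key): value
            for key, value in state.items()
        }
    return hook


def preprocess_state_dict(state: StateDict, hooks: List[Hook]) -> StateDict:
    """Apply hooks in order; the result is what would be passed to load_state_dict."""
    for hook in hooks:
        state = hook(state)
    return state


legacy = {"enc.layer0.weight": 1, "enc.layer0.bias": 2, "dec.out.weight": 3}
fixed = preprocess_state_dict(legacy, [rename_prefix("enc.", "encoder.")])
```

Each hook returns a new dict rather than mutating its input, so the original checkpoint contents remain available if a hook needs to be debugged.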

- Python
Published by asumagic over 1 year ago

speechbrain - v1.0.0

Please help our community project: star it on GitHub!

🚀 What's New in SpeechBrain 1.0?

📅 In February 2024, we released SpeechBrain 1.0, the result of a year-long collaborative effort by a large international network of developers led by our exceptional core development team.

📊 Some Numbers:

  • SpeechBrain has evolved into a significant project and stands among the most widely used open-source toolkits for speech processing.
  • Over 140 developers have contributed to our repository, which has received more than 7.3k stars on GitHub.
  • Monthly downloads from PyPI have reached an impressive 200k.
  • Expanded to over 200 recipes for Conversational AI, featuring more than 100 pretrained models on HuggingFace.

🌟 Key Updates:

  • SpeechBrain 1.0 introduces significant advancements, expanding support for diverse datasets and tasks, including NLP and EEG processing.

  • The toolkit now excels in Conversational AI and various sequence processing applications.

  • Improvements encompass key techniques in speech recognition, streamable conformer transducers, integration with K2 for Finite State Transducers, CTC decoding and n-gram rescoring, new CTC/joint attention Beam Search interface, enhanced compatibility with HuggingFace Models (including GPT2 and Llama2), and refined data augmentation, training, and inference processes.

  • We have created a new repository dedicated to benchmarks, accessible here. At present, it features benchmarks for various domains, including speech self-supervised models (MP3S), continual learning (CL-MASR), and EEG processing (SpeechBrain-MOABB).

For detailed technical information, please refer to the section below.

🔄 Breaking Changes

People familiar with SpeechBrain know that we do our best to avoid backward-incompatible changes. However, the introduction of this new major version presented an opportunity for significant enhancements and refactorings, some of which break backward compatibility.

  1. 🤗 HuggingFace Interface Refactor:

    • Previously, our interfaces were limited to specific models like Whisper, HuBERT, WavLM, and wav2vec 2.0.
    • We've refactored the interface to be more general, now supporting any transformer model from HuggingFace including LLMs.
    • Simply inherit from our new interface and enjoy the flexibility.
    • The updated interfaces can be accessed here.
  2. 🔍 BeamSearch Refactor:

    • The previous beam search interface, while functional, was challenging to comprehend and modify due to the combined search and rescoring parts.
    • We've introduced a new interface where scoring and search are separated, managed by distinct functions, resulting in simpler and more readable code.
    • This update allows users to easily incorporate various scorers, including n-gram LM and custom heuristics, in the search part.
    • Additionally, support for pure CTC training and decoding, batch and GPU decoding, partial or full candidate scoring, and N-best hypothesis output with neural LM rescorers has been added.
    • An interface to K2 for search based on Finite State Transducers (FST) is now available.
    • The updated decoders are available here.
  3. 🎨 Data Augmentation Refactor:

    • The data augmentation capabilities have been enhanced, offering users access to various functions in speechbrain/augment.
    • New techniques, such as CodecAugment, RandomShift (Time), RandomShift (Frequency), DoClip, RandAmp, ChannelDrop, ChannelSwap, CutCat, and DropBitResolution, have been introduced.
    • Augmentation can now be customized and combined using the Augmenter interface in speechbrain/augment/augmenter.py, providing more control during training.
    • Take a look here for a tutorial on speech augmentation.
    • The updated augmenters are available here.
  4. 🧠 Brain Class Refactor:

    • The fit_batch method in the Brain Class has been refactored to minimize the need for overrides in training scripts.
    • Native support for different precisions (fp32, fp16, bf16), mixed precision, compilation, multiple optimizers, and improved multi-GPU training with torchrun is now available.
    • Take a look at the refactored brain class here.
  5. 🔍 Inference Interfaces Refactor:

    • Inference interfaces, once stored in a single file (speechbrain/pretrained/interfaces.py), are now organized into smaller libraries in speechbrain/inference, enhancing clarity and intuitiveness.
    • You can access the new inference interfaces here.
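
The key idea of the BeamSearch refactor is to keep search separate from scoring so that scorers can be plugged in and weighted independently. That idea can be illustrated without any SpeechBrain code. In the toy version below, each scorer is a hypothetical function that returns a log-probability for appending a token to a prefix. The search loop only sums weighted scorer outputs and keeps the top `beam_size` prefixes. The scorer names, weights, and function signatures are illustrative assumptions, not the actual decoders' API.

```python
import math
from typing import Callable, Dict, List, Tuple

# A scorer maps (prefix, candidate token) to a log-probability.
Scorer = Callable[[Tuple[str, ...], str], float]


def beam_search(scorers: Dict[str, Scorer], weights: Dict[str, float],
                vocab: List[str], eos: str, beam_size: int = 2,
                max_len: int = 5) -> List[Tuple[Tuple[str, ...], float]]:
    """Search only combines weighted scorer outputs; it knows nothing about models."""
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token in vocab:
                step = sum(weights[name] * fn(prefix, token) for name, fn in scorers.items())
                candidates.append((prefix + (token,), score + step))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # keep hypotheses that never emitted eos
    return sorted(finished, key=lambda c: c[1], reverse=True)


def toy_am(prefix, token):
    """Stand-in for an acoustic model that prefers the sequence a, b, <eos>."""
    target = ("a", "b", "<eos>")
    pos = len(prefix)
    return math.log(0.8) if pos < len(target) and token == target[pos] else math.log(0.1)


def token_bonus(prefix, token):
    """Stand-in for a custom heuristic: small bonus for every non-eos token."""
    return 0.0 if token == "<eos>" else 0.05


hyps = beam_search({"am": toy_am, "bonus": token_bonus}, {"am": 1.0, "bonus": 1.0},
                   vocab=["a", "b", "<eos>"], eos="<eos>")
print(hyps[0][0])  # ('a', 'b', '<eos>')
```

Adding an n-gram LM or another heuristic in this design only means adding an entry to `scorers` and `weights`. The search loop itself stays unchanged.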

🔊 Automatic Speech Recognition

  • Developed a new recipe for training a Streamable Conformer Transducer on the LibriSpeech dataset (accessible here). The streamable model achieves a Word Error Rate (WER) of 2.72% on the test-clean subset.
  • Implemented a dedicated inference interface to support streamable ASR (accessible here).
  • New models, including HyperConformer and Branchformer, have been introduced. Examples of recipes using them can be found here.
  • Added support for datasets such as RescueSpeech, CommonVoice 14.0, AMI, and Tedlium 2.
  • The ASR search pipeline has undergone a complete refactoring and enhancement (see comment above).
  • A new recipe for Bayesian ASR has been added here.
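
Word Error Rate, the metric quoted above for the streamable model, is the word-level edit distance between hypothesis and reference divided by the number of reference words. Here is a minimal sketch in plain Python. The function name `wer` is our own, and SpeechBrain provides its own implementation in its metric utilities.

```python
from typing import List


def wer(reference: List[str], hypothesis: List[str]) -> float:
    """(substitutions + deletions + insertions) / len(reference), via Levenshtein DP."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev holds the DP row for the previous reference prefix.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1] / len(reference)


ref = "the cat sat on the mat".split()
hyp = "the cat sit on mat".split()
print(f"{100 * wer(ref, hyp):.2f}%")  # 33.33% (1 substitution + 1 deletion over 6 words)
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.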

🔄 Interface with Kaldi2 (K2-FSA)

  • Integration of an interface that seamlessly connects SpeechBrain with K2-FSA, allowing for constrained search and more.
  • Support for K2 CTC training and lexicon decoding, along with integration of K2 HLG and n-gram rescoring.
  • Competitive results achieved with Wav2vec2 on LibriSpeech test sets.
  • Explore an example recipe utilizing K2 here.

🎙 Speech Synthesis (TTS)

🌐 Speech-to-Speech Translation:

  • Introduction of new recipes for CVSS datasets and IWSLT 2022 Low-resource Task, based on mBART/NLLB and SAMU wav2vec.

🌟 Speech Generation

  • Implementation of diffusion and latent diffusion techniques with an example recipe showcased on AudioMNIST.

🎧 Interpretability of Audio Signals

  • Implementation of Learning to Interpret and PIQ techniques, with example recipes demonstrated on ESC50.

😊 Speech Emotion Diarization

  • Support for Speech Emotion Diarization, featuring an example recipe on the Zaion Emotion Dataset. See the training recipe here.

🎙️ Speaker Recognition

🔊 Speech Enhancement

  • Release of a new Speech Enhancement baseline based on the DNS dataset.

🎵 Discrete Audio Representations

  • Support for pretrained models with discrete audio representations, including EnCodec and DAC.
  • Support for discretization of continuous representations provided by popular self-supervised models such as HuBERT and Wav2vec2.
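
Discretizing continuous SSL features commonly means assigning each frame to its nearest centroid in a k-means codebook learned on those features. This turns a feature sequence into a token sequence. The minimal sketch below shows only the assignment step and assumes a pre-computed codebook. The centroids, feature values, and the function name `quantize` are invented for illustration.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: List[Sequence[float]], codebook: List[Sequence[float]]) -> List[int]:
    """Map each feature frame to the index of its nearest centroid (one token per frame)."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


# Toy 2-D "SSL features" for 4 frames and a 3-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7], [0.05, 0.1]]
tokens = quantize(frames, codebook)
print(tokens)  # [0, 1, 2, 0]
```

In practice, the codebook would be fit with k-means on features from a chosen SSL layer. The resulting token IDs can then feed a vocoder or a token-based downstream model.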

🤖 Interfaces with Large Language Models

  • Creation of interfaces with popular open-source Large Language Models, such as GPT2 and Llama2.
  • These models can be easily fine-tuned in SpeechBrain for tasks like Response Generation, exemplified with a recipe for the MultiWOZ dataset.
  • The Large Language Model can also be employed to rescore n-best ASR hypotheses.
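
Rescoring an n-best list with a language model typically means re-ranking each ASR hypothesis by a weighted sum of its ASR score and an LM score. The minimal sketch below assumes log-domain scores and a hypothetical `lm_weight`. The toy LM stands in for GPT2 or Llama2, and `rescore` is not SpeechBrain's rescorer API.

```python
from typing import Callable, List, Tuple


def rescore(nbest: List[Tuple[str, float]], lm_score: Callable[[str], float],
            lm_weight: float = 0.5) -> List[Tuple[str, float]]:
    """Re-rank (text, asr_logprob) pairs by asr_logprob + lm_weight * lm_logprob."""
    rescored = [(text, asr + lm_weight * lm_score(text)) for text, asr in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_lm(text: str) -> float:
    """Stand-in for an LLM: penalize hypotheses containing an implausible word."""
    return -5.0 if "wreck" in text.split() else -1.0


nbest = [
    ("it is hard to wreck a nice beach", -3.0),
    ("it is hard to recognize speech", -3.2),
]
best_text, best_score = rescore(nbest, toy_lm, lm_weight=0.5)[0]
print(best_text)  # it is hard to recognize speech
```

Here the LM overturns the ASR ranking (-3.7 vs -5.5 after rescoring). With `lm_weight=0.0`, the original ASR order is kept.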

🔄 Continuous Integration

  • All recipes undergo automatic testing with one or multiple GPUs, ensuring robust performance.
  • HuggingFace interfaces are automatically verified, contributing to a seamless integration process.
  • Continuous improvement of integration and unit tests to cover most functionalities within SpeechBrain.

🔍 Profiling

  • We have simplified the Profiler to enable easier identification of computing bottlenecks and quicker evaluation of model efficiency.
  • Now, you can profile your model during training effortlessly with:

```bash
python train.py hparams/config.yaml --profile_training --profile_warmup 10 --profile_steps 5
```

  • Check out our tutorial for more detailed information.

📈 Benchmarks

  • Release of a new benchmark repository, aimed at aiding the community in standardization across various areas.
  1. CL-MASR (Continual Learning Benchmark for Multilingual ASR):

    • A benchmark designed to assess continual learning techniques on multilingual speech recognition tasks.
    • Provides scripts to train multilingual ASR systems, specifically Whisper and WavLM-based, on a subset of 20 languages selected from Common Voice 13 in a continual learning fashion.
    • Implementation of various methods, including rehearsal-based, architecture-based, and regularization-based approaches.
  2. Multi-probe Speech Self Supervision Benchmark (MP3S):

    • A benchmark for accurate assessment of speech self-supervised models.
    • Noteworthy for allowing users to select multiple probing heads for downstream training.
  3. SpeechBrain-MOABB:

    • A benchmark offering recipes for processing electroencephalographic (EEG) signals, seamlessly integrated with the popular Mother of all BCI Benchmarks (MOABB).
    • Facilitates the integration and evaluation of new models on all supported tasks, presenting an interface for easy model integration and testing, along with a fair and robust method for comparing different architectures.

🔄 Transitioning to SpeechBrain 1.0

  • Please refer to this tutorial for in-depth technical information regarding the transition to SpeechBrain 1.0.

New Contributors

  • @ywk991112 made their first contribution in https://github.com/speechbrain/speechbrain/pull/2228
  • @kimmchii made their first contribution in https://github.com/speechbrain/speechbrain/pull/2320
  • @ppisljar made their first contribution in https://github.com/speechbrain/speechbrain/pull/2336
  • @gaspardpetit made their first contribution in https://github.com/speechbrain/speechbrain/pull/2335
  • @RISHIKREDDYL made their first contribution in https://github.com/speechbrain/speechbrain/pull/2389
  • @ZhaoZeyu1995 made their first contribution in https://github.com/speechbrain/speechbrain/pull/2345

Full Changelog: https://github.com/speechbrain/speechbrain/compare/v0.5.16...v1.0.0

- Python
Published by mravanelli about 2 years ago

speechbrain - v0.5.16

SpeechBrain 0.5.16 will be the last minor version of SpeechBrain before the major release of SpeechBrain 1.0.

In this minor version, we have focused on refining the existing features without introducing any interface changes, ensuring a seamless transition to SpeechBrain 1.0 where backward incompatible modifications will take place.

Key Highlights of SpeechBrain 0.5.16:

Bug Fixes: Numerous small fixes have been implemented to enhance the overall stability and performance of SpeechBrain.

Testing and Documentation: We have dedicated efforts to improve our testing infrastructure and documentation, ensuring a more robust and user-friendly experience.

Expanded Model and Dataset Support: SpeechBrain 0.5.16 introduces support for several new models and datasets, enhancing the versatility of the platform. For a detailed list, please refer to the commits below.

Stay informed and get ready for the groundbreaking SpeechBrain 1.0, where we will unveil substantial changes and exciting new features.

Thank you for being a part of the SpeechBrain community!

Commits

  • [cea36b4]: Update README.md (Mirco Ravanelli) #1599
  • [cead130]: Updated README.md (prometheus) #975
  • [779c620]: Update README.md (Mirco Ravanelli) #2124
  • [32af2ac]: update requirement (to avoid deprecation error) (Mirco Ravanelli) #975
  • [b039df1]: small fixes (Mirco Ravanelli) #975
  • [07e7c73]: small fixes (Mirco Ravanelli) #975
  • [dac6842]: Update README.md (Mirco Ravanelli) #975
  • [75f4c66]: Update README.md (Mirco Ravanelli) #975
  • [327a3f5]: Fixed SSVEP yaml file (prometheus) #975
  • [067d94e]: Fixed conflicts (prometheus) #975
  • [331741d]: Fixed read/write conflicts mne config file when training many models in parallel (prometheus) #975
  • [0f25d5b]: Added hparam files for other architectures (prometheus) #975
  • [9ba76e3]: Updated LMDA, forcing odd kernel size in depth attention (prometheus) #975
  • [6336200]: Fixed activation in LMDA (prometheus) #975
  • [1593cc4]: Fixed issue in deepconvnet (prometheus) #975
  • [2f0f5f0]: Fixed issue with shallowconvnet (prometheus) #975
  • [8f70136]: Fixed issue with lmda (prometheus) #975
  • [ac4f9e4]: Merge remote-tracking branch 'origin/develop' into fixeval (Adel Moumen) #2123
  • [cdce80c]: fix ddp issue with loading a key (Adel Moumen) #2128
  • [66633a0]: Added template yaml files (prometheus) #975
  • [6f631a7]: minor additions for tests (pradnya-git-dev) #2120
  • [331acdb]: add notes on tests with non-default gpu (Mirco Ravanelli) #2130
  • [091b3ce]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [cc72c9e]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [c60e606]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [253859e]: Resolve paths so relative works too (Aku Rouhe) #2128
  • [8a98401]: small fix on orion flag (Mirco Ravanelli) #975
  • [7da9a95]: extend fix to all files (Mirco Ravanelli) #975
  • [4b09ff2]: fix style (Mirco Ravanelli) #975
  • [ced2922]: Merge remote-tracking branch 'upstream/develop' into eeg_decoding (Mirco Ravanelli) #975
  • [5e070a2]: fix useless file (Mirco Ravanelli) #975
  • [46565cf]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into develop (xuechenliu) #2142
  • [19235f2]: Merge remote-tracking branch 'upstream/Adel-Moumen-revertcommitddp' into revertcommitddp (Adel Moumen) #2128
  • [2fb247f]: Save the checkpoint folder and meta only on the main process and communicate to all procs (Peter Plantinga) #2132
  • [f37d433]: Only broadcast checkpoint folder if distributed (Peter Plantinga) #2132
  • [e23da7d]: Initialize external loggers only on main process (Peter Plantinga) #2134
  • [67b1255]: fixes (BenoitWang) #2119
  • [70d8901]: Merge branch 'develop' into fs2internalalignment (Yingzhi WANG) #2119
  • [5565073]: Add file check on all recipe tests (#2126) (Mirco Ravanelli) #2126
  • [76923a4]: removeused varibles, add exception types (BenoitWang) #2119
  • [0a18729]: Merge branch 'fs2internalalignment' of https://github.com/BenoitWang/speechbrain into fs2internalalignment (BenoitWang) #2119
  • [d10f9c9]: add docstrings and examples (BenoitWang) #2119
  • [300aba7]: fix (BenoitWang) #2119
  • [32eea80]: Improve documentation of multi-process checkpointing (Peter Plantinga) #2132
  • [1f1a657]: Add unittest for parallel checkpointing (Peter Plantinga) #2132
  • [c742768]: Update tests/unittests/test_checkpoints.py (Peter Plantinga) #2132
  • [cc02ab9]: Update speechbrain/utils/checkpoints.py (Peter Plantinga) #2132
  • [1c91654]: Update speechbrain/utils/checkpoints.py (Peter Plantinga) #2132
  • [9325b56]: add unknown as pad token id (poonehmousavi) #2086
  • [e03397a]: add unk_token for pad (poonehmousavi) #2086
  • [ba4511c]: fix precommit issue (poonehmousavi) #2086
  • [296d14d]: Update python versions tested in CI (Peter Plantinga) #2138
  • [9781034]: Fix version 3.10, interpreted as 3.1 (Peter Plantinga) #2138
  • [6132693]: Merge branch 'speechbrain:develop' into GPT2-finetuning (Pooneh Mousavi) #2086
  • [5cc966c]: Update pytest version (Peter Plantinga) #2138
  • [c848ec9]: readme update (pradnya-git-dev) #2120
  • [5eb55e3]: Merge remote-tracking branch 'upstream/develop' into bugfix/checkpoint-folder-on-main (Mirco Ravanelli) #2132
  • [7b9327b]: parallel checkpoint test sync via file (Peter Plantinga) #2132
  • [23b5dbc]: Update tests/unittests/test_checkpoints.py (Peter Plantinga) #2132
  • [bcbe5da]: Remove destroy_process_group() which causes hang (Peter Plantinga) #2132
  • [3298a29]: Merge branch 'develop' into fixDDP (Mirco Ravanelli) #2130
  • [25fa18a]: fix EOS issue (poonehmousavi) #2086
  • [b9e3fa4]: minor fix (poonehmousavi) #2086
  • [be4a6f1]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into resnet_spkreg (xuechenliu) #2142
  • [164f8fe]: Added bash script to save yaml files, fixed issue with orion config file, added baselines, added EEGConformer, removed DeepConvNet and LMDA (prometheus) #975
  • [3d39ccd]: Fixed issue in yaml (prometheus) #975
  • [321c9f7]: Fixed issue in baseline yaml (prometheus) #975
  • [a35b964]: Commit on the speaker embedding extraction script (xuechenliu) #2142
  • [b78eacf]: minor cleaning on the hparams (xuechenliu) #2142
  • [6fd881e]: Removed baselines, fixes in code format of ShallowConvNet, changes in hparam space of ShallowConvNet and EEGConformer (prometheus) #975
  • [d3e9ae0]: EOS issue (poonehmousavi) #2086
  • [d16ea05]: fix (poonehmousavi) #2086
  • [296398d]: fix pad_id (poonehmousavi) #2086
  • [43b4e29]: final fix for generation (poonehmousavi) #2086
  • [f860f4e]: disable open end generation (poonehmousavi) #2086
  • [646ec65]: add interface and increase dropout (BenoitWang) #2119
  • [966b3d5]: fix interface (BenoitWang) #2119
  • [7371caa]: fix import (BenoitWang) #2119
  • [7a21a66]: Bump gitpython from 3.1.32 to 3.1.34 in /recipes/BinauralWSJ0Mix (dependabot[bot]) #2156
  • [907f79a]: Use torchrun instead of torch.distributed.launch (Peter Plantinga) #2158
  • [73b8365]: Fix ddp test by using os environ local_rank (Peter Plantinga) #2158
  • [5f63f6d]: Remove local_rank from run_opts (Peter Plantinga) #2158
  • [98bcd07]: Update resample_folder.py to run with torchaudio 2.0 (Martin Nordstrom) #2162
  • [5b4ca63]: Fix path to output_filename in create_mixtures_metadata.py (Martin Nordstrom) #2162
  • [f64f569]: major bug fix; enhanced signal now fed into whisper instead of clean signal; revised results (sangeet2020) #2163
  • [f223310]: Bump gitpython from 3.1.34 to 3.1.35 in /recipes/BinauralWSJ0Mix (dependabot[bot]) #2164
  • [987aa35]: Merge branch 'speechbrain:develop' into RescueSpeech (Sangeet Sagar) #2163
  • [89de3dd]: minor changes (sangeet2020) #2163
  • [f8654a9]: fix test yaml (Mirco Ravanelli) #2165
  • [5f87b03]: minor changes (sangeet2020) #2163
  • [9630882]: fix yaml inconsistencies (Mirco Ravanelli) #2165
  • [dd4abba]: fix trailing whitespace (Mirco Ravanelli) #2165
  • [2545b43]: readme update dropbox links (sangeet2020) #2163
  • [284e347]: update dropbox link in tests/recipes (sangeet2020) #2163
  • [fa25f82]: Merge branch 'speechbrain:develop' into RescueSpeech (Sangeet Sagar) #2163
  • [775eeb0]: update dropbox links (sangeet2020) #2163
  • [f7d273d]: Merge branch 'develop' of github.com:speechbrain/speechbrain into fix-reproduce-libriparty (Martin Nordstrom) #2162
  • [5c57237]: YouTube channel / online summit (Adel Moumen) #2166
  • [fc3d72d]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into resnet_spkreg (Xuechen Liu) #2142
  • [b82e798]: fix fetching and checkpointing due to failing recipe tests (Mirco Ravanelli) #2167
  • [972cc65]: let checkpoiting with the same name (Mirco Ravanelli) #2167
  • [bc8906c]: fix black (Mirco Ravanelli) #2167
  • [26da725]: commented parallel checkpointing test. It is currently failing (even on other PRs) only on the CI servers (Mirco Ravanelli) #2167
  • [40d091b]: sort execution of recipes tests (Mirco Ravanelli)
  • [c6ef85d]: sort recipe tests + minor fixes (Mirco Ravanelli)
  • [ef92a05]: Merge remote-tracking branch 'upstream/develop' into fix-reproduce-libriparty (Mirco Ravanelli) #2162
  • [1a9f06a]: update dropbox & hf links (BenoitWang) #2119
  • [8d89a40]: resolve conflict (BenoitWang) #2119
  • [10d85e3]: minor edits for clarify improvements (Mirco Ravanelli) #2162
  • [525b74a]: Merge remote-tracking branch 'upstream/develop' into use-torchrun (Mirco Ravanelli) #2158
  • [bc81789]: Merge remote-tracking branch 'upstream/develop' into resnet_spkreg (Mirco Ravanelli) #2142
  • [9856912]: Merge branch 'develop' into fixDDP (Mirco Ravanelli) #2130
  • [825e114]: fix numpy 1.24 issue (BenoitWang) #2119
  • [56abcb1]: update readme (BenoitWang) #2119
  • [33c4d5b]: Merge remote-tracking branch 'upstream/develop' into fs2internalalignment (Mirco Ravanelli) #2119
  • [23e3ceb]: update to latest dev + minor modifications (Mirco Ravanelli) #2119
  • [b5be99f]: fix comments and add docstring (Xuechen Liu) #2142
  • [901b5e3]: update to latest dev + small fixes (Mirco Ravanelli) #2120
  • [8c6db1d]: Merge branch 'develop' into MSTTS (Mirco Ravanelli) #2120
  • [0b09dd6]: fix yaml + fix recipe test on voxceleb (Mirco Ravanelli) #2120
  • [ae6da04]: Merge branch 'MSTTS' of https://github.com/pradnya-git-dev/speechbrain into MSTTS (Mirco Ravanelli) #2120
  • [3ea3a1f]: add missing link (Mirco Ravanelli) #2120
  • [e88b65b]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into resnet_spkreg (Xuechen Liu) #2142
  • [3280d03]: fix recipe test, add docstring examples (BenoitWang) #2119
  • [0c38b08]: fix examples (BenoitWang) #2119
  • [eb7b839]: Merge branch 'speechbrain:develop' into MSTTS (pradnya-git-dev) #2120
  • [de139b2]: code optimization (pradnya-git-dev) #2120
  • [9364199]: code optimization - loss restore (pradnya-git-dev) #2120
  • [4fd2380]: minor documentation change (pradnya-git-dev) #2120
  • [d3be8d3]: minor documentation fix for tests (pradnya-git-dev) #2120
  • [a508c40]: updating loss example (pradnya-git-dev) #2120
  • [ff0c768]: updating hparams (pradnya-git-dev) #2120
  • [b17e13c]: removing script redundancy (pradnya-git-dev) #2120
  • [e813476]: minor changes for tests (pradnya-git-dev) #2120
  • [f6957ae]: updating recipe entry (pradnya-git-dev) #2120
  • [0c42325]: minor changes for tests (pradnya-git-dev) #2120
  • [ce07c3a]: changes for inference (pradnya-git-dev) #2120
  • [22a7743]: internal sorting for input texts (pradnya-git-dev) #2120
  • [fbb074c]: improve bug_report.yaml (Adel Moumen) #2172
  • [45a65a5]: fix title (Adel Moumen) #2172
  • [8790c07]: Update pull_request_template.md (Adel Moumen) #2172
  • [2dae0cb]: linters (Adel Moumen) #2172
  • [554ca2e]: Update README.md (#2171) (Adel Moumen) #2171
  • [fccb581]: Remove distributed_launch flag and update docs (Peter Plantinga) #2158
  • [9e1b588]: Fix check for rank and local rank (Peter Plantinga) #2158
  • [2d8e6f8]: small improvement in the doc + manage PLACEHOLDER and output folder (Mirco Ravanelli) #2142
  • [2cdc63f]: fix hard-coded devices (#2178) (Mirco Ravanelli) #2178
  • [3457755]: Fix multi-head attention when return_attn_weights=False (Luca Della Libera) #2183
  • [3a16166]: Update multi-head attention docstring (Luca Della Libera) #2183
  • [221f2da]: Updated yaml files after hparam tuning (prometheus) #975
  • [208bccb]: Updated EEGConformer (prometheus) #975
  • [dcc29c7]: Updated README.md (prometheus) #975
  • [5b791fe]: Updated README.md (prometheus) #975
  • [9861876]: Merge remote-tracking branch 'upstream/develop' into eeg_decoding (Mirco Ravanelli) #975
  • [d0296f5]: fix linters (Mirco Ravanelli) #975
  • [e412656]: improve README (Mirco Ravanelli) #975
  • [e8be915]: remove unnecesary folder (Mirco Ravanelli) #975
  • [e431763]: remove files that will be added into speechbrain benchmark (Mirco Ravanelli) #975
  • [63b2f99]: Merge branch 'speechbrain:develop' into GPT2-finetuning (Pooneh Mousavi) #2086
  • [dce8021]: Merge branch 'speechbrain:develop' into GPT2-finetuning (Pooneh Mousavi) #2086
  • [b90034c]: add response-generator interface (poonehmousavi) #2086
  • [34aafe0]: fix pytest (poonehmousavi) #2086
  • [eeda2c0]: fix pytest (poonehmousavi) #2086
  • [9284f24]: fix docstring (poonehmousavi) #2086
  • [b5adc8f]: updating hparams with the current best (pradnya-git-dev) #2120
  • [4f02ca8]: fix hyaml bug (poonehmousavi) #2086
  • [54ab2f8]: minor fix (poonehmousavi) #2086
  • [ed6f08d]: fix interface logging issue (poonehmousavi) #2086
  • [801162e]: fix precommit issue (poonehmousavi) #2086
  • [7cfd162]: HyperConformer (#1905) (Florian Mai) #1905
  • [2983f8a]: clean commnets (poonehmousavi) #2086
  • [697c708]: fix readme (poonehmousavi) #2086
  • [cf48a46]: change interface to be compatibale with pytest (poonehmousavi) #2086
  • [629b99e]: Update README.md (Adel Moumen) #2189
  • [ceb7838]: fix typo that preveted recipe tests to run (Mirco Ravanelli) #2086
  • [fd3b8a8]: automatic download + fix replacement path (Mirco Ravanelli) #2086
  • [9634e9d]: remove transformers from extra-req as already in the main requirements (Mirco Ravanelli) #2086
  • [3d37983]: fix linter (Mirco Ravanelli) #2086
  • [cd41db3]: DNS recipe (#1742) (Sangeet Sagar) #1742
  • [e229e1a]: Attempting to fix failing test (with pytorch 2.1) (#2193) (Mirco Ravanelli) #2193
  • [4ab5219]: Broadcast the decision to checkpoint to all processes (#2192) (Peter Plantinga) #2192
  • [f4e8dd5]: update huggingface_hub requirement to avoid TypeDict error (tuanct1997) #2195
  • [264a0bc]: Avoid sync if mid-epoch checkpoints are disabled (Peter Plantinga) #2200
  • [918d8ef]: new pitch (Mirco Ravanelli) #2201
  • [92f541e]: Bump gitpython from 3.1.35 to 3.1.37 in /recipes/BinauralWSJ0Mix (dependabot[bot]) #2203
  • [5eec78b]: fix open rir (Mirco Ravanelli) #2205
  • [86670b4]: small follow up fix on openrir (Mirco Ravanelli)
  • [fab9657]: update dropbox (Mirco Ravanelli) #2201
  • [67d0de9]: Update README.md (Mirco Ravanelli) #2201
  • [19cbb87]: Update LJSpeech.csv (Mirco Ravanelli) #2201
  • [6271dd0]: remove related doc with distributed_launch (Adel Moumen) #2207
  • [219476c]: pre-commit (Adel Moumen) #2207
  • [21d619c]: adding random speaker voice generation (pradnya-git-dev) #2120
  • [62ac16e]: Merge branch 'develop' into MSTTS (pradnya-git-dev) #2120
  • [b84fa8d]: minor changes for flake8 (pradnya-git-dev) #2120
  • [f69f280]: updates for doctests (pradnya-git-dev) #2120
  • [5897742]: fix one issue wit recipe tests (Mirco Ravanelli) #2120
  • [e0d5a1b]: last fix pitch fastspeec2 (Mirco Ravanelli)
  • [55b442d]: readme update (pradnya-git-dev) #2120
  • [79cff28]: minor update for tests (pradnya-git-dev) #2120
  • [a78a571]: update documentation to clarify when to use --jit (Mirco Ravanelli) #2215
  • [ba492f9]: small fix in recipe tests (Mirco Ravanelli) #2120
  • [ec359cb]: add dropbox link (Mirco Ravanelli) #2120
  • [40bbe0f]: add performance notice (Mirco Ravanelli) #2120
  • [fc892ac]: last change (Mirco Ravanelli) #2120
  • [3c840ed]: reverting an error added by HyperConformer (code from Samsung AI Cambridge) (#2217) (Parcollet Titouan) #2217
  • [121f55b]: fix recipe tests tool (#2218) (Adel Moumen) #2218
  • [81138e8]: ASR recipe for Tedlium2 (code from Samsung AI Cambridge) (#2191) (Parcollet Titouan) #2191
  • [7f62dd8]: Add speech-to-speech translation (#2044) (Jarod) #2044
  • [e09cdac]: Refactor aishell data prep (#2219) (Adel Moumen) #2219
  • [bd27e99]: Create .gitignore (#2222) (Adel Moumen) #2222
  • [ab3c962]: fix incorrect parameter in LibriTTS hifigan vocoder (Chaanks) #2244
  • [2f27f7e]: fix failing recipe test (tiny fix) (Mirco Ravanelli)
  • [94862c8]: Update version.txt (#2256) (Mirco Ravanelli) #2256
  • [0ac4dc3]: Merge branch 'develop' (Mirco Ravanelli) #2257
  • [a581cae]: New version (#2257) (Mirco Ravanelli) #2257
  • [65c0113]: Merge branch 'develop' (Mirco Ravanelli)

- Python
Published by github-actions[bot] over 2 years ago

speechbrain - v0.5.15

SpeechBrain 0.5.15 Release Notes

We are thrilled to announce the release of SpeechBrain version 0.5.15! This new version represents a significant step forward for our open-source Conversational AI toolkit. The core team, along with a rapidly growing network of contributors, has worked diligently to enhance and expand the toolkit while addressing various issues.

What's New?

This release marks a crucial point, as it will likely be the final minor version before the highly anticipated SpeechBrain 1.0, scheduled for release in the coming months. This version reaches several notable milestones, summarized below. For a comprehensive list of all changes, please refer to the detailed notes at the end.

Notable Achievements

  1. Benchmark Repository: We are proud to introduce the benchmark repository, which aims to provide standard recipes that researchers can use to benchmark and compare different techniques and models. Currently, the following benchmarks are available:

    • CLMASR: Evaluates continual learning techniques for speech recognition in new languages.
    • MP3S Benchmarks: Assesses self-supervised speech representations across various tasks with different downstream models (multi-probe).
  2. Enhanced User Experience: We've made it more convenient for our users to access logs and checkpoints by migrating the logs and output folders from Gdrive to Dropbox.

  3. New Models with Improved Performance: We implemented a modified FastSpeech 2, which is both efficient and high-performing. We've made significant progress on LibriSpeech performance, thanks to improved Conformer and Branchformer models. Additionally, we've introduced a performant Conformer Transducer and the SLI-GRU model.

  4. Post-hoc Interpretability Techniques: We now offer improved support for post-hoc interpretability techniques. Refer to the ESC50 recipe for more information.

  5. New Datasets: We've added recipes for new datasets, including the recently released RescueSpeech (speech recognition in the search-and-rescue domain) and the Zaion Emotion Dataset for speech emotion recognition.

  6. Enhanced Korean ASR: We've improved the KsponSpeech recipe for Korean automatic speech recognition.

  7. Improved Recipe Tests: We've taken steps to enhance recipe tests, ensuring better reliability and performance.

  8. Whisper Fixes: We've fixed Whisper recipes and interfaces in a way that maintains backward compatibility. This was necessary to address interface changes made in the original model.

  9. Various Fixes: In addition to the above, we've addressed several other issues, including problems with gradient accumulation, along with various minor fixes.

Thank you to our dedicated community of contributors and users for making this release possible! We invite you to explore the new features and improvements in SpeechBrain 0.5.15 and look forward to the upcoming release of SpeechBrain 1.0. Happy SpeechBrain-ing!

For a complete list of changes, please refer to the detailed release notes below.

Commits

  • [4aff580]: update new model at 2.8 (Titouan Parcollet) #1782
  • [2c75924]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into libriconformerlarge (Titouan Parcollet) #1782
  • [a6d5d54]: add testing (Titouan Parcollet) #1782
  • [6cb2b25]: push new results (Titouan Parcollet) #1782
  • [fefdcfb]: Merge branch 'develop' into fastspeech2 (Mirco Ravanelli) #1572
  • [9937302]: download automatically allignments (Mirco Ravanelli) #1572
  • [9b9c6c2]: Merge branch 'speechbrain:develop' into RescueSpeech (Sangeet Sagar) #2017
  • [555de27]: add hf, dropbox, recipes.csv (Sangeet Sagar) #2017
  • [9447794]: fix pre-commit conflicts (Sangeet Sagar) #2017
  • [6707d09]: add field- noise_wav (Sangeet Sagar) #2017
  • [c604b89]: only kept the WeightedSSLModel class in this PR (salah-zaiem) #2047
  • [61d4af1]: removed useless changes (salah-zaiem) #2047
  • [042fd1e]: Update README.md (Mirco Ravanelli) #2017
  • [cdda921]: Fix with the solution used by Luca in the Benchmarks (Mirco Ravanelli) #2016
  • [07e2458]: Update README.md (Mirco Ravanelli) #2017
  • [ce53199]: Fix backward compatibility issues (Luca Della Libera) #2016
  • [9eadc9e]: add transformers (Sangeet Sagar) #2017
  • [3f1959f]: add clarity to training data; add computing power details (Sangeet Sagar) #2017
  • [a8265bf]: Update extra_requirements.txt (Parcollet Titouan)
  • [2905a52]: unknown token fix (pradnya-git-dev) #1572
  • [2d5b40f]: Update README.md (Adel Moumen) #2064
  • [d6bc4a8]: manage paths and reduce SOX warnings (BenoitWang) #2048
  • [672af7a]: fixes (BenoitWang) #2048
  • [045d2ed]: info of input/output, add links, add to recipes (BenoitWang) #2048
  • [9f841f1]: Update README.md (Mirco Ravanelli) #2048
  • [379772a]: Update ZaionEmotionDataset.csv (Mirco Ravanelli) #2048
  • [9cde9cc]: small fix (Mirco Ravanelli) #1572
  • [c14048b]: add run time and an interface (BenoitWang) #2048
  • [6d0d6ab]: merge fix (BenoitWang) #2048
  • [5a9df26]: fix (BenoitWang) #2048
  • [6a5237d]: adjusting n_symbols for unknown tokens (pradnya-git-dev) #1572
  • [bb1a1e7]: Merge branch 'speechbrain:develop' into RescueSpeech (Sangeet Sagar) #2017
  • [66e8b26]: dim=384, more epochs (BenoitWang) #1572
  • [2323cc3]: Merge branch 'fastspeech2' of https://github.com/bloodraven66/speechbrain into fastspeech2 (BenoitWang) #1572
  • [c3e180e]: added computing times for both models in classification (Cem Subakan) #2070
  • [8fde554]: added PIQ references in the readme (Cem Subakan) #2070
  • [79ec837]: added the missing docstrings in train_piq.py (Cem Subakan) #2070
  • [71fedce]: recover learning rate (BenoitWang) #1572
  • [97a67d5]: added the classification interface (Cem Subakan) #2070
  • [7c012db]: Merge branch 'speechbrain:develop' into dropbox-links (Adel Moumen) #2064
  • [688f48c]: aishell (Adel Moumen) #2064
  • [994c5f3]: corrected the right import (abdouaziz) #2074
  • [4801eda]: Gdrive -> dropbox (Adel Moumen) #2064
  • [e884d58]: aishell (Adel Moumen) #2064
  • [86b7fe6]: aishel + binaural (Adel Moumen) #2064
  • [aed71cc]: cv and cl (Adel Moumen) #2064
  • [ba02b23]: spn loss equals zero after 8 epochs, threshold 0.8 for spn predictor (BenoitWang) #1572
  • [62e9c1a]: add silence for punctuations, modify functions (BenoitWang) #1572
  • [2d0f4a6]: fixes (BenoitWang) #1572
  • [f85a40a]: finished implementing the audio classifier interface (Cem Subakan) #2070
  • [62c3802]: added the data fetching lines to classify_file (Cem Subakan) #2070
  • [da3cba6]: common language (Adel Moumen) #2064
  • [4977f09]: cv asr (Adel Moumen) #2064
  • [ac30776]: commonvoice seq2seq (Adel Moumen) #2064
  • [e00f189]: tests recipes (Adel Moumen) #2064
  • [5b4704e]: freeze spn predictor after 8 epoch and suppress warning (BenoitWang) #1572
  • [7024452]: implemented AudioInterpreter class (Cem Subakan) #2070
  • [3325af5]: added the docstring for the piqaudiointerpreter class (Cem Subakan) #2070
  • [992651f]: use shorter signals in doctests (Cem Subakan) #2070
  • [b6f40cb]: fix HF repo names (Cem Subakan) #2070
  • [ccb599f]: fix model path (Cem Subakan) #2070
  • [ed29e52]: fix repo names (Cem Subakan) #2070
  • [b852892]: updated the readme with Zhepeis paper (Cem Subakan) #2070
  • [d62f230]: added the comment on performance (Cem Subakan) #2070
  • [114fd7e]: minor (Francesco Paissan) #2070
  • [df1b18a]: removed reference to personal HF (Francesco Paissan) #2070
  • [0c57ace]: cosmetic (Francesco Paissan) #2070
  • [9f188c9]: Update README.md (Mirco Ravanelli) #2070
  • [b6f14b0]: Update README.md (Mirco Ravanelli) #2070
  • [26f4dfc]: Update README.md (Mirco Ravanelli) #2070
  • [63074f1]: Update README.md (Mirco Ravanelli) #2070
  • [3fd1f97]: Update README.md (Mirco Ravanelli) #2070
  • [35cf16c]: Update README.md (Mirco Ravanelli) #2070
  • [89bd360]: Update README.md (Mirco Ravanelli) #2070
  • [a65f05a]: Update README.md (Mirco Ravanelli) #2070
  • [e4a1816]: Update README.md (Mirco Ravanelli) #2070
  • [72f6cef]: Update README.md (Mirco Ravanelli) #2070
  • [21c8dcf]: fix recipe tests (Mirco Ravanelli) #2070
  • [3713038]: Update README.md (Mirco Ravanelli) #1572
  • [2fa5988]: Update README.md (Mirco Ravanelli) #1572
  • [f6f41bf]: Merge branch 'develop' into fastspeech2 (Mirco Ravanelli) #1572
  • [783f674]: Update LJSpeech.csv (Mirco Ravanelli) #1572
  • [3a53940]: Update LJSpeech.csv (Mirco Ravanelli) #1572
  • [f848b69]: final updates (BenoitWang) #1572
  • [cd71bb1]: Merge branch 'fastspeech2' of https://github.com/bloodraven66/speechbrain into fastspeech2 (BenoitWang) #1572
  • [b9d3dc0]: update requirements (Mirco Ravanelli) #1572
  • [daa8aaf]: Merge branch 'fastspeech2' of https://github.com/bloodraven66/speechbrain into fastspeech2 (Mirco Ravanelli) #1572
  • [63cfd22]: Merge branch 'fastspeech2' of https://github.com/bloodraven66/speechbrain into fastspeech2 (BenoitWang) #1572
  • [fbcd142]: Update requirements.txt (Mirco Ravanelli) #1572
  • [7b05521]: Update requirements.txt (Mirco Ravanelli) #1572
  • [c6958b5]: fix interface (BenoitWang) #1572
  • [10e6d6b]: Update requirements.txt (Mirco Ravanelli) #1572
  • [40de2d4]: Merge branch 'fastspeech2' of https://github.com/bloodraven66/speechbrain into fastspeech2 (BenoitWang) #1572
  • [092375e]: fix failing test in HuggingFaceWav2Vec2Pretrain (Mirco Ravanelli) #1572
  • [da8ad01]: Merge branch 'fastspeech2' of https://github.com/bloodraven66/speechbrain into fastspeech2 (Mirco Ravanelli) #1572
  • [be1a43a]: fix examples (BenoitWang) #1572
  • [8bdbda6]: Merge branch 'fastspeech2' of https://github.com/bloodraven66/speechbrain into fastspeech2 (BenoitWang) #1572
  • [167b976]: Backport model changes from streaming branch (Sylvain de Langen) #1782
  • [a2b137c]: Add note for DDP/grad accumulation (asu) #1782
  • [72afffe]: Precommit fix (Sylvain de Langen) #1782
  • [4bad95a]: dvoice (Adel Moumen) #2064
  • [5894630]: cv.csv (Adel Moumen) #2064
  • [2342431]: esv50 (Adel Moumen) #2064
  • [eb0978b]: fisher (Adel Moumen) #2064
  • [6107dce]: fluent-speech-commands (Adel Moumen) #2064
  • [83c013c]: google-speech-commands (Adel Moumen) #2064
  • [a9ee7f5]: iemocap (Adel Moumen) #2064
  • [16dd0b8]: ksponspeech (Adel Moumen) #2064
  • [1973d92]: librimix (Adel Moumen) #2064
  • [f218f36]: libriparty (Adel Moumen) #2064
  • [f989b52]: librispeech (Adel Moumen) #2064
  • [511ff78]: libritts (Adel Moumen) #2064
  • [b15c6ba]: ljspeech (Adel Moumen) #2064
  • [05040a5]: media ; remove links as the folders are empties (Adel Moumen) #2064
  • [e37f45f]: real-m (Adel Moumen) #2064
  • [68899e6]: slurp (Adel Moumen) #2064
  • [128f5eb]: timers (Adel Moumen) #2064
  • [8fb6030]: timit (Adel Moumen) #2064
  • [b598a68]: urbansound8k (Adel Moumen) #2064
  • [7543fc0]: voicebank (Adel Moumen) #2064
  • [49c076e]: voxceleb (Adel Moumen) #2064
  • [5f2c5d2]: voxlingua (Adel Moumen) #2064
  • [cc97c1d]: whamandwhamr (Adel Moumen) #2064
  • [a4ee886]: wsj0mix (Adel Moumen) #2064
  • [6b7900a]: readmes/yaml (Adel Moumen) #2064
  • [78a7722]: media missing folders in gdrive... (Adel Moumen) #2064
  • [1de6767]: media missing folders in gdrive... (Adel Moumen) #2064
  • [ea14841]: lirbispeech (Adel Moumen) #2064
  • [816eb03]: libriparty (Adel Moumen) #2064
  • [e3a0a05]: commonvoice (Adel Moumen) #2064
  • [db54a85]: remove '?usp=sharing' (Adel Moumen) #2064
  • [78dae81]: Merge remote-tracking branch 'origin/develop' into dropbox-links (Adel Moumen) #2064
  • [7b88133]: jlspeech (Adel Moumen) #2064
  • [00e0a63]: conformer/branchformer links librispeech (Adel Moumen) #2064
  • [288690c]: ljspeech path dropbox (Adel Moumen) #2064
  • [5969765]: Updated results + model link (Sylvain de Langen) #1782
  • [c88bd96]: fix url (Mirco Ravanelli) #2064
  • [e12f237]: fix url (Mirco Ravanelli) #2064
  • [e0ed799]: fix url (Mirco Ravanelli) #2064
  • [45c8b66]: fix url (Mirco Ravanelli) #2064
  • [75bddd8]: fix url (Mirco Ravanelli) #2064
  • [9a8297a]: fix url (Mirco Ravanelli) #2064
  • [4337313]: fix url (Mirco Ravanelli) #2064
  • [1edd952]: fix url (Mirco Ravanelli) #2064
  • [e1ec2f9]: fix url (Mirco Ravanelli) #2064
  • [f8c8786]: fix tacotron prepare, update and correct readme (BenoitWang) #2080
  • [5c2b27e]: fix librispeech tests (Mirco Ravanelli) #2081
  • [b064fdf]: adding missing link (Mirco Ravanelli) #2081
  • [57778ba]: fix extra column in tests/recipes/RescueSpeech.csv (Mirco Ravanelli)
  • [71408c3]: make recipe test for sepformer-conformerintra.yaml less memory demanding (Mirco Ravanelli)
  • [3feb9b0]: make recipe test for sepformer-conformerintra.yaml less memory demanding (Mirco Ravanelli)
  • [378e820]: making recipe tests for sepformer-conformerintra.yaml less memory demanding (Mirco Ravanelli) #2081
  • [e708e35]: Merge branch 'develop' into fix_test (Mirco Ravanelli) #2081
  • [c9a87f0]: Update sepformer-conformerintra.yaml (Mirco Ravanelli) #2081
  • [3287c4d]: Update README (Sylvain de Langen) #1782
  • [5b9f48b]: Merge branch 'develop' into libriconformerlarge (asu) #1782
  • [7f39d12]: Fix typo (Sylvain de Langen) #1782
  • [1f3f141]: TIMIT fixes (Adel Moumen) #2081
  • [7706855]: rename to IWSLT22_lowresource (Adel Moumen) #2081
  • [922719a]: fix last TIMIT transducer test (Mirco Ravanelli) #2081
  • [629b1b1]: fix pretrainer -> Fisher-Callhome-Spanish is passing (Adel Moumen) #2081
  • [83c6ac3]: Merge branch 'fixtest' of https://github.com/speechbrain/speechbrain into fixtest (Adel Moumen) #2081
  • [49287c7]: fix KsponSpeech.csv test (Mirco Ravanelli) #2081
  • [3b58685]: Merge branch 'fixtest' of https://github.com/speechbrain/speechbrain into fixtest (Mirco Ravanelli) #2081
  • [3d15104]: fixed SLURP (Mirco Ravanelli) #2081
  • [779c1f1]: Merge branch 'develop' into feature/ksponspeech (Mirco Ravanelli) #1813
  • [93c7d74]: Update README.md (Mirco Ravanelli) #1813
  • [2e2d325]: fix ZED recipe (BenoitWang) #2081
  • [a4e7f5e]: Merge branch 'fixtest' of https://github.com/speechbrain/speechbrain into fixtest (BenoitWang) #2081
  • [50d777a]: fix style issue (Mirco Ravanelli) #2081
  • [f6d6158]: fix style issue (Mirco Ravanelli) #2081
  • [6e111ec]: fix fastspeech2 recipe (BenoitWang) #2081
  • [6693450]: Merge branch 'fixtest' of https://github.com/speechbrain/speechbrain into fixtest (BenoitWang) #2081
  • [366ff5c]: Update README.md (Mirco Ravanelli) #1813
  • [231d907]: Update KsponSpeech.csv (Mirco Ravanelli) #1813
  • [2414703]: Update KsponSpeech.csv (Mirco Ravanelli) #1813
  • [fcdd544]: Update README.md (Mirco Ravanelli) #1813
  • [fbe79c8]: Update README.md (Mirco Ravanelli) #1813
  • [c43d834]: Update KsponSpeech.csv (Mirco Ravanelli) #1813
  • [13d2fb9]: Merge branch 'develop' into fix_test (Mirco Ravanelli) #2081
  • [777bad4]: final fix for the recipe test of KsponSpeech.csv (Mirco Ravanelli) #2081
  • [4554996]: Update KsponSpeech.csv (Mirco Ravanelli) #2081
  • [406c7c8]: Merge branch 'develop' into fix_test (Mirco Ravanelli) #2081
  • [b256f14]: add extra dependency (Mirco Ravanelli) #2081
  • [1c72119]: add use_torchaudio outside to make it possible setting it from command line (Mirco Ravanelli) #2081
  • [40d9628]: fix recipe tests for conformer_transducer.yaml (Mirco Ravanelli) #2081
  • [85809da]: Update extra-dependencies.txt (Mirco Ravanelli) #2081
  • [ae52125]: fix skipping flag (Mirco Ravanelli) #2081
  • [5e0376f]: add tests on both gpu and cpu (Mirco Ravanelli) #2081
  • [bb47b52]: add device info in the output folder of recipe tests (Mirco Ravanelli) #2081
  • [4680357]: simplify binauralwsj data generation trigger (Cem Subakan) #2081
  • [6c5ebf0]: Merge branch 'fixtest' of github.com:speechbrain/speechbrain into fixtest (Cem Subakan) #2081
  • [f1f0ace]: resume previous test folder, as the new one is messing tests up (Mirco Ravanelli) #2081
  • [929dff6]: fixed the test breaking issue (Cem Subakan) #2081
  • [bd66691]: Merge branch 'fixtest' of github.com:speechbrain/speechbrain into fixtest (Cem Subakan) #2081
  • [82fa9dd]: fix extra dep in Aishell1Mix (Mirco Ravanelli) #2081
  • [e96b7d0]: mention extra-requirements in README file (Mirco Ravanelli) #2081
  • [a816359]: mention extra-requirements in README file (Mirco Ravanelli) #2081
  • [6e69011]: mention extra-requirements in README file (Mirco Ravanelli) #2081
  • [aa6e789]: mention extra-requirements in README file (Mirco Ravanelli) #2081
  • [49d7985]: mention extra-requirements in README file (Mirco Ravanelli) #2081
  • [31f003a]: mention extra-requirements in README file (Mirco Ravanelli) #2081
  • [ab8697a]: mention extra-requirements in README file (Mirco Ravanelli) #2081
  • [9aeb7af]: mention extra-requirements in README file (Mirco Ravanelli) #2081
  • [c724b75]: mention extra-requirements in README file (Mirco Ravanelli) #2081
  • [3655e8b]: fix another batch of readme files (Mirco Ravanelli) #2081
  • [072bf33]: fix another batch of readme files (Mirco Ravanelli) #2081
  • [4aa6a4d]: fix another batch of readme files (Mirco Ravanelli) #2081
  • [d77796f]: fix another batch of readme files (Mirco Ravanelli) #2081
  • [1666feb]: fix another batch of readme files (Mirco Ravanelli) #2081
  • [dc1e16c]: Merge branch 'fixtest' of https://github.com/speechbrain/speechbrain into fixtest (Mirco Ravanelli) #2081
  • [f053563]: Ensure MetricGAN recipe works with small dataset (Peter Plantinga) #2081
  • [4389ce3]: Fix inplace error in EnhanceResnet (Peter Plantinga) #2081
  • [ccd7438]: sort requirements (Mirco Ravanelli) #2081
  • [a6eedc5]: Merge branch 'fixtest' of https://github.com/speechbrain/speechbrain into fixtest (Mirco Ravanelli) #2081
  • [21f5c7b]: fix tests (Mirco Ravanelli) #2081
  • [34962ef]: solved the code breaking issue (Cem Subakan) #2081
  • [0e1ea95]: Merge branch 'fixtest' of github.com:speechbrain/speechbrain into fixtest (Cem Subakan) #2081
  • [f2127f4]: harmonize names (Mirco Ravanelli) #2081
  • [c3ea2c8]: add pandas in requirements (as it is used by 8 recipes) (Mirco Ravanelli) #2081
  • [2dcecd7]: fix extra requirements (Mirco Ravanelli) #2081
  • [520d788]: fix extra requirements (Mirco Ravanelli) #2081
  • [7e4ad93]: fix extra requirements (Mirco Ravanelli) #2081
  • [f3d18dd]: fix extra requirements (Mirco Ravanelli) #2081
  • [34acbf9]: fix style (Mirco Ravanelli) #2081
  • [54e9cb9]: Update createwhamrrirs.py (Mirco Ravanelli) #2081
  • [3d7df00]: fixed requirements files (Cem Subakan) #2081
  • [4534da8]: Merge branch 'fixtest' of github.com:speechbrain/speechbrain into fixtest (Cem Subakan) #2081
  • [fb3bc20]: final fix for MetricGan (Mirco Ravanelli) #2081
  • [ff3042d]: last fixes (Mirco Ravanelli)
  • [a605991]: Update README.md (Mirco Ravanelli)
  • [1dd4267]: Update RescueSpeech.csv (Mirco Ravanelli)
  • [b432a34]: small fix in check HF repo test (Mirco Ravanelli)
  • [2fb125d]: fix test (Mirco Ravanelli)
  • [81146d3]: fix doc req (Mirco Ravanelli)
  • [99d1691]: fix conflcit main-dev (Mirco Ravanelli)
  • [d26153b]: fix test (Mirco Ravanelli)
  • [adb34db]: Merge branch 'develop' (Mirco Ravanelli)

- Python
Published by github-actions[bot] over 2 years ago

speechbrain - SpeechBrain v0.5.14

This is a minor yet important release. It significantly increases the number of available features while fixing many small bugs and issues. A summary of the achievements of this release is given below, and a complete list of all the changes can be found at the bottom of these release notes.

Notable achievements

  • 22 new contributors, thank you so much, everyone!
  • 31 new recipes (ASR, SLU, AST, AER, Interpretability, SSL).
  • Fully automatic recipe testing.
  • Increased continuous integration coverage of the code, URLs, YAML files, recipes, and HuggingFace models.
  • New Conformer Large model for ASR.
  • Integration of Whisper for fine-tuning or inference.
  • Full wav2vec2 pre-training, entirely re-implemented and documented.
  • Low-resource speech translation with IWSLT.
  • Many other novelties... see below.

What's Changed

  • fix 1522 by @anautsch in https://github.com/speechbrain/speechbrain/pull/1526
  • bug-fix: fixed OPEN_RIR data preparation process conflict. by @xin-w8023 in https://github.com/speechbrain/speechbrain/pull/1536
  • add noise and reverberance version for BinauralWSJ0Mix by @huangzj421 in https://github.com/speechbrain/speechbrain/pull/1502
  • fix distributed namespace by @anautsch in https://github.com/speechbrain/speechbrain/pull/1566
  • feat: use member field instead of hard-code by @xin-w8023 in https://github.com/speechbrain/speechbrain/pull/1567
  • Update logo to new version by @pplantinga in https://github.com/speechbrain/speechbrain/pull/1575
  • IWSLT 2022 speech translation recipe by @mzboito in https://github.com/speechbrain/speechbrain/pull/1475
  • Fix Issue #1277 timit recipe missing uppercase option by @Adel-Moumen in https://github.com/speechbrain/speechbrain/pull/1564
  • Update README.md by @qanastek in https://github.com/speechbrain/speechbrain/pull/1577
  • Output hiddens states from all the transformer layers of huggingface_wav2vec by @BenoitWang in https://github.com/speechbrain/speechbrain/pull/1570
  • Fix bugs of updatelearningrate by @wangxin22 in https://github.com/speechbrain/speechbrain/pull/1578
  • Fix to use output of unsqueeze() in Tacotron2 parsedecoderoutputs() by @jqug in https://github.com/speechbrain/speechbrain/pull/1525
  • wav2vec2 pretraining implemented with speechbrain by @RuABraun in https://github.com/speechbrain/speechbrain/pull/1312
  • In filterctcoutput(), remove redundant filtering by @olvb in https://github.com/speechbrain/speechbrain/pull/1584
  • Fixed outputallhiddens for hubert in huggingface_wav2vec by @gorinars in https://github.com/speechbrain/speechbrain/pull/1587
  • Fix return value of batch_evaluation for separation recipes by @z-wony in https://github.com/speechbrain/speechbrain/pull/1555
  • fix endless doctest despite no example by @anautsch in https://github.com/speechbrain/speechbrain/pull/1591
  • Fix documented min python version to 3.7 by @AsuMagic in https://github.com/speechbrain/speechbrain/pull/1595
  • Conformer separation by @ycemsubakan in https://github.com/speechbrain/speechbrain/pull/1519
  • Add CTC recipe to AISHELL-1 by @BenoitWang in https://github.com/speechbrain/speechbrain/pull/1576
  • Add templates for issues by @Adel-Moumen in https://github.com/speechbrain/speechbrain/pull/1588
  • Added workaround for CyclicLR saving by @Gastron in https://github.com/speechbrain/speechbrain/pull/1683
  • scikit-learn import and comment fix by @underdogliu in https://github.com/speechbrain/speechbrain/pull/1485
  • Adding recipe for HiFiGAN training using LibriTTS dataset by @pradnya-git-dev in https://github.com/speechbrain/speechbrain/pull/1621
  • fix LibriSpeech CTC pretrainer by @BenoitWang in https://github.com/speechbrain/speechbrain/pull/1594
  • wav2vec German model added by @sangeet2020 in https://github.com/speechbrain/speechbrain/pull/1557
  • issue 1615 typo fix by @sharmadhiraj86 in https://github.com/speechbrain/speechbrain/pull/1700
  • typo in TransformerASR.py by @Adel-Moumen in https://github.com/speechbrain/speechbrain/pull/1704
  • Causality in Conv2d by @fpaissan in https://github.com/speechbrain/speechbrain/pull/1608
  • Switchboard Recipe by @dwgnr in https://github.com/speechbrain/speechbrain/pull/1460
  • read_audio fixes and docs cleanup by @AsuMagic in https://github.com/speechbrain/speechbrain/pull/1592
  • Fix path flake8 in pre-commit by @Adel-Moumen in https://github.com/speechbrain/speechbrain/pull/1721
  • Added german_cleaners by @padmalcom in https://github.com/speechbrain/speechbrain/pull/1642
  • fixing issue 1707 by @TParcollet in https://github.com/speechbrain/speechbrain/pull/1728
  • explicit fetch args & download-only option by @anautsch in https://github.com/speechbrain/speechbrain/pull/1735
  • fix sorting bug by @anautsch in https://github.com/speechbrain/speechbrain/pull/1730
  • remove discussions references by @Adel-Moumen in https://github.com/speechbrain/speechbrain/pull/1737
  • Fix torchaudio mel_normalized for Tacotron2&HifiGAN by @BenoitWang in https://github.com/speechbrain/speechbrain/pull/1740
  • Whisper finetuning by @Adel-Moumen in https://github.com/speechbrain/speechbrain/pull/1717
  • loss must be avg when BS>1 when calling evaluate_batch() by @sangeet2020 in https://github.com/speechbrain/speechbrain/pull/1744
  • [FIX] Flush gradients and save memory for validation. by @MartinKocour in https://github.com/speechbrain/speechbrain/pull/1739
  • add coloring in tqdm progress bar by @sangeet2020 in https://github.com/speechbrain/speechbrain/pull/1573
  • Fix librispeech transformer recipe by @TParcollet in https://github.com/speechbrain/speechbrain/pull/1775
  • 🖍️ improving type-hints in speechbrain/pretrained/interfaces.py by @jonasvdd in https://github.com/speechbrain/speechbrain/pull/1725
  • Enabling the retrieval of whisper's hidden states by @Hguimaraes in https://github.com/speechbrain/speechbrain/pull/1751
  • Added fix to use DDP with hifi_gan training on ljspeech by @padmalcom in https://github.com/speechbrain/speechbrain/pull/1781
  • Fix wav2vec2 masking by @TParcollet in https://github.com/speechbrain/speechbrain/pull/1799
  • fix #1794 by @Adel-Moumen in https://github.com/speechbrain/speechbrain/pull/1805
  • refactor: recipe testing CSVs by @anautsch in https://github.com/speechbrain/speechbrain/pull/1600
  • fix 1788 by @BenoitWang in https://github.com/speechbrain/speechbrain/pull/1842
  • fix docstring for pooling by @BenoitWang in https://github.com/speechbrain/speechbrain/pull/1843
  • Whisper finetunng common voice by @poonehmousavi in https://github.com/speechbrain/speechbrain/pull/1809
  • fixing the convtasnet causal=True bug by @ycemsubakan in https://github.com/speechbrain/speechbrain/pull/1851
  • Fix Whisper doc + improve maxdecoderatio by @Adel-Moumen in https://github.com/speechbrain/speechbrain/pull/1858
  • Rewrite multi-GPU documentation by @AsuMagic in https://github.com/speechbrain/speechbrain/pull/1861
  • SLU Media recipe by @GaelleLaperriere in https://github.com/speechbrain/speechbrain/pull/1172
  • edits for refactoring check tool by @anautsch in https://github.com/speechbrain/speechbrain/pull/1838
  • minor fixes for recipe testing by @anautsch in https://github.com/speechbrain/speechbrain/pull/1872
  • Fix Whisper avoid_if_longer_than never used by @Adel-Moumen in https://github.com/speechbrain/speechbrain/pull/1882
  • Starting a recipe for ESC50 by @ycemsubakan in https://github.com/speechbrain/speechbrain/pull/1605
  • Fix for #1886 by @anautsch in https://github.com/speechbrain/speechbrain/pull/1890
  • fix batchtoright by @anthony-wss in https://github.com/speechbrain/speechbrain/pull/1884
  • Fixes for pre-release testing by @anautsch in https://github.com/speechbrain/speechbrain/pull/1895
  • Fix Conformer Instabilities and add Large Model by @TParcollet in https://github.com/speechbrain/speechbrain/pull/1892
  • Downsampling by @salah-zaiem in https://github.com/speechbrain/speechbrain/pull/1888
  • fix core.py bf16 by @Adel-Moumen in https://github.com/speechbrain/speechbrain/pull/1898
  • S2SGreedySearcher : Do not continue decoding when EOS token was generated for all samples from a batch by @Jeronymous in https://github.com/speechbrain/speechbrain/pull/1899
  • quick fixes before minor by @anautsch in https://github.com/speechbrain/speechbrain/pull/1896

New Contributors

  • @xin-w8023 made their first contribution in https://github.com/speechbrain/speechbrain/pull/1536
  • @mzboito made their first contribution in https://github.com/speechbrain/speechbrain/pull/1475
  • @qanastek made their first contribution in https://github.com/speechbrain/speechbrain/pull/1577
  • @wangxin22 made their first contribution in https://github.com/speechbrain/speechbrain/pull/1578
  • @jqug made their first contribution in https://github.com/speechbrain/speechbrain/pull/1525
  • @olvb made their first contribution in https://github.com/speechbrain/speechbrain/pull/1584
  • @gorinars made their first contribution in https://github.com/speechbrain/speechbrain/pull/1587
  • @z-wony made their first contribution in https://github.com/speechbrain/speechbrain/pull/1555
  • @AsuMagic made their first contribution in https://github.com/speechbrain/speechbrain/pull/1595
  • @pradnya-git-dev made their first contribution in https://github.com/speechbrain/speechbrain/pull/1621
  • @sangeet2020 made their first contribution in https://github.com/speechbrain/speechbrain/pull/1557
  • @sharmadhiraj86 made their first contribution in https://github.com/speechbrain/speechbrain/pull/1700
  • @fpaissan made their first contribution in https://github.com/speechbrain/speechbrain/pull/1608
  • @dwgnr made their first contribution in https://github.com/speechbrain/speechbrain/pull/1460
  • @padmalcom made their first contribution in https://github.com/speechbrain/speechbrain/pull/1642
  • @MartinKocour made their first contribution in https://github.com/speechbrain/speechbrain/pull/1739
  • @jonasvdd made their first contribution in https://github.com/speechbrain/speechbrain/pull/1725
  • @poonehmousavi made their first contribution in https://github.com/speechbrain/speechbrain/pull/1809
  • @GaelleLaperriere made their first contribution in https://github.com/speechbrain/speechbrain/pull/1172
  • @anthony-wss made their first contribution in https://github.com/speechbrain/speechbrain/pull/1884
  • @salah-zaiem made their first contribution in https://github.com/speechbrain/speechbrain/pull/1888
  • @Jeronymous made their first contribution in https://github.com/speechbrain/speechbrain/pull/1899

Full Changelog: https://github.com/speechbrain/speechbrain/compare/v0.5.13...v0.5.14

- Python
Published by TParcollet almost 3 years ago

speechbrain - SpeechBrain v0.5.13

This is a minor release with better dependency version specification. We note that SpeechBrain is compatible with PyTorch 1.12, and the updated package reflects this. See the issue linked next to each commit for more details about the corresponding changes.

Commit summary

  • [edb7714]: Adding no_sync and on_fit_batch_end method to core (Rudolf Arseni Braun) #1449
  • [07155e9]: G2P fixes (flexthink) #1473
  • [6602dab]: fix for #1469, minimal testing for profiling (anautsch) #1476
  • [abbfab9]: test clean-ups: passes linters; doctests; unit & integration tests; load-yaml on cpu (anautsch) #1487
  • [1a16b41]: fix ddp incorrect command (=) #1498
  • [0b0ec9d]: using no_sync() in fit_batch() of core.py (Rudolf Arseni Braun) #1449
  • [5c9b833]: Remove torch maximum compatible version (Peter Plantinga) #1504
  • [d0f4352]: remove limit for HF hub as it does not work with colab (Titouan) #1508
  • [b78f6f8]: Add revision to hub (Titouan) #1510
  • [2c491a4]: fix transducer loss inputs devices (Adel Moumen) #1511
  • [4972f76]: missing space in install command (pehonnet) #1512
  • [6bc72af]: Fixing shuffle argument for distributed sampler in core.py (Rudolf Arseni Braun) #1518
  • [df7acd9]: Added the link for example results (cem) #1523
  • [5bae6df]: add LinearWarmupScheduler (Ge Li) #1537
  • [2edd7ee]: updating scipy version in requirements.txt. (Nauman Dawalatabad) #1546
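Commit 5bae6df adds a LinearWarmupScheduler. As a minimal sketch of the schedule such a scheduler typically computes, assuming the common variant where the learning rate ramps linearly from 0 to its base value over a warmup period and then decays linearly back to 0 at the end of training, here is a standalone function. The name linear_warmup_lr and its parameters are hypothetical and do not reproduce SpeechBrain's class or its signature.

```python
def linear_warmup_lr(step, base_lr, num_warmup_steps, num_training_steps):
    """Hypothetical helper: learning rate at `step` under linear warmup + linear decay."""
    if step < num_warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, num_warmup_steps)
    # Decay phase: ramp linearly from base_lr down to 0 at num_training_steps,
    # and stay at 0 for any step beyond that.
    remaining = num_training_steps - step
    decay_span = max(1, num_training_steps - num_warmup_steps)
    return base_lr * max(0.0, remaining / decay_span)


# Usage: learning rates for a short 6-step run with 2 warmup steps.
schedule = [linear_warmup_lr(s, 1e-3, 2, 6) for s in range(7)]
```

The warmup phase keeps early, noisy gradients from taking large steps, while the decay phase lets training settle; real schedulers usually wrap this formula so it updates the optimizer's learning rate in place.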

- Python
Published by github-actions[bot] over 3 years ago

speechbrain - SpeechBrain v0.5.12

Release Notes - SpeechBrain v0.5.12

We worked very hard and we are very happy to announce the new version of SpeechBrain!

SpeechBrain 0.5.12 significantly expands the toolkit without introducing any major interface changes. I would like to warmly thank the many contributors who made this possible.

The main changes are the following:

A) Text-to-Speech: We developed the first TTS system of SpeechBrain. You can find it here. The system combines Tacotron2 with HiFiGAN as the vocoder. The models, coupled with an easy-inference interface, are available on HuggingFace.

B) Grapheme-to-Phoneme (G2P): We developed an advanced Grapheme-to-Phoneme model. You can find the code here. The current version significantly outperforms our previous model.

C) Speech Separation:
  1. We developed a novel version of the SepFormer called Resource-Efficient SepFormer (RE-SepFormer). The code is available here and the pre-trained model (with an easy-inference interface) here.
  2. We released a recipe for binaural speech separation with WSJMix. See the code here.
  3. We released a new recipe with the AIShell mix dataset. You can see the code here.

D) Speech Enhancement:
  1. We released the SepFormer model for speech enhancement. The code is here, while the pre-trained model (with easy-inference interface) is here.
  2. We implemented the WideResNet for speech enhancement and used it for mimic loss-based speech enhancement. The code is here and the pre-trained model (with easy-inference interface) is here.

E) Feature Front-ends:
  1. We now support LEAF filter banks. The code is here. You can find an example of a recipe using it here.
  2. We now support multichannel SincConv (see code here).
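SincConv comes from SincNet, where each convolution filter is a band-pass filter parameterized only by its low and high cutoff frequencies: g[n] = 2·f2·sinc(2π·f2·n) − 2·f1·sinc(2π·f1·n), then windowed. A minimal pure-Python sketch of one such kernel follows, assuming cutoffs normalized by the sampling rate (cycles per sample) and a Hamming window. The function name is hypothetical, and this shows only the filter formula, not SpeechBrain's (multichannel) SincConv layer.

```python
import math


def sinc_bandpass_kernel(f_low, f_high, kernel_size):
    """Hypothetical helper: Hamming-windowed sinc band-pass kernel (SincNet-style).

    f_low and f_high are normalized by the sampling rate (cycles per sample),
    so they must lie in [0, 0.5].
    """
    if kernel_size < 3 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be an odd integer >= 3")
    if not 0.0 <= f_low < f_high <= 0.5:
        raise ValueError("require 0 <= f_low < f_high <= 0.5")
    half = kernel_size // 2
    kernel = []
    for k in range(kernel_size):
        n = k - half  # time index centred on the middle tap
        if n == 0:
            # Limit of the expression below as n -> 0.
            value = 2.0 * (f_high - f_low)
        else:
            # 2f * sinc(2*pi*f*n) simplifies to sin(2*pi*f*n) / (pi*n);
            # the band-pass is the difference of two ideal low-pass filters.
            value = (math.sin(2 * math.pi * f_high * n)
                     - math.sin(2 * math.pi * f_low * n)) / (math.pi * n)
        # Hamming window reduces ripple caused by truncating the ideal filter.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (kernel_size - 1))
        kernel.append(value * window)
    return kernel


taps = sinc_bandpass_kernel(0.05, 0.15, 31)
```

In SincNet-style layers the two cutoffs are the learnable parameters, so each filter costs two numbers regardless of its length.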

F) Recipe Refactors:
  1. We refactored the VoxCeleb recipe and fixed the normalization issues. See the new code here. We also made the EER computation method less memory-demanding (see here).
  2. We refactored the IEMOCAP recipe for emotion recognition. See the new code here.
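The VoxCeleb item above mentions a less memory-demanding EER computation. One way to keep the equal error rate cheap in memory is to sort the scores once and sweep candidate thresholds with binary search, so nothing larger than the score lists themselves is stored. The sketch below assumes higher scores mean "same speaker" and reports the midpoint of FAR and FRR at the threshold where they are closest; it is illustrative only and is not SpeechBrain's EER function.

```python
from bisect import bisect_left


def compute_eer(target_scores, nontarget_scores):
    """Equal error rate from two score lists (higher score = more likely target)."""
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    # Sorting once lets each threshold be evaluated with binary search,
    # so memory stays linear in the number of trials.
    targets = sorted(target_scores)
    nontargets = sorted(nontarget_scores)
    best_gap, eer = float("inf"), 1.0
    # Every observed score is a candidate decision threshold.
    for threshold in sorted(set(targets) | set(nontargets)):
        # False rejection: target trials scored below the threshold.
        frr = bisect_left(targets, threshold) / len(targets)
        # False acceptance: non-target trials scored at or above the threshold.
        far = (len(nontargets) - bisect_left(nontargets, threshold)) / len(nontargets)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

For example, compute_eer([0.4, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7]) finds FAR = FRR = 0.25 at threshold 0.6.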

G) Models for African Languages: We now have recipes for the DVoice dataset. We currently support Darija, Swahili, Wolof, Fongbe, and Amharic. The code is available here. The pretrained model (coupled with an easy-inference interface) can be found on SpeechBrain-HuggingFace.

H) Profiler: We implemented a model profiler that helps users while developing new models with SpeechBrain. The profiler outputs useful information such as real-time factors, among many other details. A tutorial is available here.
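The profiler above reports real-time factors. Assuming the usual definition (processing time divided by the duration of the audio processed, so values below 1 mean faster than real time), a minimal timing sketch could look like this. Both helper names are hypothetical and unrelated to the profiler's actual API.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Hypothetical helper: RTF = processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(process_fn, audio_seconds):
    """Hypothetical helper: time one call of process_fn on audio of the given length."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    process_fn()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)
```

For instance, a model that needs 2 s to transcribe 10 s of speech has an RTF of 0.2.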

I) Tests: We significantly improved the tests. In particular, we introduced the following: HF_repo tests, docstring checks, YAML-script consistency checks, recipe tests, and URL checks. This will help us scale up the project.

J) Other improvements:
  1. We now support the torchaudio RNNT loss.
  2. We improved the relative attention mechanism of the Conformer.
  3. We updated the Transformer for LibriSpeech, reducing the WER on test-clean from 2.46% to 2.26%. See the code here.
  4. The environmental corruption module now supports different sampling rates.
  5. Minor fixes.
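To put the LibriSpeech Transformer update in perspective, the drop from 2.46% to 2.26% WER on test-clean is 0.20 points absolute, which corresponds to roughly an 8% relative error reduction:

```latex
\text{relative WER reduction}
  = \frac{\mathrm{WER}_{\text{old}} - \mathrm{WER}_{\text{new}}}{\mathrm{WER}_{\text{old}}}
  = \frac{2.46 - 2.26}{2.46}
  = \frac{0.20}{2.46}
  \approx 0.081
```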

- Python
Published by mravanelli over 3 years ago

speechbrain - SpeechBrain v0.5.11

Dear users, we worked very hard, and we are very happy to announce the new version of SpeechBrain. SpeechBrain 0.5.11 further expands the toolkit without introducing any major interface changes.

The main changes are the following:

  1. We implemented new recipes, such as:
    • VoxLingua107 for language identification.
    • SepFormer for speech enhancement.
    • MetricGAN-U for speech enhancement.
    • SLURP with wav2vec for spoken language understanding.
    • REALM for speech separation with real data.
    • Korean speech recognition with KsponSpeech.
    • CommonVoice for German.
    • IEMOCAP for emotion recognition using wav2vec.

  2. Support for dynamic batching, with a tutorial to help users familiarize themselves with it.

  3. Support for wav2vec training within SpeechBrain.

  4. An interface with Orion for hyperparameter tuning, with a tutorial to help users familiarize themselves with it.

  5. The torchaudio transducer loss is now supported. We also kept our Numba implementation to help users customize the transducer loss if needed.

  6. Improved CTC segmentation.

  7. Fixed minor bugs and issues (e.g., the MVDR beamformer).
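The dynamic batching support listed above groups utterances of similar length so that each batch fills a fixed padded-length budget instead of holding a fixed number of examples. The greedy function below is a simplified, hypothetical sketch of that idea; SpeechBrain's own sampler is more elaborate, and the names lengths and max_batch_length here are assumptions for illustration.

```python
def dynamic_batches(lengths, max_batch_length):
    """Hypothetical helper: group example indices so padded batch cost stays under budget.

    The padded cost of a batch is (longest item in the batch) * (number of items),
    since every item is padded to the longest one.
    """
    # Sorting by length keeps similar-length items together and minimizes padding.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, current, current_max = [], [], 0.0
    for idx in order:
        new_max = max(current_max, lengths[idx])
        if current and new_max * (len(current) + 1) > max_batch_length:
            # Adding this item would exceed the budget: close the batch.
            batches.append(current)
            current, current_max = [idx], lengths[idx]
        else:
            current.append(idx)
            current_max = new_max
    if current:
        batches.append(current)
    return batches
```

With lengths [1.0, 5.0, 2.0, 2.0, 4.0] and a budget of 8.0, the three short utterances share one batch while the two long ones each get their own. Short utterances therefore produce larger batches, which keeps GPU memory use roughly constant across batches.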

Let me thank all the amazing contributors for this achievement. Please add a star to our project if you appreciate our effort for the community. Together, we are growing very fast, and we have big plans for the future.

Stay Tuned!

- Python
Published by github-actions[bot] about 4 years ago

speechbrain - SpeechBrain v0.5.10

This version mainly expands the functionalities of SpeechBrain without adding any backward incompatibilities.

New Recipes:

  • Language Identification with CommonLanguage
  • EEG signal processing with ERPCore
  • Speech translation with Fisher-Call Home
  • Emotion Recognition with IEMOCAP
  • Voice Activity Detection with LibriParty
  • ASR with LibriSpeech wav2vec (WER = 1.9% on test-clean)
  • Speech Enhancement with CoopNet
  • Speech Enhancement with SEGAN
  • Speech Separation with LibriMix, WHAM, and WHAMR
  • Support for guided attention
  • Spoken Language Understanding with SLURP
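The LibriSpeech result above is reported as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal word-level Levenshtein sketch follows; it is a generic implementation with a hypothetical name, not SpeechBrain's scoring code, and real scorers usually also normalize text (case, punctuation) first.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Before the loop, prev[j] is the cost of producing the first j hypothesis
    # words from zero reference words (j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, "the cat sat on the mat" against "the cat sit on mat" has one substitution and one deletion, giving 2 / 6 ≈ 33% WER.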

Beyond that, we fixed some minor bugs and issues.

- Python
Published by mravanelli over 4 years ago

speechbrain - v0.5.9

The main differences from the previous version are the following:

  • Added Wham/whamr/librimix for speech separation
  • Compatibility with PyTorch 1.9
  • Fixed minor bugs
  • Added SpeechBrain paper

- Python
Published by github-actions[bot] over 4 years ago

speechbrain - v0.5.8

SpeechBrain 0.5.8 improves the previous version in the following way:

  • Added wav2vec support in TIMIT, CommonVoice, and AISHELL-1
  • Improved Fluent Speech Commands recipe
  • Improved SLU recipes
  • Recipe for UrbanSound8k
  • Fixed small bugs
  • Fixed typos

- Python
Published by github-actions[bot] over 4 years ago

speechbrain - SpeechBrain v0.5.7

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains. The current version (v0.5.7) supports:

  • E2E Speech Recognition
  • Speaker Recognition (Identification and Verification)
  • Spoken Language Understanding (e.g., Intent recognition)
  • Speaker Diarization
  • Speech Enhancement
  • Speech Separation
  • Multi-microphone signal processing (beamforming, localization)

Many other tasks will be supported soon. Take a look at our roadmap on Discourse. Your contribution is welcome! Please star our project to help us grow.

For more info and tutorials: https://speechbrain.github.io/

- Python
Published by mravanelli almost 5 years ago