Recent Releases of stt

stt - Coqui STT 1.4.0

General

This is the 1.4.0 release for Coqui STT, the deep learning toolkit for speech-to-text. In accordance with semantic versioning, this version is backwards compatible with previous 1.x versions. The compatibility guarantees of our semantic versioning cover the deployment APIs: the C API and all the official language bindings: Python, Node.JS/ElectronJS and Java/Android. You can get started with Coqui STT 1.4.0 by following the steps in our documentation.

Compatible pre-trained models are available in the Coqui Model Zoo.

We also include example audio files:

audio-1.4.0.tar.gz

which can be used to test the engine, and checkpoint files for the English model (identical to the 1.0.0 checkpoint, provided here for convenience):

coqui-stt-1.4.0-checkpoint.tar.gz

which are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:

v1.4.0.tar.gz

which is under the MPL-2.0 license. Note that this tarball is for archival purposes only, since GitHub's automatic tarballs do not include submodules. For usage and development with the source code, clone the repository using Git, following our documentation.

Notable changes

Documentation

Documentation is available on stt.readthedocs.io.

Contact/Getting Help

  1. GitHub Discussions - best place to ask questions, get support, and discuss anything related to 🐸STT with other users.
  2. Gitter - You can also join our Gitter chat.
  3. Issues - If you have discussed a problem and identified a bug in 🐸STT, or if you have a feature request, please open an issue in our repo. Please make sure you search for an already existing issue beforehand!

Contributors to 1.4.0 release

  • Alessio Placitelli
  • Anton Yaroshenko
  • ChamathKB
  • Ciaran O'Reilly
  • Daniel Souza
  • Danny Waser
  • David Roundy
  • Davidian1024
  • Edresson Casanova
  • Josh Meyer
  • Mariano Gonzalez
  • NanoNabla
  • Reuben Morais
  • Yanlong Wang

We’d also like to thank all the members of our Gitter chat room who have been helping to shape this release!

- C++
Published by github-actions[bot] over 3 years ago

stt - Coqui STT 1.4.0-alpha.6

- C++
Published by github-actions[bot] over 3 years ago

stt - Coqui STT 1.4.0-alpha.5

- C++
Published by github-actions[bot] over 3 years ago

stt - Coqui STT 1.4.0-alpha.4

- C++
Published by github-actions[bot] over 3 years ago

stt - Coqui STT 1.4.0-alpha.3

- C++
Published by github-actions[bot] over 3 years ago

stt - Coqui STT 1.4.0-alpha.2

- C++
Published by github-actions[bot] over 3 years ago

stt - Coqui STT 1.4.0-alpha.1

- C++
Published by github-actions[bot] almost 4 years ago

stt - Coqui STT 1.4.0-alpha.0

- C++
Published by github-actions[bot] almost 4 years ago

stt - Coqui STT 1.3.0

General

This is the 1.3.0 release for Coqui STT, the deep learning toolkit for speech-to-text. In accordance with semantic versioning, this version is backwards compatible with previous 1.x versions. The compatibility guarantees of our semantic versioning cover the deployment APIs: the C API and all the official language bindings: Python, Node.JS/ElectronJS and Java/Android. You can get started today with Coqui STT 1.3.0 by following the steps in our documentation.

Compatible pre-trained models are available in the Coqui Model Zoo.

We also include example audio files:

audio-1.3.0.tar.gz

which can be used to test the engine, and checkpoint files for the English model (identical to the 1.0.0 checkpoint, provided here for convenience):

coqui-stt-1.3.0-checkpoint.tar.gz

which are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:

v1.3.0.tar.gz

which is under the MPL-2.0 license. Note that this tarball is for archival purposes only, since GitHub's automatic tarballs do not include submodules. For usage and development with the source code, clone the repository using Git, following our documentation.

Notable changes

  • Added new experimental APIs for loading Coqui STT models from memory buffers

    This allows loading models without writing them to disk first, which can be useful for dynamic model loading as well as for handling packaging in mobile platforms

  • Added ElectronJS 16 support

  • Rewritten audio processing logic in iOS demo app

  • Added pre-built binaries for iOS/Swift bindings in CI

    With these two changes we're hoping to get more feedback from iOS developers on our Swift bindings and pre-built STT frameworks - how can we best package and distribute the bindings so that it feels native to Swift/iOS developers? If you have any feedback, join our Gitter room!

  • Extended the Multilingual LibriSpeech importer to support all languages in the dataset

    Supported languages: English, German, Dutch, French, Spanish, Italian, Portuguese, Polish

  • Exposed full metadata information for decoded samples when using the coqui_stt_ctcdecoder Python package

    This allows access to the entire information returned by the decoder in training code, meaning experimenting with new model architectures doesn't require adapting the C++ inference library to test your changes.

  • Added initial support for Apple Silicon in our pre-built binaries

    The C/C++ pre-built libraries are universal binaries; the language bindings will be updated soon.

  • Added support for FLAC files in training
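The buffer-loading APIs above are aimed at cases where model bytes never touch the filesystem — for example, assets packaged inside a mobile app archive. The pattern can be sketched with a stand-in model class (the real entry points are the experimental C API functions and their bindings; the names below are purely illustrative):

```python
import io
import zipfile

class BufferModel:
    """Illustrative stand-in for a model class that accepts raw bytes."""
    def __init__(self, model_bytes):
        self.size = len(model_bytes)

# Package a (fake) model inside an in-memory archive, as on mobile
# platforms where assets ship inside the app bundle, not as loose files.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as z:
    z.writestr("model.tflite", b"\x00" * 128)

with zipfile.ZipFile(archive) as z:
    model_bytes = z.read("model.tflite")   # extracted straight to memory

model = BufferModel(model_bytes)           # no temporary file needed
assert model.size == 128
```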

Documentation

Documentation is available on stt.readthedocs.io.

Contact/Getting Help

  1. GitHub Discussions - best place to ask questions, get support, and discuss anything related to 🐸STT with other users.
  2. Gitter - You can also join our Gitter chat.
  3. Issues - If you have discussed a problem and identified a bug in 🐸STT, or if you have a feature request, please open an issue in our repo. Please make sure you search for an already existing issue beforehand!

Contributors to 1.3.0 release

  • Alessio Placitelli
  • Danny Waser
  • Erik Ziegler
  • Han Xiao
  • Reuben Morais

We’d also like to thank all the members of our Gitter chat room who have been helping to shape this release!

- C++
Published by github-actions[bot] almost 4 years ago

stt - Coqui STT 1.3.0-alpha.4

- C++
Published by github-actions[bot] almost 4 years ago

stt - Coqui STT 1.3.0-alpha.3

- C++
Published by github-actions[bot] almost 4 years ago

stt - Coqui STT 1.3.0-alpha.2

- C++
Published by github-actions[bot] almost 4 years ago

stt - Coqui STT 1.3.0-alpha.1

- C++
Published by github-actions[bot] almost 4 years ago

stt - Coqui STT 1.3.0-alpha.0

- C++
Published by github-actions[bot] almost 4 years ago

stt - Coqui STT 1.2.0

General

This is the 1.2.0 release for Coqui STT, the deep learning toolkit for speech-to-text. In accordance with semantic versioning, this version is backwards compatible with previous 1.x versions. The compatibility guarantees of our semantic versioning cover the deployment APIs: the C API and all the official language bindings: Python, Node.JS/ElectronJS and Java/Android. You can get started today with Coqui STT 1.2.0 by following the steps in our documentation.

Compatible pre-trained models are available in the Coqui Model Zoo.

We also include example audio files:

audio-1.2.0.tar.gz

which can be used to test the engine, and checkpoint files for the English model (identical to the 1.0.0 checkpoint, provided here for convenience):

coqui-stt-1.2.0-checkpoint.tar.gz

which are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:

v1.2.0.tar.gz

which is under the MPL-2.0 license. Note that this tarball is for archival purposes only, since GitHub's automatic tarballs do not include submodules. For usage and development with the source code, clone the repository using Git, following our documentation.

Notable changes

  • Added Python 3.10 support
  • Added new inference APIs which process any pending data before returning transcription results
  • Added an importer for using data from Common Voice's new personal data downloader, and a Jupyter notebook which creates a custom STT model using your data
  • Improved and extended the evaluate_tflite script (now the evaluate_export module) with Opus support
  • Added support for Ogg/Vorbis encoded audio files as training inputs
  • Added an importer for the Att-HACK dataset
  • Model dimensions are now loaded automatically from a checkpoint if present
  • Checkpoint loader will now handle CuDNN checkpoints transparently, without an explicit flag
  • When starting a training run, a batch size check will be performed automatically to help diagnose memory issues early
  • Added support for using WebDataset for training datasets
  • Updated to TensorFlow Lite 2.8, including new XNNPACK optimizations for quantized models
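One of the items above — inference APIs that process any pending data before returning results — can be illustrated with a toy buffer, unrelated to the real decoder internals: a streaming consumer only decodes whole chunks, so an intermediate result can lag behind the audio already fed in unless pending data is flushed first.

```python
# Conceptual sketch only (not the actual Coqui STT API): shows why a
# "flush pending data" variant of an intermediate-result call matters.
class BufferedStream:
    """Accumulates samples and hands them to a decoder in fixed chunks."""
    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.processed = []   # samples the "decoder" has consumed
        self.pending = []     # samples still waiting for a full chunk

    def feed(self, samples):
        self.pending.extend(samples)
        # Only whole chunks are handed to the decoder.
        while len(self.pending) >= self.chunk_size:
            self.processed.extend(self.pending[:self.chunk_size])
            self.pending = self.pending[self.chunk_size:]

    def intermediate(self):
        # Plain intermediate result: pending samples are not reflected yet.
        return list(self.processed)

    def intermediate_with_flush(self):
        # Flush-style result: process pending data before returning.
        return list(self.processed) + list(self.pending)

stream = BufferedStream(chunk_size=4)
stream.feed([1, 2, 3, 4, 5, 6])
assert stream.intermediate() == [1, 2, 3, 4]               # 5, 6 still buffered
assert stream.intermediate_with_flush() == [1, 2, 3, 4, 5, 6]
```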

Documentation

Documentation is available on stt.readthedocs.io.

Contact/Getting Help

  1. GitHub Discussions - best place to ask questions, get support, and discuss anything related to 🐸STT with other users.
  2. Gitter - You can also join our Gitter chat.
  3. Issues - If you have discussed a problem and identified a bug in 🐸STT, or if you have a feature request, please open an issue in our repo. Please make sure you search for an already existing issue beforehand!

Contributors to 1.2.0 release

  • Alexandre Lissy
  • Aya AlJafari
  • Danny Waser
  • Jeremiah Rose
  • Jonathan Washington
  • Josh Meyer
  • Reuben Morais
  • Vincent Fretin

We’d also like to thank all the members of our Gitter chat room who have been helping to shape this release!

- C++
Published by github-actions[bot] about 4 years ago

stt - Coqui STT 1.1.0

General

This is the 1.1.0 release for Coqui STT, the deep learning toolkit for speech-to-text. In accordance with semantic versioning, this version is not completely backwards compatible with previous versions. The compatibility guarantees of our semantic versioning cover the deployment APIs: the C API and all the official language bindings: Python, Node.JS/ElectronJS and Java/Android. You can get started today with Coqui STT 1.1.0 by following the steps in our documentation.

Compatible pre-trained models are available in the Coqui Model Zoo.

We also include example audio files:

audio-1.1.0.tar.gz

which can be used to test the engine, and checkpoint files for the English model (identical to the 1.0.0 checkpoint, provided here for convenience):

coqui-stt-1.1.0-checkpoint.tar.gz

which are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:

v1.1.0.tar.gz

which is under the MPL-2.0 license. Note that this tarball is for archival purposes only, since GitHub's automatic tarballs do not include submodules. For usage and development with the source code, clone the repository using Git, following our documentation.

Notable changes

  • Packaged missing dependencies with the Android AAR packages
  • Fixed the evaluate_tflite.py script to use the new Coqpit-based config handling
  • Used the export beam width by default in evaluation reports
  • Integrated lexicon-constrained and lexicon-free Flashlight decoders for CTC and ASG acoustic models in the decoder package
  • Updated supported NodeJS versions to the current supported releases: 12, 14, and 16
  • Updated supported ElectronJS versions to the current supported releases: 12, 13, 14, and 15
  • Improved and packaged the VAD transcription module in the training package (coqui_stt_training.transcribe)
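The VAD transcription module segments long recordings on detected silence before transcribing each voiced region. A minimal energy-threshold segmenter conveys the idea (a toy sketch only — the actual module relies on a dedicated voice-activity detector):

```python
def voiced_segments(samples, threshold=0.1, min_silence=2):
    """Return (start, end) sample-index pairs for regions above an
    energy threshold, splitting on runs of min_silence quiet samples."""
    segments, start, silence_run = [], None, 0
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i        # a voiced region begins here
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence:
                # Close the region just before the silence run began.
                segments.append((start, i - silence_run + 1))
                start, silence_run = None, 0
    if start is not None:        # recording ended mid-region
        segments.append((start, len(samples) - silence_run))
    return segments

# Two bursts of "speech" separated by silence:
assert voiced_segments([0.0, 0.5, 0.6, 0.0, 0.0, 0.7, 0.0]) == [(1, 3), (5, 6)]
```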

Documentation

Documentation is available on stt.readthedocs.io.

Contact/Getting Help

  1. GitHub Discussions - best place to ask questions, get support, and discuss anything related to 🐸STT with other users.
  2. Gitter - You can also join our Gitter chat.
  3. Issues - If you have discussed a problem and identified a bug in 🐸STT, or if you have a feature request, please open an issue in our repo. Please make sure you search for an already existing issue beforehand!

Contributors to 1.1.0 release

  • Alexandre Lissy
  • Josh Meyer
  • Julian Darley
  • Leon Kiefer
  • Reuben Morais
  • Vojtěch Drábek

We’d also like to thank all the members of our Gitter chat room who have been helping to shape this release!

- C++
Published by github-actions[bot] about 4 years ago

stt - Coqui STT 1.1.0-alpha.1

- C++
Published by github-actions[bot] over 4 years ago

stt - Coqui STT 1.1.0-alpha.0

- C++
Published by github-actions[bot] over 4 years ago

stt - Coqui STT 1.0.0

General

This is the 1.0.0 release for Coqui STT, the deep learning toolkit for speech-to-text. In accordance with semantic versioning, this version is not completely backwards compatible with previous versions. The compatibility guarantees of our semantic versioning cover the inference APIs: the C API and all the official language bindings: Python, Node.JS/ElectronJS and Android. You can get started today with Coqui STT 1.0.0 by following the steps in our documentation.

This release includes pre-trained English models, available in the Coqui Model Zoo:

all under the Apache 2.0 license.

The acoustic models were trained on American English data with synthetic noise augmentation. The model achieves a 4.5% word error rate on the LibriSpeech clean test corpus and 13.6% word error rate on the LibriSpeech other test corpus with the largest release language model.
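The word error rates quoted here are the standard metric: word-level edit distance between reference and hypothesis, divided by the number of reference words. A compact reference implementation (not Coqui STT's evaluation code):

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))     # distance of ref[:0] to hyp[:j]
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r != h))   # substitution (or match)
            prev = cur
    return d[-1] / len(ref)

assert word_error_rate("hello world", "hello world") == 0.0
# One dropped word out of six reference words:
assert abs(word_error_rate("the cat sat on the mat",
                           "the cat sat on mat") - 1 / 6) < 1e-12
```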

Note that the model currently performs best in low-noise environments with clear recordings. This does not mean the model cannot be used outside of these conditions, but accuracy may be lower. Some users may need to fine-tune the model further to meet their intended use case.

We also include example audio files:

audio-1.0.0.tar.gz

which can be used to test the engine, and checkpoint files for the English model:

coqui-stt-1.0.0-checkpoint.tar.gz

which are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:

v1.0.0.tar.gz

which is under the MPL-2.0 license. Note that this tarball is for archival purposes only, since GitHub's automatic tarballs do not include submodules. For usage and development with the source code, clone the repository using Git, following our documentation.

Notable changes

  • Removed support for protocol buffer input in native client and consolidated all packages under a single "STT" name accepting TFLite inputs
  • Added programmatic interface to training code and example Jupyter Notebooks, including how to train with Common Voice data
  • Added transparent handling of mixed sample rates and stereo audio in training inputs
  • Moved CI setup to GitHub Actions, making code contributions easier to test
  • Added configuration management via Coqpit, providing a more flexible config interface that's compatible with Coqui TTS
  • Added transparent handling of Opus audio files in training inputs
  • Added support for automatic dataset subset splitting
  • Added support for automatic alphabet generation and loading
  • Started publishing the training code CI for a faster notebook setup
  • Refactored training code into self-contained modules and deprecated train.py as the universal entry point for training
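Transparent handling of mixed sample rates means training inputs are resampled on the fly to the rate the model expects (16 kHz is the usual assumption for these models). A naive linear-interpolation resampler shows the idea; real pipelines use properly filtered resampling to avoid aliasing:

```python
def resample(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (illustration only)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Blend the two nearest source samples.
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

assert resample([0, 1, 2, 3], 4, 8)[1] == 0.5   # upsampled midpoint
assert resample([0, 1, 2, 3], 4, 2) == [0, 2]   # every other sample
```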

Training Regimen + Hyperparameters for fine-tuning

The hyperparameters used to train the model are useful for fine-tuning, so we document them here, along with the training regimen and the hardware used (a server with 8 NVIDIA A100 GPUs, each with 40GB of VRAM). The full training configuration in JSON format is available here.

The datasets used were:

  • Common Voice 7.0 (with custom train/dev/test splits)
  • Multilingual LibriSpeech (English, Opus)
  • LibriSpeech

The optimal lm_alpha and lm_beta values with respect to the Common Voice 7.0 test set (custom Coqui splits), using a large-vocabulary language model, are:

  • lm_alpha: 0.5891777425167632
  • lm_beta: 0.6619145283338659
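lm_alpha and lm_beta are the usual CTC shallow-fusion weights: during beam search, the external scorer's log-probability is scaled by lm_alpha, and each word added to a hypothesis earns a flat lm_beta bonus. Schematically (the actual decoder differs in detail):

```python
def beam_score(acoustic_logprob, lm_logprob, word_count,
               lm_alpha=0.5891777425167632, lm_beta=0.6619145283338659):
    # lm_alpha scales the language-model log-probability;
    # lm_beta adds a per-word bonus to counteract the LM's length penalty.
    return acoustic_logprob + lm_alpha * lm_logprob + lm_beta * word_count

# With the tuned weights, a hypothesis with better LM support can
# overtake one with a slightly better acoustic score:
assert beam_score(-10.0, -2.0, 2) > beam_score(-9.5, -4.0, 2)
```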

Documentation

Documentation is available on stt.readthedocs.io.

Contact/Getting Help

  1. GitHub Discussions - best place to ask questions, get support, and discuss anything related to 🐸STT with other users.
  2. Gitter - You can also join our Gitter chat.
  3. Issues - If you have discussed a problem and identified a bug in 🐸STT, or if you have a feature request, please open an issue in our repo. Please make sure you search for an already existing issue beforehand!

Contributors to 1.0.0 release

  • Alexandre Lissy
  • Anon-Artist
  • Anton Yaroshenko
  • Catalin Voss
  • dag7dev
  • Dustin Zubke
  • Eren Gölge
  • Erik Ziegler
  • Francis Tyers
  • Ideefixze
  • Ilnar Salimzianov
  • imrahul3610
  • Jeremiah Rose
  • Josh Meyer
  • Kathy Reid
  • Kelly Davis
  • Kenneth Heafield
  • NanoNabla
  • Neil Stoker
  • Reuben Morais
  • zaptrem

We’d also like to thank all the members of our Gitter chat room who have been helping to shape this release!

- C++
Published by github-actions[bot] over 4 years ago

stt - Coqui STT 0.10.0-alpha.29

Test automatic release notes.

- C++
Published by github-actions[bot] over 4 years ago

stt - Coqui STT 0.10.0-alpha.28

- C++
Published by github-actions[bot] over 4 years ago

stt - Coqui STT v0.10.0-alpha.26

- C++
Published by github-actions[bot] over 4 years ago

stt - Coqui STT v0.10.0-alpha.25

- C++
Published by github-actions[bot] over 4 years ago

stt - Coqui STT v0.10.0-alpha.24

- C++
Published by github-actions[bot] over 4 years ago

stt - Coqui STT v0.10.0-alpha.23

- C++
Published by github-actions[bot] over 4 years ago

stt - Coqui STT v0.10.0-alpha.22

- C++
Published by github-actions[bot] over 4 years ago

stt - Coqui STT v0.10.0-alpha.21

- C++
Published by github-actions[bot] over 4 years ago

stt - Coqui STT v0.10.0-alpha.14

- C++
Published by reuben over 4 years ago

stt - STT v0.10.0-alpha.7

Alpha release, for development purposes only.

- C++
Published by reuben over 4 years ago

stt - Coqui STT 0.9.3

General

This is an initial release for 🐸STT, backwards compatible with mozilla/DeepSpeech 0.9.3. The model files below are identical to those in the 0.9.3 release of mozilla/DeepSpeech, and are released under the MPL 2.0 license accordingly. These models are provided as a compatibility aid so that examples in our documentation work with the existing release links.

This release includes the source code:

v0.9.3.tar.gz

which is under the MPL-2.0 license. We also release the acoustic models:

coqui-stt-0.9.3-models.pbmm coqui-stt-0.9.3-models.tflite

Experimental Mandarin Chinese acoustic models trained on an internal corpus composed of 2000h of read speech:

coqui-stt-0.9.3-models-zh-CN.pbmm coqui-stt-0.9.3-models-zh-CN.tflite

all under the MPL-2.0 license.

The model files with the ".pbmm" extension are memory-mapped, making them memory-efficient and fast to load. The model files with the ".tflite" extension are converted to TensorFlow Lite, have post-training quantization enabled, and are more suitable for resource-constrained environments.

The acoustic models were trained on American English with synthetic noise augmentation, and the .pbmm model achieves a 7.06% word error rate on the LibriSpeech clean test corpus.

Note that the model currently performs best in low-noise environments with clear recordings and has a bias towards US male accents. This does not mean the model cannot be used outside of these conditions, but accuracy may be lower. Some users may need to train the model further to meet their intended use case.

In addition we release the scorer:

coqui-stt-0.9.3-models.scorer

which takes the place of the language model and trie in older releases and which is also under the MPL-2.0 license.

There is also a corresponding scorer for the Mandarin Chinese model:

coqui-stt-0.9.3-models-zh-CN.scorer

We also include example audio files:

audio-0.9.3.tar.gz

which can be used to test the engine, and checkpoint files for both the English and Mandarin models:

coqui-stt-0.9.3-checkpoint.tar.gz coqui-stt-0.9.3-checkpoint-zh-CN.tar.gz

which are under the MPL-2.0 license and can be used as the basis for further fine-tuning.

Training Regimen + Hyperparameters for fine-tuning

The hyperparameters used to train the model are useful for fine tuning. Thus, we document them here along with the training regimen, hardware used (a server with 8 Quadro RTX 6000 GPUs each with 24GB of VRAM), and our use of cuDNN RNN.

In contrast to some previous releases, training for this release occurred as a fine tuning of the previous 0.8.2 checkpoint, with data augmentation options enabled. The following hyperparameters were used for the fine tuning. See the 0.8.2 release notes for the hyperparameters used for the base model.

  • train_files Fisher, LibriSpeech, Switchboard, Common Voice English, and approximately 1700 hours of transcribed WAMU (NPR) radio shows explicitly licensed for use as training corpora.
  • dev_files LibriSpeech clean dev corpus.
  • test_files LibriSpeech clean test corpus
  • train_batch_size 128
  • dev_batch_size 128
  • test_batch_size 128
  • n_hidden 2048
  • learning_rate 0.0001
  • dropout_rate 0.40
  • epochs 200
  • augment pitch[pitch=1~0.1]
  • augment tempo[factor=1~0.1]
  • augment overlay[p=0.9,source=${noise},layers=1,snr=12~4] (where ${noise} is a dataset of Freesound.org background noise recordings)
  • augment overlay[p=0.1,source=${voices},layers=10~2,snr=12~4] (where ${voices} is a dataset of audiobook snippets extracted from Librivox)
  • augment resample[p=0.2,rate=12000~4000]
  • augment codec[p=0.2,bitrate=32000~16000]
  • augment reverb[p=0.2,decay=0.7~0.15,delay=10~8]
  • augment volume[p=0.2,dbfs=-10~10]
  • cache_for_epochs 10
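In the augmentation specs above, values written center~radius denote a range sampled per example — e.g. pitch=1~0.1 draws from roughly [0.9, 1.1]. A sketch of how such a value could be parsed (illustrative only, not the actual coqui_stt_training parser):

```python
import random

def sample_value(spec, rng=random):
    """Parse a 'center~radius' augmentation value such as '1~0.1' and
    draw uniformly from [center - radius, center + radius].
    Plain numeric strings pass through unchanged."""
    if "~" in spec:
        center, radius = (float(part) for part in spec.split("~"))
        return rng.uniform(center - radius, center + radius)
    return float(spec)

random.seed(42)
pitch = sample_value("1~0.1")       # e.g. the pitch augmentation above
assert 0.9 <= pitch <= 1.1
assert sample_value("12") == 12.0   # fixed values need no sampling
```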

The weights with the best validation loss were selected at the end of 200 epochs using --noearly_stop.

The optimal lm_alpha and lm_beta values with respect to the LibriSpeech clean dev corpus remain unchanged from the previous release:

  • lm_alpha 0.931289039105002
  • lm_beta 1.1834137581510284

For the Mandarin Chinese model, the following values are recommended:

  • lm_alpha 0.6940122363709647
  • lm_beta 4.777924224113021

Documentation

Documentation is available on stt.readthedocs.io.

Contact/Getting Help

  1. GitHub Discussions - best place to ask questions, get support, and discuss anything related to 🐸STT with other users.
  2. Gitter - You can also join our Gitter chat.
  3. Issues - If you have discussed a problem and identified a bug in 🐸STT, or if you have a feature request, please open an issue in our repo. Please make sure you search for an already existing issue beforehand!

Contributors to 0.9.3 release

Everyone who helped us get this far! Thank you for your continued collaboration!

- C++
Published by reuben almost 5 years ago