my_tts_engine
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: softengrmuhammadnabeel
- License: other
- Language: Python
- Default Branch: main
- Size: 23.6 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
TTS Models Comprehensive Database
Text-to-Speech (TTS) Models
Multilingual Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| xtts_v2 | multilingual | multi-dataset | tts_models | XTTS-v2.0.3 by Coqui with 17 languages support. This is an advanced multilingual text-to-speech model capable of generating high-quality speech in 17 different languages with cross-language voice cloning capabilities. | None | `tts_models/multilingual/multi-dataset/xtts_v2` |
| xtts_v1.1 | multilingual | multi-dataset | tts_models | XTTS-v1.1 by Coqui with 14 languages, cross-language voice cloning and reference leak fixed. An improved version of the original XTTS model with enhanced voice cloning capabilities and bug fixes for reference leak issues. | None | `tts_models/multilingual/multi-dataset/xtts_v1.1` |
| your_tts | multilingual | multi-dataset | tts_models | Your TTS model accompanying the research paper available at https://arxiv.org/abs/2112.02418. This model represents a significant advancement in multilingual TTS technology with zero-shot voice cloning capabilities. | None | `tts_models/multilingual/multi-dataset/your_tts` |
| bark | multilingual | multi-dataset | tts_models | 🐶 Bark TTS model released by suno-ai. This innovative model can generate highly realistic speech with various emotions and speaking styles. The original implementation can be found at https://github.com/suno-ai/bark. | None | `tts_models/multilingual/multi-dataset/bark` |
English Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| tacotron2 | en | ek1 | tts_models | EK1 en-rp tacotron2 by NMStoker. A British English (Received Pronunciation) Tacotron2 model trained on the EK1 dataset, providing high-quality British accent speech synthesis. | `vocoder_models/en/ek1/wavegrad` | `tts_models/en/ek1/tacotron2` |
| tacotron2-DDC | en | ljspeech | tts_models | Tacotron2 with Double Decoder Consistency. An enhanced version of Tacotron2 that uses double decoder consistency for improved speech quality and stability during training. | `vocoder_models/en/ljspeech/hifigan_v2` | `tts_models/en/ljspeech/tacotron2-DDC` |
| tacotron2-DDC_ph | en | ljspeech | tts_models | Tacotron2 with Double Decoder Consistency with phonemes. This model incorporates phoneme-level processing for more accurate pronunciation and better speech quality. | `vocoder_models/en/ljspeech/univnet` | `tts_models/en/ljspeech/tacotron2-DDC_ph` |
| glow-tts | en | ljspeech | tts_models | Glow-TTS model trained on LJSpeech dataset. A flow-based generative model that provides fast and high-quality speech synthesis with improved training stability. | `vocoder_models/en/ljspeech/multiband-melgan` | `tts_models/en/ljspeech/glow-tts` |
| speedy-speech | en | ljspeech | tts_models | Speedy Speech model trained on LJSpeech dataset using the Alignment Network for learning the durations. This model focuses on fast inference while maintaining speech quality. | `vocoder_models/en/ljspeech/hifigan_v2` | `tts_models/en/ljspeech/speedy-speech` |
| tacotron2-DCA | en | ljspeech | tts_models | Tacotron2 with Decoder Consistency Algorithm. An advanced version of Tacotron2 with improved decoder consistency for better speech synthesis quality. | `vocoder_models/en/ljspeech/multiband-melgan` | `tts_models/en/ljspeech/tacotron2-DCA` |
| vits | en | ljspeech | tts_models | VITS is an End2End TTS model trained on LJSpeech dataset with phonemes. A cutting-edge end-to-end TTS model that combines variational inference with adversarial learning for high-quality speech synthesis. | None | `tts_models/en/ljspeech/vits` |
| vits--neon | en | ljspeech | tts_models | VITS model with Neon optimizations. An optimized version of VITS for improved performance and efficiency. | None | `tts_models/en/ljspeech/vits--neon` |
| fast_pitch | en | ljspeech | tts_models | FastPitch model trained on LJSpeech using the Aligner Network. A non-autoregressive model that provides fast and parallel speech synthesis with controllable pitch and duration. | `vocoder_models/en/ljspeech/hifigan_v2` | `tts_models/en/ljspeech/fast_pitch` |
| overflow | en | ljspeech | tts_models | Overflow model trained on LJSpeech dataset. A specialized TTS model designed for handling long-form text synthesis with consistent quality. | `vocoder_models/en/ljspeech/hifigan_v2` | `tts_models/en/ljspeech/overflow` |
| neural_hmm | en | ljspeech | tts_models | Neural HMM model trained on LJSpeech dataset. A hybrid model combining neural networks with Hidden Markov Models for robust speech synthesis. | `vocoder_models/en/ljspeech/hifigan_v2` | `tts_models/en/ljspeech/neural_hmm` |
| vits | en | vctk | tts_models | VITS End2End TTS model trained on VCTK dataset with 109 different speakers with EN accent. Multi-speaker model capable of generating speech with various English accents and speaker characteristics. | None | `tts_models/en/vctk/vits` |
| fast_pitch | en | vctk | tts_models | FastPitch model trained on VCTK dataset. Multi-speaker FastPitch model supporting various English speakers and accents. | None | `tts_models/en/vctk/fast_pitch` |
| tacotron-DDC | en | sam | tts_models | Tacotron2 with Double Decoder Consistency trained with Accenture's Sam dataset. Professional-grade TTS model trained on high-quality corporate speech data. | `vocoder_models/en/sam/hifigan_v2` | `tts_models/en/sam/tacotron-DDC` |
| capacitron-t2-c50 | en | blizzard2013 | tts_models | Capacitron additions to Tacotron 2 with Capacity at 50 as described in https://arxiv.org/pdf/1906.03402.pdf. Enhanced model with improved capacity for handling complex speech patterns. | `vocoder_models/en/blizzard2013/hifigan_v2` | `tts_models/en/blizzard2013/capacitron-t2-c50` |
| capacitron-t2-c150_v2 | en | blizzard2013 | tts_models | Capacitron additions to Tacotron 2 with Capacity at 150 as described in https://arxiv.org/pdf/1906.03402.pdf. Higher capacity version for even more complex speech synthesis tasks. | `vocoder_models/en/blizzard2013/hifigan_v2` | `tts_models/en/blizzard2013/capacitron-t2-c150_v2` |
| tortoise-v2 | en | multi-dataset | tts_models | Tortoise TTS model version 2 from https://github.com/neonbjb/tortoise-tts. Advanced TTS model known for extremely high-quality speech synthesis with natural prosody and emotion. | None | `tts_models/en/multi-dataset/tortoise-v2` |
| jenny | en | jenny | tts_models | VITS model trained with Jenny(Dioco) dataset. Named as Jenny as demanded by the license. Original model available at https://www.kaggle.com/datasets/noml4u/tts-models--en--jenny-dioco--vits. Single-speaker female voice model. | None | `tts_models/en/jenny/jenny` |
European Language Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| vits | bg | cv | tts_models | Bulgarian VITS model trained on Common Voice dataset. High-quality Bulgarian text-to-speech synthesis using the VITS architecture. | None | `tts_models/bg/cv/vits` |
| vits | cs | cv | tts_models | Czech VITS model trained on Common Voice dataset. Comprehensive Czech TTS model for natural speech synthesis. | None | `tts_models/cs/cv/vits` |
| vits | da | cv | tts_models | Danish VITS model trained on Common Voice dataset. Advanced Danish speech synthesis model. | None | `tts_models/da/cv/vits` |
| vits | et | cv | tts_models | Estonian VITS model trained on Common Voice dataset. High-quality Estonian TTS model. | None | `tts_models/et/cv/vits` |
| vits | ga | cv | tts_models | Irish Gaelic VITS model trained on Common Voice dataset. Specialized model for Irish language speech synthesis. | None | `tts_models/ga/cv/vits` |
| tacotron2-DDC | es | mai | tts_models | Spanish Tacotron2 with Double Decoder Consistency trained on MAI dataset. Professional Spanish TTS model with enhanced consistency. | `vocoder_models/universal/libri-tts/fullband-melgan` | `tts_models/es/mai/tacotron2-DDC` |
| vits | es | css10 | tts_models | Spanish VITS model trained on CSS10 dataset. High-quality Spanish speech synthesis model. | None | `tts_models/es/css10/vits` |
| tacotron2-DDC | fr | mai | tts_models | French Tacotron2 with Double Decoder Consistency trained on MAI dataset. Professional French TTS model. | `vocoder_models/universal/libri-tts/fullband-melgan` | `tts_models/fr/mai/tacotron2-DDC` |
| vits | fr | css10 | tts_models | French VITS model trained on CSS10 dataset. Advanced French speech synthesis model. | None | `tts_models/fr/css10/vits` |
| glow-tts | uk | mai | tts_models | Ukrainian Glow-TTS model trained on MAI dataset. Flow-based Ukrainian TTS model. | `vocoder_models/uk/mai/multiband-melgan` | `tts_models/uk/mai/glow-tts` |
| vits | uk | mai | tts_models | Ukrainian VITS model trained on MAI dataset. High-quality Ukrainian speech synthesis. | None | `tts_models/uk/mai/vits` |
| tacotron2-DDC | nl | mai | tts_models | Dutch Tacotron2 with Double Decoder Consistency trained on MAI dataset. Professional Dutch TTS model. | `vocoder_models/nl/mai/parallel-wavegan` | `tts_models/nl/mai/tacotron2-DDC` |
| vits | nl | css10 | tts_models | Dutch VITS model trained on CSS10 dataset. Advanced Dutch speech synthesis model. | None | `tts_models/nl/css10/vits` |
| tacotron2-DCA | de | thorsten | tts_models | German Tacotron2 with Decoder Consistency Algorithm trained on Thorsten dataset. High-quality German TTS model. | `vocoder_models/de/thorsten/fullband-melgan` | `tts_models/de/thorsten/tacotron2-DCA` |
| vits | de | thorsten | tts_models | German VITS model trained on Thorsten dataset. Advanced German speech synthesis model. | None | `tts_models/de/thorsten/vits` |
| tacotron2-DDC | de | thorsten | tts_models | Thorsten-Dec2021-22k-DDC German model with Double Decoder Consistency. Updated German TTS model with improved quality. | `vocoder_models/de/thorsten/hifigan_v1` | `tts_models/de/thorsten/tacotron2-DDC` |
| vits-neon | de | css10 | tts_models | German VITS model with Neon optimizations trained on CSS10 dataset. Optimized German TTS model. | None | `tts_models/de/css10/vits-neon` |
| vits | hu | css10 | tts_models | Hungarian VITS model trained on CSS10 dataset. High-quality Hungarian speech synthesis. | None | `tts_models/hu/css10/vits` |
| vits | el | cv | tts_models | Greek VITS model trained on Common Voice dataset. Advanced Greek TTS model. | None | `tts_models/el/cv/vits` |
| vits | fi | css10 | tts_models | Finnish VITS model trained on CSS10 dataset. Comprehensive Finnish speech synthesis model. | None | `tts_models/fi/css10/vits` |
| vits | hr | cv | tts_models | Croatian VITS model trained on Common Voice dataset. High-quality Croatian TTS model. | None | `tts_models/hr/cv/vits` |
| vits | lt | cv | tts_models | Lithuanian VITS model trained on Common Voice dataset. Advanced Lithuanian speech synthesis. | None | `tts_models/lt/cv/vits` |
| vits | lv | cv | tts_models | Latvian VITS model trained on Common Voice dataset. Professional Latvian TTS model. | None | `tts_models/lv/cv/vits` |
| vits | mt | cv | tts_models | Maltese VITS model trained on Common Voice dataset. Specialized Maltese speech synthesis model. | None | `tts_models/mt/cv/vits` |
| vits | pl | mai_female | tts_models | Polish VITS model with female voice trained on MAI dataset. High-quality Polish female TTS model. | None | `tts_models/pl/mai_female/vits` |
| vits | pt | cv | tts_models | Portuguese VITS model trained on Common Voice dataset. Comprehensive Portuguese TTS model. | None | `tts_models/pt/cv/vits` |
| vits | ro | cv | tts_models | Romanian VITS model trained on Common Voice dataset. Advanced Romanian speech synthesis. | None | `tts_models/ro/cv/vits` |
| vits | sk | cv | tts_models | Slovak VITS model trained on Common Voice dataset. Professional Slovak TTS model. | None | `tts_models/sk/cv/vits` |
| vits | sl | cv | tts_models | Slovenian VITS model trained on Common Voice dataset. High-quality Slovenian speech synthesis. | None | `tts_models/sl/cv/vits` |
| vits | sv | cv | tts_models | Swedish VITS model trained on Common Voice dataset. Advanced Swedish TTS model. | None | `tts_models/sv/cv/vits` |
| glow-tts | it | mai_female | tts_models | Italian Glow-TTS model with female voice as explained in https://github.com/coqui-ai/TTS/issues/1148. Female Italian TTS model with flow-based architecture. | None | `tts_models/it/mai_female/glow-tts` |
| vits | it | mai_female | tts_models | Italian VITS model with female voice as explained in https://github.com/coqui-ai/TTS/issues/1148. High-quality female Italian speech synthesis. | None | `tts_models/it/mai_female/vits` |
| glow-tts | it | mai_male | tts_models | Italian Glow-TTS model with male voice as explained in https://github.com/coqui-ai/TTS/issues/1148. Male Italian TTS model with flow-based architecture. | None | `tts_models/it/mai_male/glow-tts` |
| vits | it | mai_male | tts_models | Italian VITS model with male voice as explained in https://github.com/coqui-ai/TTS/issues/1148. High-quality male Italian speech synthesis. | None | `tts_models/it/mai_male/vits` |
| glow-tts | tr | common-voice | tts_models | Turkish GlowTTS model using an unknown speaker from the Common-Voice dataset. High-quality Turkish speech synthesis with flow-based architecture. | `vocoder_models/tr/common-voice/hifigan` | `tts_models/tr/common-voice/glow-tts` |
| glow-tts | be | common-voice | tts_models | Belarusian GlowTTS model created by @alex73 (Github). Community-contributed Belarusian TTS model with flow-based architecture. | `vocoder_models/be/common-voice/hifigan` | `tts_models/be/common-voice/glow-tts` |
Asian Language Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| tacotron2-DDC-GST | zh-CN | baker | tts_models | Chinese Tacotron2 with Double Decoder Consistency and Global Style Tokens trained on Baker dataset. Advanced Chinese TTS model with style control capabilities. | None | `tts_models/zh-CN/baker/tacotron2-DDC-GST` |
| tacotron2-DDC | ja | kokoro | tts_models | Tacotron2 with Double Decoder Consistency trained with Kokoro Speech Dataset. High-quality Japanese TTS model with emotional speech capabilities. | `vocoder_models/ja/kokoro/hifigan_v1` | `tts_models/ja/kokoro/tacotron2-DDC` |
African Language Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| vits | ewe | openbible | tts_models | Ewe VITS model trained on OpenBible dataset. Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible. Religious text-based TTS model for Ewe language. | None | `tts_models/ewe/openbible/vits` |
| vits | hau | openbible | tts_models | Hausa VITS model trained on OpenBible dataset. Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible. Religious text-based TTS model for Hausa language. | None | `tts_models/hau/openbible/vits` |
| vits | lin | openbible | tts_models | Lingala VITS model trained on OpenBible dataset. Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible. Religious text-based TTS model for Lingala language. | None | `tts_models/lin/openbible/vits` |
| vits | tw_akuapem | openbible | tts_models | Twi (Akuapem) VITS model trained on OpenBible dataset. Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible. Religious text-based TTS model for Twi Akuapem dialect. | None | `tts_models/tw_akuapem/openbible/vits` |
| vits | tw_asante | openbible | tts_models | Twi (Asante) VITS model trained on OpenBible dataset. Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible. Religious text-based TTS model for Twi Asante dialect. | None | `tts_models/tw_asante/openbible/vits` |
| vits | yor | openbible | tts_models | Yoruba VITS model trained on OpenBible dataset. Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible. Religious text-based TTS model for Yoruba language. | None | `tts_models/yor/openbible/vits` |
Custom Language Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| vits | ca | custom | tts_models | Catalan VITS model trained from zero with 101,460 utterances consisting of 257 speakers, approximately 138 hours of speech. Uses three datasets: Festcat, Google Catalan TTS, and Common Voice 8. Trained with TTS v0.8.0. More details at https://github.com/coqui-ai/TTS/discussions/930#discussioncomment-4466345 | None | `tts_models/ca/custom/vits` |
| glow-tts | fa | custom | tts_models | Persian TTS female Glow-TTS model for text-to-speech purposes. Single-speaker female voice trained on persian-tts-dataset-female. Note: This model has no compatible vocoder, thus output quality may not be optimal. Dataset available at https://www.kaggle.com/datasets/magnoliasis/persian-tts-dataset-famale | None | `tts_models/fa/custom/glow-tts` |
| vits-male | bn | custom | tts_models | Single speaker Bangla male VITS model. Comprehensive Bangla TTS model for male voice synthesis. For more information visit https://github.com/mobassir94/comprehensive-bangla-tts | None | `tts_models/bn/custom/vits-male` |
| vits-female | bn | custom | tts_models | Single speaker Bangla female VITS model. Comprehensive Bangla TTS model for female voice synthesis. For more information visit https://github.com/mobassir94/comprehensive-bangla-tts | None | `tts_models/bn/custom/vits-female` |
Voice Conversion Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| freevc24 | multilingual | vctk | voice_conversion_models | FreeVC model trained on VCTK dataset from https://github.com/OlaWod/FreeVC. Advanced voice conversion model capable of converting voice characteristics while preserving linguistic content across multiple languages. | None | `voice_conversion_models/multilingual/vctk/freevc24` |
Vocoder Models
Universal Vocoders
| Model Name | Language | Dataset | Model Type | Description | Model Path |
|------------|----------|---------|------------|-------------|------------|
| wavegrad | universal | libri-tts | vocoder_models | Universal WaveGrad vocoder trained on LibriTTS dataset. High-quality neural vocoder for converting mel-spectrograms to audio waveforms. | `vocoder_models/universal/libri-tts/wavegrad` |
| fullband-melgan | universal | libri-tts | vocoder_models | Universal Fullband MelGAN vocoder trained on LibriTTS dataset. Advanced generative vocoder for high-fidelity audio synthesis with full frequency band coverage. | `vocoder_models/universal/libri-tts/fullband-melgan` |
English Vocoders
| Model Name | Language | Dataset | Model Type | Description | Model Path |
|------------|----------|---------|------------|-------------|------------|
| wavegrad | en | ek1 | vocoder_models | EK1 English (Received Pronunciation) WaveGrad vocoder by NMStoker. Specialized vocoder for British English accent synthesis. | `vocoder_models/en/ek1/wavegrad` |
| multiband-melgan | en | ljspeech | vocoder_models | Multi-band MelGAN vocoder trained on LJSpeech dataset. Efficient vocoder that processes multiple frequency bands simultaneously for faster inference. | `vocoder_models/en/ljspeech/multiband-melgan` |
| hifigan_v2 | en | ljspeech | vocoder_models | HiFiGAN v2 LJSpeech vocoder from https://arxiv.org/abs/2010.05646. State-of-the-art generative adversarial network-based vocoder for high-quality audio generation. | `vocoder_models/en/ljspeech/hifigan_v2` |
| univnet | en | ljspeech | vocoder_models | UnivNet model fine-tuned on TacotronDDC_ph spectrograms for better compatibility. Universal neural vocoder optimized for phoneme-based TTS models. | `vocoder_models/en/ljspeech/univnet` |
| hifigan_v2 | en | blizzard2013 | vocoder_models | HiFiGAN v2 vocoder adapted for Blizzard2013 dataset from https://arxiv.org/abs/2010.05646. Professional-grade vocoder for high-quality speech synthesis. | `vocoder_models/en/blizzard2013/hifigan_v2` |
| hifigan_v2 | en | vctk | vocoder_models | HiFiGAN v2 fine-tuned for VCTK dataset, intended for use with tts_models/en/vctk/sc-glow-tts. Multi-speaker vocoder supporting various English accents. | `vocoder_models/en/vctk/hifigan_v2` |
| hifigan_v2 | en | sam | vocoder_models | HiFiGAN v2 fine-tuned for SAM dataset, intended for use with tts_models/en/sam/tacotron_DDC. Corporate-grade vocoder for professional speech synthesis. | `vocoder_models/en/sam/hifigan_v2` |
European Language Vocoders
| Model Name | Language | Dataset | Model Type | Description | Model Path |
|------------|----------|---------|------------|-------------|------------|
| parallel-wavegan | nl | mai | vocoder_models | Parallel WaveGAN vocoder for Dutch language trained on MAI dataset. High-quality Dutch speech synthesis vocoder with parallel processing capabilities. | `vocoder_models/nl/mai/parallel-wavegan` |
| wavegrad | de | thorsten | vocoder_models | WaveGrad vocoder for German language trained on Thorsten dataset. Diffusion-based vocoder for high-quality German speech synthesis. | `vocoder_models/de/thorsten/wavegrad` |
| fullband-melgan | de | thorsten | vocoder_models | Fullband MelGAN vocoder for German language trained on Thorsten dataset. Advanced German vocoder with full frequency band coverage. | `vocoder_models/de/thorsten/fullband-melgan` |
| hifigan_v1 | de | thorsten | vocoder_models | HiFiGAN v1 vocoder for Thorsten Neutral Dec2021 22k sample rate Tacotron2 DDC model. Specialized German vocoder optimized for the Thorsten dataset. | `vocoder_models/de/thorsten/hifigan_v1` |
| multiband-melgan | uk | mai | vocoder_models | Multi-band MelGAN vocoder for Ukrainian language trained on MAI dataset. Ukrainian speech synthesis vocoder with multi-band processing. | `vocoder_models/uk/mai/multiband-melgan` |
| hifigan | tr | common-voice | vocoder_models | HiFiGAN vocoder for Turkish language using an unknown speaker from the Common-Voice dataset. High-quality Turkish speech synthesis vocoder. | `vocoder_models/tr/common-voice/hifigan` |
| hifigan | be | common-voice | vocoder_models | Belarusian HiFiGAN vocoder created by @alex73 (Github). Community-contributed Belarusian speech synthesis vocoder. | `vocoder_models/be/common-voice/hifigan` |
Asian Language Vocoders
| Model Name | Language | Dataset | Model Type | Description | Model Path |
|------------|----------|---------|------------|-------------|------------|
| hifigan_v1 | ja | kokoro | vocoder_models | HiFiGAN v1 vocoder for Japanese language trained on Kokoro dataset by @kaiidams. High-quality Japanese speech synthesis vocoder with emotional speech capabilities. | `vocoder_models/ja/kokoro/hifigan_v1` |
Summary Statistics
- Total TTS Models: 72 models
- Total Voice Conversion Models: 1 model
- Total Vocoder Models: 18 models
- Languages Supported: 40+ languages including multilingual models
- Architecture Types: VITS, Tacotron2, Glow-TTS, FastPitch, Bark, XTTS, and more
- Key Features: Cross-language voice cloning, multi-speaker support, emotional speech synthesis, and professional-grade quality
Usage Notes
- Model Paths: Use the exact model paths provided in the tables when loading models
- Vocoder Compatibility: Some models require specific vocoders for optimal performance
- Language Support: Multilingual models support multiple languages in a single model
- Quality Levels: Models vary from research-grade to production-ready quality
- Licensing: Some models have specific licensing requirements (e.g., Jenny model)
- Community Contributions: Several models are contributed by the community (indicated by contributor names)
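All model paths in the tables follow the same `<model_type>/<language>/<dataset>/<model_name>` convention, so they can be checked before being handed to the API. A minimal sketch of such a check; `parse_model_path` is a hypothetical helper written for this document, not a 🐸TTS API:

```python
# Hypothetical helper: split a model path of the form
# <model_type>/<language>/<dataset>/<model_name> into its parts.
def parse_model_path(path: str) -> dict:
    parts = path.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 segments, got {len(parts)}: {path!r}")
    model_type, language, dataset, model_name = parts
    if model_type not in ("tts_models", "vocoder_models", "voice_conversion_models"):
        raise ValueError(f"unknown model type: {model_type!r}")
    return {"model_type": model_type, "language": language,
            "dataset": dataset, "model_name": model_name}

info = parse_model_path("tts_models/en/ljspeech/glow-tts")
print(info["language"], info["dataset"])  # prints: en ljspeech
```

A check like this catches typos such as a missing segment early, instead of failing later inside the model loader.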
🐸Coqui.ai News
- 📣 ⓍTTSv2 is here with 16 languages and better performance across the board.
- 📣 ⓍTTS fine-tuning code is out. Check the example recipes.
- 📣 ⓍTTS can now stream with <200ms latency.
- 📣 ⓍTTS, our production TTS model that can speak 13 languages, is released Blog Post, Demo, Docs
- 📣 🐶Bark is now available for inference with unconstrained voice cloning. Docs
- 📣 You can use ~1100 Fairseq models with 🐸TTS.
- 📣 🐸TTS now supports 🐢Tortoise with faster inference. Docs
**🐸TTS is a library for advanced Text-to-Speech generation.**
🚀 Pretrained models in +1100 languages.
🛠️ Tools for training new models and fine-tuning existing models in any language.
📚 Utilities for dataset analysis and curation.
______________________________________________________________________
[Discord](https://discord.gg/5eXr5seRrv)
Underlined "TTS" and "Judy" are internal 🐸TTS models that are not released open-source. They are here to show the potential. Models prefixed with a dot (.Jofish .Abe and .Janice) are real human voices.
Features
- High-performance Deep Learning models for Text2Speech tasks.
- Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech).
- Speaker Encoder to compute speaker embeddings efficiently.
- Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN).
- Fast and efficient model training.
- Detailed training logs on the terminal and Tensorboard.
- Support for Multi-speaker TTS.
- Efficient, flexible, lightweight but feature-complete Trainer API.
- Released and ready-to-use models.
- Tools to curate Text2Speech datasets under `dataset_analysis`.
- Utilities to use and test your models.
- Modular (but not too much) code base enabling easy implementation of new ideas.
Model Implementations
Spectrogram models
- Tacotron: paper
- Tacotron2: paper
- Glow-TTS: paper
- Speedy-Speech: paper
- Align-TTS: paper
- FastPitch: paper
- FastSpeech: paper
- FastSpeech2: paper
- SC-GlowTTS: paper
- Capacitron: paper
- OverFlow: paper
- Neural HMM TTS: paper
- Delightful TTS: paper
End-to-End Models
- ⓍTTS: blog
- VITS: paper
- 🐸 YourTTS: paper
- 🐢 Tortoise: orig. repo
- 🐶 Bark: orig. repo
Attention Methods
- Guided Attention: paper
- Forward Backward Decoding: paper
- Graves Attention: paper
- Double Decoder Consistency: blog
- Dynamic Convolutional Attention: paper
- Alignment Network: paper
Speaker Encoder
Vocoders
- MelGAN: paper
- MultiBandMelGAN: paper
- ParallelWaveGAN: paper
- GAN-TTS discriminators: paper
- WaveRNN: origin
- WaveGrad: paper
- HiFiGAN: paper
- UnivNet: paper
Voice Conversion
- FreeVC: paper
You can also help us implement more models.
Installation
🐸TTS is tested on Ubuntu 18.04 with python >= 3.9, < 3.12.
If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option.
```bash
pip install TTS
```
If you plan to code or train models, clone 🐸TTS and install it locally.
```bash
git clone https://github.com/coqui-ai/TTS
pip install -e .[all,dev,notebooks]  # Select the relevant extras
```
If you are on Ubuntu (Debian), you can also run the following commands for installation.
```bash
$ make system-deps  # intended to be used on Ubuntu (Debian). Let us know if you have a different OS.
$ make install
```
If you are on Windows, 👑@GuyPaddock wrote installation instructions here.
Docker Image
You can also try TTS without installing it by using the Docker image. Simply run the following commands:
```bash
docker run --rm -it -p 5002:5002 --entrypoint /bin/bash ghcr.io/coqui-ai/tts-cpu
python3 TTS/server/server.py --list_models  # To get the list of available models
python3 TTS/server/server.py --model_name tts_models/en/vctk/vits  # To start a server
```
You can then enjoy the TTS server here. More details about the Docker images (like GPU support) can be found here.
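Once the container's demo server is listening on port 5002, synthesized audio can be fetched over HTTP. A minimal sketch, assuming the demo server exposes a GET `/api/tts` endpoint that accepts a `text` query parameter (verify the route against your server version before relying on it):

```python
from urllib.parse import urlencode

def build_tts_url(text: str, host: str = "localhost", port: int = 5002) -> str:
    # Assumed endpoint: GET /api/tts?text=... on the demo server.
    query = urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"

url = build_tts_url("Hello world!")
print(url)
# To actually download the audio (requires a running server):
# import urllib.request
# wav_bytes = urllib.request.urlopen(url).read()
# open("output.wav", "wb").write(wav_bytes)
```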
Synthesizing speech by 🐸TTS
🐍 Python API
Running a multi-speaker and multi-lingual model
```python
import torch
from TTS.api import TTS

# Get device
device = "cuda" if torch.cuda.is_available() else "cpu"

# List available 🐸TTS models
print(TTS().list_models())

# Init TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Run TTS
# ❗ Since this model is a multi-lingual voice cloning model, we must set the target speaker_wav and language
# Text to speech: list of amplitude values as output
wav = tts.tts(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en")
# Text to speech to a file
tts.tts_to_file(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")
```
Running a single speaker model
```python
# Init TTS with the target model name
tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False).to(device)

# Run TTS
tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path=OUTPUT_PATH)

# Example voice cloning with YourTTS in English, French and Portuguese
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False).to(device)
tts.tts_to_file("This is voice cloning.", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")
tts.tts_to_file("C'est le clonage de la voix.", speaker_wav="my/cloning/audio.wav", language="fr-fr", file_path="output.wav")
tts.tts_to_file("Isso é clonagem de voz.", speaker_wav="my/cloning/audio.wav", language="pt-br", file_path="output.wav")
```
Example voice conversion
```python
# Converting the voice in source_wav to the voice of target_wav
tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda")
tts.voice_conversion_to_file(source_wav="my/source.wav", target_wav="my/target.wav", file_path="output.wav")
```
Example voice cloning together with the voice conversion model.
This way, you can clone voices by using any model in 🐸TTS.
```python
tts = TTS("tts_models/de/thorsten/tacotron2-DDC")
tts.tts_with_vc_to_file(
    "Wie sage ich auf Italienisch, dass ich dich liebe?",
    speaker_wav="target/speaker.wav",
    file_path="output.wav"
)
```
Example text to speech using Fairseq models in ~1100 languages 🤯.
For Fairseq models, use the following name format: tts_models/<lang-iso_code>/fairseq/vits.
You can find the language ISO codes here
and learn about the Fairseq models here.
```python
# TTS with on-the-fly voice conversion
api = TTS("tts_models/deu/fairseq/vits")
api.tts_with_vc_to_file(
    "Wie sage ich auf Italienisch, dass ich dich liebe?",
    speaker_wav="target/speaker.wav",
    file_path="output.wav"
)
```
Command-line tts
Synthesize speech on command line.
You can either use your trained model or choose a model from the provided list.
If you don't specify any model, it uses an LJSpeech-based English model.
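When batch-synthesizing from a script, the CLI invocations used throughout this section can be assembled programmatically. A minimal sketch that only builds the argument list from the documented flags; `build_tts_args` is a hypothetical helper, and the `tts` executable must be on PATH to actually run the command:

```python
def build_tts_args(text, out_path, model_name=None, vocoder_name=None):
    # Assemble an argv list for the `tts` CLI from its documented flags.
    args = ["tts", "--text", text, "--out_path", out_path]
    if model_name:
        args += ["--model_name", model_name]
    if vocoder_name:
        args += ["--vocoder_name", vocoder_name]
    return args

args = build_tts_args("Text for TTS", "output/path/speech.wav",
                      model_name="tts_models/en/ljspeech/glow-tts")
print(args)
# import subprocess; subprocess.run(args, check=True)  # uncomment with 🐸TTS installed
```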
Single Speaker Models
- List provided models:
$ tts --list_models
Get model info (for both `tts_models` and `vocoder_models`):
- Query by type/name:
  `model_info_by_name` uses the model name exactly as shown by `--list_models`.
  $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
  For example:
  $ tts --model_info_by_name tts_models/tr/common-voice/glow-tts
  $ tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
- Query by type/idx:
  `model_query_idx` uses the corresponding idx from `--list_models`.
  $ tts --model_info_by_idx "<model_type>/<model_query_idx>"
  For example:
  $ tts --model_info_by_idx tts_models/3
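Model names always have four `/`-separated fields, so they can be validated and split with a few lines of plain Python. This is an illustrative helper (not part of the `tts` CLI or API) for working with `<model_type>/<language>/<dataset>/<model_name>` strings in scripts.

```python
from typing import NamedTuple

class ModelName(NamedTuple):
    model_type: str  # e.g. "tts_models" or "vocoder_models"
    language: str
    dataset: str
    name: str

def parse_model_name(full_name: str) -> ModelName:
    """Split a '<model_type>/<language>/<dataset>/<model_name>' string."""
    parts = full_name.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '/'-separated fields, got {full_name!r}")
    return ModelName(*parts)

info = parse_model_name("tts_models/tr/common-voice/glow-tts")
print(info.dataset)  # common-voice
```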
- Run TTS with default models:
$ tts --text "Text for TTS" --out_path output/path/speech.wav
- Run TTS and pipe out the generated TTS wav file data:
$ tts --text "Text for TTS" --pipe_out --out_path output/path/speech.wav | aplay
- Run a TTS model with its default vocoder model:
$ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
For example:
$ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --out_path output/path/speech.wav
- Run with specific TTS and vocoder models from the list:
$ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
For example:
$ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --vocoder_name "vocoder_models/en/ljspeech/univnet" --out_path output/path/speech.wav
- Run your own TTS model (Using Griffin-Lim Vocoder):
$ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
- Run your own TTS and Vocoder models:
$ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
--vocoder_path path/to/vocoder.pth --vocoder_config_path path/to/vocoder_config.json
Multi-speaker Models
- List the available speakers and choose a <speaker_id> among them:
$ tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
- Run the multi-speaker TTS model with the target speaker ID:
$ tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --speaker_idx <speaker_id>
- Run your own multi-speaker TTS model:
$ tts --text "Text for TTS" --out_path output/path/speech.wav --model_path path/to/model.pth --config_path path/to/config.json --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
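When driving the CLI from Python (e.g. batch synthesis over many texts), it is safer to build the argument list for `subprocess` than to concatenate a shell string. A minimal sketch, assuming the `tts` executable is on `PATH`; the model name and speaker ID below are illustrative placeholders.

```python
# Assemble the argument list for the multi-speaker `tts` CLI call shown
# above. Passing a list to subprocess.run avoids shell-quoting bugs.
def multispeaker_cmd(text: str, model_name: str, speaker_idx: str, out_path: str) -> list[str]:
    return [
        "tts",
        "--text", text,
        "--model_name", model_name,
        "--speaker_idx", speaker_idx,
        "--out_path", out_path,
    ]

# Illustrative values; substitute a real model name and speaker ID.
cmd = multispeaker_cmd("Text for TTS.", "tts_models/en/vctk/vits", "p225", "output/path/speech.wav")
# subprocess.run(cmd, check=True) would then invoke the CLI.
```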
Voice Conversion Models
$ tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --source_wav <path/to/speaker/wav> --target_wav <path/to/reference/wav>
Directory Structure
```
|- notebooks/           (Jupyter Notebooks for model evaluation, parameter selection and data analysis.)
|- utils/               (common utilities.)
|- TTS
    |- bin/             (folder for all the executables.)
        |- train*.py    (train your target model.)
        |- ...
    |- tts/             (text to speech models)
        |- layers/      (model layer definitions)
        |- models/      (model definitions)
        |- utils/       (model specific utilities.)
    |- speaker_encoder/ (Speaker Encoder models.)
        |- (same)
    |- vocoder/         (Vocoder models.)
        |- (same)
```
Owner
- Name: Muhammad Nabeel Khan
- Login: softengrmuhammadnabeel
- Kind: user
- Repositories: 1
- Profile: https://github.com/softengrmuhammadnabeel
I'm a front-end dev who's great at CSS and someday going to be great at JavaScript. I'm ready to point out some issues in the front-end industry.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you want to cite 🐸💬, feel free to use this (but only if you loved it 😊)"
title: "Coqui TTS"
abstract: "A deep learning toolkit for Text-to-Speech, battle-tested in research and production"
date-released: 2021-01-01
authors:
- family-names: "Eren"
given-names: "Gölge"
- name: "The Coqui TTS Team"
version: 1.4
doi: 10.5281/zenodo.6334862
license: "MPL-2.0"
url: "https://www.coqui.ai"
repository-code: "https://github.com/coqui-ai/TTS"
keywords:
- machine learning
- deep learning
- artificial intelligence
- text to speech
- TTS
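The CFF metadata above maps directly onto a BibTeX software entry. A possible rendering (the citation key is arbitrary; all field values are taken from the CITATION.cff fields listed above):

```bibtex
@software{coqui_tts,
  author  = {Eren, G{\"o}lge and {The Coqui TTS Team}},
  title   = {{Coqui TTS}},
  version = {1.4},
  doi     = {10.5281/zenodo.6334862},
  license = {MPL-2.0},
  url     = {https://www.coqui.ai},
  date    = {2021-01-01},
}
```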
GitHub Events
Total
- Public event: 1
- Push event: 3
Last Year
- Public event: 1
- Push event: 3
Dependencies
- actions/checkout v2 composite
- actions/checkout v3 composite
- actions/setup-python v2 composite
- actions/setup-python v4 composite
- actions/download-artifact v2 composite
- actions/upload-artifact v2 composite
- docker/build-push-action v2 composite
- docker/login-action v1 composite
- docker/setup-buildx-action v1 composite
- docker/setup-qemu-action v1 composite
- ${BASE} latest build
- ubuntu 22.04 build
- furo *
- linkify-it-py *
- myst-parser ==2.0.0
- sphinx ==7.2.5
- sphinx_copybutton *
- sphinx_inline_tabs *
- black * development
- coverage * development
- isort * development
- nose2 * development
- pylint ==2.10.2 development
- cutlet *
- mecab-python3 ==1.0.6
- unidic-lite ==1.0.8
- bokeh ==1.4.0
- aiohttp >=3.8.1
- anyascii >=0.3.0
- bangla *
- bnnumerizer *
- bnunicodenormalizer *
- coqpit >=0.0.16
- cython >=0.29.30
- einops >=0.6.0
- encodec >=0.1.1
- flask >=2.0.1
- fsspec >=2023.6.0
- g2pkk >=0.1.1
- gruut ==2.2.3
- hangul_romanize *
- inflect >=5.6.0
- jamo *
- jieba *
- librosa >=0.10.0
- matplotlib >=3.7.0
- mutagen ==1.47.0
- nltk *
- num2words *
- numba >=0.57.0
- numba ==0.55.1
- numpy >=1.24.3
- numpy ==1.22.0
- packaging >=23.1
- pandas >=1.4,<2.0
- pypinyin *
- pysbd >=0.3.4
- pyyaml >=6.0
- scikit-learn >=1.3.0
- scipy >=1.11.2
- soundfile >=0.12.0
- spacy >=3
- torch >=2.1
- torchaudio *
- tqdm >=4.64.1
- trainer >=0.0.36
- transformers >=4.33.0
- umap-learn >=0.5.1
- unidecode >=1.3.2