my_tts_engine
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: softengrmuhammadnabeel
- License: other
- Language: Python
- Default Branch: main
- Size: 23.6 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
TTS Models Comprehensive Database
Text-to-Speech (TTS) Models
Multilingual Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| xtts_v2 | multilingual | multi-dataset | tts_models | XTTS-v2.0.3 by Coqui with 17 languages support. This is an advanced multilingual text-to-speech model capable of generating high-quality speech in 17 different languages with cross-language voice cloning capabilities. | None | `tts_models/multilingual/multi-dataset/xtts_v2` |
| xtts_v1.1 | multilingual | multi-dataset | tts_models | XTTS-v1.1 by Coqui with 14 languages, cross-language voice cloning and reference leak fixed. An improved version of the original XTTS model with enhanced voice cloning capabilities and bug fixes for reference leak issues. | None | `tts_models/multilingual/multi-dataset/xtts_v1.1` |
| your_tts | multilingual | multi-dataset | tts_models | Your TTS model accompanying the research paper available at https://arxiv.org/abs/2112.02418. This model represents a significant advancement in multilingual TTS technology with zero-shot voice cloning capabilities. | None | `tts_models/multilingual/multi-dataset/your_tts` |
| bark | multilingual | multi-dataset | tts_models | 🐶 Bark TTS model released by suno-ai. This innovative model can generate highly realistic speech with various emotions and speaking styles. The original implementation can be found at https://github.com/suno-ai/bark. | None | `tts_models/multilingual/multi-dataset/bark` |
English Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| tacotron2 | en | ek1 | tts_models | EK1 en-rp tacotron2 by NMStoker. A British English (Received Pronunciation) Tacotron2 model trained on the EK1 dataset, providing high-quality British accent speech synthesis. | `vocoder_models/en/ek1/wavegrad` | `tts_models/en/ek1/tacotron2` |
| tacotron2-DDC | en | ljspeech | tts_models | Tacotron2 with Double Decoder Consistency. An enhanced version of Tacotron2 that uses double decoder consistency for improved speech quality and stability during training. | `vocoder_models/en/ljspeech/hifigan_v2` | `tts_models/en/ljspeech/tacotron2-DDC` |
| tacotron2-DDC_ph | en | ljspeech | tts_models | Tacotron2 with Double Decoder Consistency with phonemes. This model incorporates phoneme-level processing for more accurate pronunciation and better speech quality. | `vocoder_models/en/ljspeech/univnet` | `tts_models/en/ljspeech/tacotron2-DDC_ph` |
| glow-tts | en | ljspeech | tts_models | Glow-TTS model trained on LJSpeech dataset. A flow-based generative model that provides fast and high-quality speech synthesis with improved training stability. | `vocoder_models/en/ljspeech/multiband-melgan` | `tts_models/en/ljspeech/glow-tts` |
| speedy-speech | en | ljspeech | tts_models | Speedy Speech model trained on LJSpeech dataset using the Alignment Network for learning the durations. This model focuses on fast inference while maintaining speech quality. | `vocoder_models/en/ljspeech/hifigan_v2` | `tts_models/en/ljspeech/speedy-speech` |
| tacotron2-DCA | en | ljspeech | tts_models | Tacotron2 with Decoder Consistency Algorithm. An advanced version of Tacotron2 with improved decoder consistency for better speech synthesis quality. | `vocoder_models/en/ljspeech/multiband-melgan` | `tts_models/en/ljspeech/tacotron2-DCA` |
| vits | en | ljspeech | tts_models | VITS is an End2End TTS model trained on LJSpeech dataset with phonemes. A cutting-edge end-to-end TTS model that combines variational inference with adversarial learning for high-quality speech synthesis. | None | `tts_models/en/ljspeech/vits` |
| vits--neon | en | ljspeech | tts_models | VITS model with Neon optimizations. An optimized version of VITS for improved performance and efficiency. | None | `tts_models/en/ljspeech/vits--neon` |
| fast_pitch | en | ljspeech | tts_models | FastPitch model trained on LJSpeech using the Aligner Network. A non-autoregressive model that provides fast and parallel speech synthesis with controllable pitch and duration. | `vocoder_models/en/ljspeech/hifigan_v2` | `tts_models/en/ljspeech/fast_pitch` |
| overflow | en | ljspeech | tts_models | Overflow model trained on LJSpeech dataset. A specialized TTS model designed for handling long-form text synthesis with consistent quality. | `vocoder_models/en/ljspeech/hifigan_v2` | `tts_models/en/ljspeech/overflow` |
| neural_hmm | en | ljspeech | tts_models | Neural HMM model trained on LJSpeech dataset. A hybrid model combining neural networks with Hidden Markov Models for robust speech synthesis. | `vocoder_models/en/ljspeech/hifigan_v2` | `tts_models/en/ljspeech/neural_hmm` |
| vits | en | vctk | tts_models | VITS End2End TTS model trained on VCTK dataset with 109 different speakers with EN accent. Multi-speaker model capable of generating speech with various English accents and speaker characteristics. | None | `tts_models/en/vctk/vits` |
| fast_pitch | en | vctk | tts_models | FastPitch model trained on VCTK dataset. Multi-speaker FastPitch model supporting various English speakers and accents. | None | `tts_models/en/vctk/fast_pitch` |
| tacotron-DDC | en | sam | tts_models | Tacotron2 with Double Decoder Consistency trained with Accenture's Sam dataset. Professional-grade TTS model trained on high-quality corporate speech data. | `vocoder_models/en/sam/hifigan_v2` | `tts_models/en/sam/tacotron-DDC` |
| capacitron-t2-c50 | en | blizzard2013 | tts_models | Capacitron additions to Tacotron 2 with Capacity at 50 as described in https://arxiv.org/pdf/1906.03402.pdf. Enhanced model with improved capacity for handling complex speech patterns. | `vocoder_models/en/blizzard2013/hifigan_v2` | `tts_models/en/blizzard2013/capacitron-t2-c50` |
| capacitron-t2-c150_v2 | en | blizzard2013 | tts_models | Capacitron additions to Tacotron 2 with Capacity at 150 as described in https://arxiv.org/pdf/1906.03402.pdf. Higher capacity version for even more complex speech synthesis tasks. | `vocoder_models/en/blizzard2013/hifigan_v2` | `tts_models/en/blizzard2013/capacitron-t2-c150_v2` |
| tortoise-v2 | en | multi-dataset | tts_models | Tortoise TTS model version 2 from https://github.com/neonbjb/tortoise-tts. Advanced TTS model known for extremely high-quality speech synthesis with natural prosody and emotion. | None | `tts_models/en/multi-dataset/tortoise-v2` |
| jenny | en | jenny | tts_models | VITS model trained with Jenny(Dioco) dataset. Named as Jenny as demanded by the license. Original model available at https://www.kaggle.com/datasets/noml4u/tts-models--en--jenny-dioco--vits. Single-speaker female voice model. | None | `tts_models/en/jenny/jenny` |
European Language Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| vits | bg | cv | tts_models | Bulgarian VITS model trained on Common Voice dataset. High-quality Bulgarian text-to-speech synthesis using the VITS architecture. | None | `tts_models/bg/cv/vits` |
| vits | cs | cv | tts_models | Czech VITS model trained on Common Voice dataset. Comprehensive Czech TTS model for natural speech synthesis. | None | `tts_models/cs/cv/vits` |
| vits | da | cv | tts_models | Danish VITS model trained on Common Voice dataset. Advanced Danish speech synthesis model. | None | `tts_models/da/cv/vits` |
| vits | et | cv | tts_models | Estonian VITS model trained on Common Voice dataset. High-quality Estonian TTS model. | None | `tts_models/et/cv/vits` |
| vits | ga | cv | tts_models | Irish Gaelic VITS model trained on Common Voice dataset. Specialized model for Irish language speech synthesis. | None | `tts_models/ga/cv/vits` |
| tacotron2-DDC | es | mai | tts_models | Spanish Tacotron2 with Double Decoder Consistency trained on MAI dataset. Professional Spanish TTS model with enhanced consistency. | `vocoder_models/universal/libri-tts/fullband-melgan` | `tts_models/es/mai/tacotron2-DDC` |
| vits | es | css10 | tts_models | Spanish VITS model trained on CSS10 dataset. High-quality Spanish speech synthesis model. | None | `tts_models/es/css10/vits` |
| tacotron2-DDC | fr | mai | tts_models | French Tacotron2 with Double Decoder Consistency trained on MAI dataset. Professional French TTS model. | `vocoder_models/universal/libri-tts/fullband-melgan` | `tts_models/fr/mai/tacotron2-DDC` |
| vits | fr | css10 | tts_models | French VITS model trained on CSS10 dataset. Advanced French speech synthesis model. | None | `tts_models/fr/css10/vits` |
| glow-tts | uk | mai | tts_models | Ukrainian Glow-TTS model trained on MAI dataset. Flow-based Ukrainian TTS model. | `vocoder_models/uk/mai/multiband-melgan` | `tts_models/uk/mai/glow-tts` |
| vits | uk | mai | tts_models | Ukrainian VITS model trained on MAI dataset. High-quality Ukrainian speech synthesis. | None | `tts_models/uk/mai/vits` |
| tacotron2-DDC | nl | mai | tts_models | Dutch Tacotron2 with Double Decoder Consistency trained on MAI dataset. Professional Dutch TTS model. | `vocoder_models/nl/mai/parallel-wavegan` | `tts_models/nl/mai/tacotron2-DDC` |
| vits | nl | css10 | tts_models | Dutch VITS model trained on CSS10 dataset. Advanced Dutch speech synthesis model. | None | `tts_models/nl/css10/vits` |
| tacotron2-DCA | de | thorsten | tts_models | German Tacotron2 with Decoder Consistency Algorithm trained on Thorsten dataset. High-quality German TTS model. | `vocoder_models/de/thorsten/fullband-melgan` | `tts_models/de/thorsten/tacotron2-DCA` |
| vits | de | thorsten | tts_models | German VITS model trained on Thorsten dataset. Advanced German speech synthesis model. | None | `tts_models/de/thorsten/vits` |
| tacotron2-DDC | de | thorsten | tts_models | Thorsten-Dec2021-22k-DDC German model with Double Decoder Consistency. Updated German TTS model with improved quality. | `vocoder_models/de/thorsten/hifigan_v1` | `tts_models/de/thorsten/tacotron2-DDC` |
| vits-neon | de | css10 | tts_models | German VITS model with Neon optimizations trained on CSS10 dataset. Optimized German TTS model. | None | `tts_models/de/css10/vits-neon` |
| vits | hu | css10 | tts_models | Hungarian VITS model trained on CSS10 dataset. High-quality Hungarian speech synthesis. | None | `tts_models/hu/css10/vits` |
| vits | el | cv | tts_models | Greek VITS model trained on Common Voice dataset. Advanced Greek TTS model. | None | `tts_models/el/cv/vits` |
| vits | fi | css10 | tts_models | Finnish VITS model trained on CSS10 dataset. Comprehensive Finnish speech synthesis model. | None | `tts_models/fi/css10/vits` |
| vits | hr | cv | tts_models | Croatian VITS model trained on Common Voice dataset. High-quality Croatian TTS model. | None | `tts_models/hr/cv/vits` |
| vits | lt | cv | tts_models | Lithuanian VITS model trained on Common Voice dataset. Advanced Lithuanian speech synthesis. | None | `tts_models/lt/cv/vits` |
| vits | lv | cv | tts_models | Latvian VITS model trained on Common Voice dataset. Professional Latvian TTS model. | None | `tts_models/lv/cv/vits` |
| vits | mt | cv | tts_models | Maltese VITS model trained on Common Voice dataset. Specialized Maltese speech synthesis model. | None | `tts_models/mt/cv/vits` |
| vits | pl | mai_female | tts_models | Polish VITS model with female voice trained on MAI dataset. High-quality Polish female TTS model. | None | `tts_models/pl/mai_female/vits` |
| vits | pt | cv | tts_models | Portuguese VITS model trained on Common Voice dataset. Comprehensive Portuguese TTS model. | None | `tts_models/pt/cv/vits` |
| vits | ro | cv | tts_models | Romanian VITS model trained on Common Voice dataset. Advanced Romanian speech synthesis. | None | `tts_models/ro/cv/vits` |
| vits | sk | cv | tts_models | Slovak VITS model trained on Common Voice dataset. Professional Slovak TTS model. | None | `tts_models/sk/cv/vits` |
| vits | sl | cv | tts_models | Slovenian VITS model trained on Common Voice dataset. High-quality Slovenian speech synthesis. | None | `tts_models/sl/cv/vits` |
| vits | sv | cv | tts_models | Swedish VITS model trained on Common Voice dataset. Advanced Swedish TTS model. | None | `tts_models/sv/cv/vits` |
| glow-tts | it | mai_female | tts_models | Italian Glow-TTS model with female voice as explained in https://github.com/coqui-ai/TTS/issues/1148. Female Italian TTS model with flow-based architecture. | None | `tts_models/it/mai_female/glow-tts` |
| vits | it | mai_female | tts_models | Italian VITS model with female voice as explained in https://github.com/coqui-ai/TTS/issues/1148. High-quality female Italian speech synthesis. | None | `tts_models/it/mai_female/vits` |
| glow-tts | it | mai_male | tts_models | Italian Glow-TTS model with male voice as explained in https://github.com/coqui-ai/TTS/issues/1148. Male Italian TTS model with flow-based architecture. | None | `tts_models/it/mai_male/glow-tts` |
| vits | it | mai_male | tts_models | Italian VITS model with male voice as explained in https://github.com/coqui-ai/TTS/issues/1148. High-quality male Italian speech synthesis. | None | `tts_models/it/mai_male/vits` |
| glow-tts | tr | common-voice | tts_models | Turkish GlowTTS model using an unknown speaker from the Common-Voice dataset. High-quality Turkish speech synthesis with flow-based architecture. | `vocoder_models/tr/common-voice/hifigan` | `tts_models/tr/common-voice/glow-tts` |
| glow-tts | be | common-voice | tts_models | Belarusian GlowTTS model created by @alex73 (Github). Community-contributed Belarusian TTS model with flow-based architecture. | `vocoder_models/be/common-voice/hifigan` | `tts_models/be/common-voice/glow-tts` |
Asian Language Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| tacotron2-DDC-GST | zh-CN | baker | tts_models | Chinese Tacotron2 with Double Decoder Consistency and Global Style Tokens trained on Baker dataset. Advanced Chinese TTS model with style control capabilities. | None | `tts_models/zh-CN/baker/tacotron2-DDC-GST` |
| tacotron2-DDC | ja | kokoro | tts_models | Tacotron2 with Double Decoder Consistency trained with Kokoro Speech Dataset. High-quality Japanese TTS model with emotional speech capabilities. | `vocoder_models/ja/kokoro/hifigan_v1` | `tts_models/ja/kokoro/tacotron2-DDC` |
African Language Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| vits | ewe | openbible | tts_models | Ewe VITS model trained on OpenBible dataset. Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible. Religious text-based TTS model for Ewe language. | None | `tts_models/ewe/openbible/vits` |
| vits | hau | openbible | tts_models | Hausa VITS model trained on OpenBible dataset. Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible. Religious text-based TTS model for Hausa language. | None | `tts_models/hau/openbible/vits` |
| vits | lin | openbible | tts_models | Lingala VITS model trained on OpenBible dataset. Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible. Religious text-based TTS model for Lingala language. | None | `tts_models/lin/openbible/vits` |
| vits | tw_akuapem | openbible | tts_models | Twi (Akuapem) VITS model trained on OpenBible dataset. Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible. Religious text-based TTS model for Twi Akuapem dialect. | None | `tts_models/tw_akuapem/openbible/vits` |
| vits | tw_asante | openbible | tts_models | Twi (Asante) VITS model trained on OpenBible dataset. Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible. Religious text-based TTS model for Twi Asante dialect. | None | `tts_models/tw_asante/openbible/vits` |
| vits | yor | openbible | tts_models | Yoruba VITS model trained on OpenBible dataset. Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible. Religious text-based TTS model for Yoruba language. | None | `tts_models/yor/openbible/vits` |
Custom Language Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| vits | ca | custom | tts_models | Catalan VITS model trained from zero with 101,460 utterances consisting of 257 speakers, approximately 138 hours of speech. Uses three datasets: Festcat, Google Catalan TTS, and Common Voice 8. Trained with TTS v0.8.0. More details at https://github.com/coqui-ai/TTS/discussions/930#discussioncomment-4466345 | None | `tts_models/ca/custom/vits` |
| glow-tts | fa | custom | tts_models | Persian TTS female Glow-TTS model for text-to-speech purposes. Single-speaker female voice trained on persian-tts-dataset-female. Note: This model has no compatible vocoder, thus output quality may not be optimal. Dataset available at https://www.kaggle.com/datasets/magnoliasis/persian-tts-dataset-famale | None | `tts_models/fa/custom/glow-tts` |
| vits-male | bn | custom | tts_models | Single speaker Bangla male VITS model. Comprehensive Bangla TTS model for male voice synthesis. For more information visit https://github.com/mobassir94/comprehensive-bangla-tts | None | `tts_models/bn/custom/vits-male` |
| vits-female | bn | custom | tts_models | Single speaker Bangla female VITS model. Comprehensive Bangla TTS model for female voice synthesis. For more information visit https://github.com/mobassir94/comprehensive-bangla-tts | None | `tts_models/bn/custom/vits-female` |
Voice Conversion Models
| Model Name | Language | Dataset | Model Type | Description | Default Vocoder | Model Path |
|------------|----------|---------|------------|-------------|-----------------|------------|
| freevc24 | multilingual | vctk | voice_conversion_models | FreeVC model trained on VCTK dataset from https://github.com/OlaWod/FreeVC. Advanced voice conversion model capable of converting voice characteristics while preserving linguistic content across multiple languages. | None | `voice_conversion_models/multilingual/vctk/freevc24` |
Vocoder Models
Universal Vocoders
| Model Name | Language | Dataset | Model Type | Description | Model Path |
|------------|----------|---------|------------|-------------|------------|
| wavegrad | universal | libri-tts | vocoder_models | Universal WaveGrad vocoder trained on LibriTTS dataset. High-quality neural vocoder for converting mel-spectrograms to audio waveforms. | `vocoder_models/universal/libri-tts/wavegrad` |
| fullband-melgan | universal | libri-tts | vocoder_models | Universal Fullband MelGAN vocoder trained on LibriTTS dataset. Advanced generative vocoder for high-fidelity audio synthesis with full frequency band coverage. | `vocoder_models/universal/libri-tts/fullband-melgan` |
English Vocoders
| Model Name | Language | Dataset | Model Type | Description | Model Path |
|------------|----------|---------|------------|-------------|------------|
| wavegrad | en | ek1 | vocoder_models | EK1 English (Received Pronunciation) WaveGrad vocoder by NMStoker. Specialized vocoder for British English accent synthesis. | `vocoder_models/en/ek1/wavegrad` |
| multiband-melgan | en | ljspeech | vocoder_models | Multi-band MelGAN vocoder trained on LJSpeech dataset. Efficient vocoder that processes multiple frequency bands simultaneously for faster inference. | `vocoder_models/en/ljspeech/multiband-melgan` |
| hifigan_v2 | en | ljspeech | vocoder_models | HiFiGAN v2 LJSpeech vocoder from https://arxiv.org/abs/2010.05646. State-of-the-art generative adversarial network-based vocoder for high-quality audio generation. | `vocoder_models/en/ljspeech/hifigan_v2` |
| univnet | en | ljspeech | vocoder_models | UnivNet model fine-tuned on TacotronDDC_ph spectrograms for better compatibility. Universal neural vocoder optimized for phoneme-based TTS models. | `vocoder_models/en/ljspeech/univnet` |
| hifigan_v2 | en | blizzard2013 | vocoder_models | HiFiGAN v2 vocoder adapted for Blizzard2013 dataset from https://arxiv.org/abs/2010.05646. Professional-grade vocoder for high-quality speech synthesis. | `vocoder_models/en/blizzard2013/hifigan_v2` |
| hifigan_v2 | en | vctk | vocoder_models | HiFiGAN v2 fine-tuned for VCTK dataset, intended for use with tts_models/en/vctk/sc-glow-tts. Multi-speaker vocoder supporting various English accents. | `vocoder_models/en/vctk/hifigan_v2` |
| hifigan_v2 | en | sam | vocoder_models | HiFiGAN v2 fine-tuned for SAM dataset, intended for use with tts_models/en/sam/tacotron_DDC. Corporate-grade vocoder for professional speech synthesis. | `vocoder_models/en/sam/hifigan_v2` |
European Language Vocoders
| Model Name | Language | Dataset | Model Type | Description | Model Path |
|------------|----------|---------|------------|-------------|------------|
| parallel-wavegan | nl | mai | vocoder_models | Parallel WaveGAN vocoder for Dutch language trained on MAI dataset. High-quality Dutch speech synthesis vocoder with parallel processing capabilities. | `vocoder_models/nl/mai/parallel-wavegan` |
| wavegrad | de | thorsten | vocoder_models | WaveGrad vocoder for German language trained on Thorsten dataset. Diffusion-based vocoder for high-quality German speech synthesis. | `vocoder_models/de/thorsten/wavegrad` |
| fullband-melgan | de | thorsten | vocoder_models | Fullband MelGAN vocoder for German language trained on Thorsten dataset. Advanced German vocoder with full frequency band coverage. | `vocoder_models/de/thorsten/fullband-melgan` |
| hifigan_v1 | de | thorsten | vocoder_models | HiFiGAN v1 vocoder for Thorsten Neutral Dec2021 22k sample rate Tacotron2 DDC model. Specialized German vocoder optimized for the Thorsten dataset. | `vocoder_models/de/thorsten/hifigan_v1` |
| multiband-melgan | uk | mai | vocoder_models | Multi-band MelGAN vocoder for Ukrainian language trained on MAI dataset. Ukrainian speech synthesis vocoder with multi-band processing. | `vocoder_models/uk/mai/multiband-melgan` |
| hifigan | tr | common-voice | vocoder_models | HiFiGAN vocoder for Turkish language using an unknown speaker from the Common-Voice dataset. High-quality Turkish speech synthesis vocoder. | `vocoder_models/tr/common-voice/hifigan` |
| hifigan | be | common-voice | vocoder_models | Belarusian HiFiGAN vocoder created by @alex73 (Github). Community-contributed Belarusian speech synthesis vocoder. | `vocoder_models/be/common-voice/hifigan` |
Asian Language Vocoders
| Model Name | Language | Dataset | Model Type | Description | Model Path |
|------------|----------|---------|------------|-------------|------------|
| hifigan_v1 | ja | kokoro | vocoder_models | HiFiGAN v1 vocoder for Japanese language trained on Kokoro dataset by @kaiidams. High-quality Japanese speech synthesis vocoder with emotional speech capabilities. | `vocoder_models/ja/kokoro/hifigan_v1` |
Summary Statistics
- Total TTS Models: 72 models
- Total Voice Conversion Models: 1 model
- Total Vocoder Models: 18 models
- Languages Supported: 40+ languages including multilingual models
- Architecture Types: VITS, Tacotron2, Glow-TTS, FastPitch, Bark, XTTS, and more
- Key Features: Cross-language voice cloning, multi-speaker support, emotional speech synthesis, and professional-grade quality
Usage Notes
- Model Paths: Use the exact model paths provided in the tables when loading models
- Vocoder Compatibility: Some models require specific vocoders for optimal performance
- Language Support: Multilingual models support multiple languages in a single model
- Quality Levels: Models vary from research-grade to production-ready quality
- Licensing: Some models have specific licensing requirements (e.g., Jenny model)
- Community Contributions: Several models are contributed by the community (indicated by contributor names)
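All model paths in the tables follow the same `<model_type>/<language>/<dataset>/<model_name>` convention, so they can be checked before being handed to the API. A minimal sketch of such a check; `parse_model_path` is a hypothetical helper written for this document, not a 🐸TTS API:

```python
# Hypothetical helper: split a model path of the form
# <model_type>/<language>/<dataset>/<model_name> into its parts.
def parse_model_path(path: str) -> dict:
    parts = path.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 segments, got {len(parts)}: {path!r}")
    model_type, language, dataset, model_name = parts
    if model_type not in ("tts_models", "vocoder_models", "voice_conversion_models"):
        raise ValueError(f"unknown model type: {model_type!r}")
    return {"model_type": model_type, "language": language,
            "dataset": dataset, "model_name": model_name}

info = parse_model_path("tts_models/en/ljspeech/glow-tts")
print(info["language"], info["dataset"])  # prints: en ljspeech
```

A check like this catches typos such as a missing segment early, instead of failing later inside the model loader.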
🐸Coqui.ai News
- 📣 ⓍTTSv2 is here with 16 languages and better performance across the board.
- 📣 ⓍTTS fine-tuning code is out. Check the example recipes.
- 📣 ⓍTTS can now stream with <200ms latency.
- 📣 ⓍTTS, our production TTS model that can speak 13 languages, is released Blog Post, Demo, Docs
- 📣 🐶Bark is now available for inference with unconstrained voice cloning. Docs
- 📣 You can use ~1100 Fairseq models with 🐸TTS.
- 📣 🐸TTS now supports 🐢Tortoise with faster inference. Docs
**🐸TTS is a library for advanced Text-to-Speech generation.**
🚀 Pretrained models in +1100 languages.
🛠️ Tools for training new models and fine-tuning existing models in any language.
📚 Utilities for dataset analysis and curation.
______________________________________________________________________
[Discord](https://discord.gg/5eXr5seRrv)
Underlined "TTS" and "Judy" are internal 🐸TTS models that are not released open-source. They are here to show the potential. Models prefixed with a dot (.Jofish .Abe and .Janice) are real human voices.
Features
- High-performance Deep Learning models for Text2Speech tasks.
- Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech).
- Speaker Encoder to compute speaker embeddings efficiently.
- Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN).
- Fast and efficient model training.
- Detailed training logs on the terminal and Tensorboard.
- Support for Multi-speaker TTS.
- Efficient, flexible, lightweight but feature-complete Trainer API.
- Released and ready-to-use models.
- Tools to curate Text2Speech datasets under `dataset_analysis`.
- Utilities to use and test your models.
- Modular (but not too much) code base enabling easy implementation of new ideas.
Model Implementations
Spectrogram models
- Tacotron: paper
- Tacotron2: paper
- Glow-TTS: paper
- Speedy-Speech: paper
- Align-TTS: paper
- FastPitch: paper
- FastSpeech: paper
- FastSpeech2: paper
- SC-GlowTTS: paper
- Capacitron: paper
- OverFlow: paper
- Neural HMM TTS: paper
- Delightful TTS: paper
End-to-End Models
- ⓍTTS: blog
- VITS: paper
- 🐸 YourTTS: paper
- 🐢 Tortoise: orig. repo
- 🐶 Bark: orig. repo
Attention Methods
- Guided Attention: paper
- Forward Backward Decoding: paper
- Graves Attention: paper
- Double Decoder Consistency: blog
- Dynamic Convolutional Attention: paper
- Alignment Network: paper
Speaker Encoder
Vocoders
- MelGAN: paper
- MultiBandMelGAN: paper
- ParallelWaveGAN: paper
- GAN-TTS discriminators: paper
- WaveRNN: origin
- WaveGrad: paper
- HiFiGAN: paper
- UnivNet: paper
Voice Conversion
- FreeVC: paper
You can also help us implement more models.
Installation
🐸TTS is tested on Ubuntu 18.04 with python >= 3.9, < 3.12.
If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option.
```bash
pip install TTS
```
If you plan to code or train models, clone 🐸TTS and install it locally.
```bash
git clone https://github.com/coqui-ai/TTS
pip install -e .[all,dev,notebooks]  # Select the relevant extras
```
If you are on Ubuntu (Debian), you can also run the following commands for installation.
```bash
$ make system-deps  # intended to be used on Ubuntu (Debian). Let us know if you have a different OS.
$ make install
```
If you are on Windows, 👑@GuyPaddock wrote installation instructions here.
Docker Image
You can also try TTS without installing it by using the Docker image. Simply run the following commands:
```bash
docker run --rm -it -p 5002:5002 --entrypoint /bin/bash ghcr.io/coqui-ai/tts-cpu
python3 TTS/server/server.py --list_models  # To get the list of available models
python3 TTS/server/server.py --model_name tts_models/en/vctk/vits  # To start a server
```
You can then enjoy the TTS server here. More details about the Docker images (like GPU support) can be found here.
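Once the container's demo server is listening on port 5002, synthesized audio can be fetched over HTTP. A minimal sketch, assuming the demo server exposes a GET `/api/tts` endpoint that accepts a `text` query parameter (verify the route against your server version before relying on it):

```python
from urllib.parse import urlencode

def build_tts_url(text: str, host: str = "localhost", port: int = 5002) -> str:
    # Assumed endpoint: GET /api/tts?text=... on the demo server.
    query = urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"

url = build_tts_url("Hello world!")
print(url)
# To actually download the audio (requires a running server):
# import urllib.request
# wav_bytes = urllib.request.urlopen(url).read()
# open("output.wav", "wb").write(wav_bytes)
```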
Synthesizing speech by 🐸TTS
🐍 Python API
Running a multi-speaker and multi-lingual model
```python
import torch
from TTS.api import TTS

# Get device
device = "cuda" if torch.cuda.is_available() else "cpu"

# List available 🐸TTS models
print(TTS().list_models())

# Init TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Run TTS
# ❗ Since this model is a multi-lingual voice cloning model, we must set the target speaker_wav and language
# Text to speech: list of amplitude values as output
wav = tts.tts(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en")
# Text to speech to a file
tts.tts_to_file(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")
```
Running a single speaker model
```python
# Init TTS with the target model name
tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False).to(device)

# Run TTS
tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path=OUTPUT_PATH)

# Example voice cloning with YourTTS in English, French and Portuguese
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False).to(device)
tts.tts_to_file("This is voice cloning.", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")
tts.tts_to_file("C'est le clonage de la voix.", speaker_wav="my/cloning/audio.wav", language="fr-fr", file_path="output.wav")
tts.tts_to_file("Isso é clonagem de voz.", speaker_wav="my/cloning/audio.wav", language="pt-br", file_path="output.wav")
```
Example voice conversion
```python
# Converting the voice in source_wav to the voice of target_wav
tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda")
tts.voice_conversion_to_file(source_wav="my/source.wav", target_wav="my/target.wav", file_path="output.wav")
```
Example voice cloning together with the voice conversion model.
This way, you can clone voices by using any model in 🐸TTS.
```python
tts = TTS("tts_models/de/thorsten/tacotron2-DDC")
tts.tts_with_vc_to_file(
    "Wie sage ich auf Italienisch, dass ich dich liebe?",
    speaker_wav="target/speaker.wav",
    file_path="output.wav"
)
```
Example text to speech using Fairseq models in ~1100 languages 🤯.
For Fairseq models, use the following name format: tts_models/<lang-iso_code>/fairseq/vits.
You can find the language ISO codes here
and learn about the Fairseq models here.
```python
# TTS with on-the-fly voice conversion
api = TTS("tts_models/deu/fairseq/vits")
api.tts_with_vc_to_file(
    "Wie sage ich auf Italienisch, dass ich dich liebe?",
    speaker_wav="target/speaker.wav",
    file_path="output.wav"
)
```
Command-line tts
Synthesize speech on command line.
You can either use your trained model or choose a model from the provided list.
If you don't specify any model, it uses an LJSpeech-based English model.
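When batch-synthesizing from a script, the CLI invocations used throughout this section can be assembled programmatically. A minimal sketch that only builds the argument list from the documented flags; `build_tts_args` is a hypothetical helper, and the `tts` executable must be on PATH to actually run the command:

```python
def build_tts_args(text, out_path, model_name=None, vocoder_name=None):
    # Assemble an argv list for the `tts` CLI from its documented flags.
    args = ["tts", "--text", text, "--out_path", out_path]
    if model_name:
        args += ["--model_name", model_name]
    if vocoder_name:
        args += ["--vocoder_name", vocoder_name]
    return args

args = build_tts_args("Text for TTS", "output/path/speech.wav",
                      model_name="tts_models/en/ljspeech/glow-tts")
print(args)
# import subprocess; subprocess.run(args, check=True)  # uncomment with 🐸TTS installed
```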
Single Speaker Models
- List provided models:
$ tts --list_models
Get model info (for both `tts_models` and `vocoder_models`):
- Query by type/name:
  `model_info_by_name` uses the model name exactly as shown by `--list_models`.
  $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
  For example:
  $ tts --model_info_by_name tts_models/tr/common-voice/glow-tts
  $ tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
- Query by type/idx:
  `model_query_idx` uses the corresponding idx from `--list_models`.
  $ tts --model_info_by_idx "<model_type>/<model_query_idx>"
  For example:
  $ tts --model_info_by_idx tts_models/3
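Model names always have four `/`-separated fields, so they can be validated and split with a few lines of plain Python. This is an illustrative helper (not part of the `tts` CLI or API) for working with `<model_type>/<language>/<dataset>/<model_name>` strings in scripts.

```python
from typing import NamedTuple

class ModelName(NamedTuple):
    model_type: str  # e.g. "tts_models" or "vocoder_models"
    language: str
    dataset: str
    name: str

def parse_model_name(full_name: str) -> ModelName:
    """Split a '<model_type>/<language>/<dataset>/<model_name>' string."""
    parts = full_name.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '/'-separated fields, got {full_name!r}")
    return ModelName(*parts)

info = parse_model_name("tts_models/tr/common-voice/glow-tts")
print(info.dataset)  # common-voice
```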
- Run TTS with default models:
$ tts --text "Text for TTS" --out_path output/path/speech.wav
- Run TTS and pipe out the generated TTS wav file data:
$ tts --text "Text for TTS" --pipe_out --out_path output/path/speech.wav | aplay
- Run a TTS model with its default vocoder model:
$ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
For example:
$ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --out_path output/path/speech.wav
- Run with specific TTS and vocoder models from the list:
$ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
For example:
$ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --vocoder_name "vocoder_models/en/ljspeech/univnet" --out_path output/path/speech.wav
- Run your own TTS model (Using Griffin-Lim Vocoder):
$ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
- Run your own TTS and Vocoder models:
$ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
--vocoder_path path/to/vocoder.pth --vocoder_config_path path/to/vocoder_config.json
Multi-speaker Models
- List the available speakers and choose a <speaker_id> among them:
$ tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
- Run the multi-speaker TTS model with the target speaker ID:
$ tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --speaker_idx <speaker_id>
- Run your own multi-speaker TTS model:
$ tts --text "Text for TTS" --out_path output/path/speech.wav --model_path path/to/model.pth --config_path path/to/config.json --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
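When driving the CLI from Python (e.g. batch synthesis over many texts), it is safer to build the argument list for `subprocess` than to concatenate a shell string. A minimal sketch, assuming the `tts` executable is on `PATH`; the model name and speaker ID below are illustrative placeholders.

```python
# Assemble the argument list for the multi-speaker `tts` CLI call shown
# above. Passing a list to subprocess.run avoids shell-quoting bugs.
def multispeaker_cmd(text: str, model_name: str, speaker_idx: str, out_path: str) -> list[str]:
    return [
        "tts",
        "--text", text,
        "--model_name", model_name,
        "--speaker_idx", speaker_idx,
        "--out_path", out_path,
    ]

# Illustrative values; substitute a real model name and speaker ID.
cmd = multispeaker_cmd("Text for TTS.", "tts_models/en/vctk/vits", "p225", "output/path/speech.wav")
# subprocess.run(cmd, check=True) would then invoke the CLI.
```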
Voice Conversion Models
$ tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --source_wav <path/to/speaker/wav> --target_wav <path/to/reference/wav>
Directory Structure
```
|- notebooks/           (Jupyter Notebooks for model evaluation, parameter selection and data analysis.)
|- utils/               (common utilities.)
|- TTS
    |- bin/             (folder for all the executables.)
        |- train*.py    (train your target model.)
        |- ...
    |- tts/             (text to speech models)
        |- layers/      (model layer definitions)
        |- models/      (model definitions)
        |- utils/       (model specific utilities.)
    |- speaker_encoder/ (Speaker Encoder models.)
        |- (same)
    |- vocoder/         (Vocoder models.)
        |- (same)
```
Owner
- Name: Muhammad Nabeel Khan
- Login: softengrmuhammadnabeel
- Kind: user
- Repositories: 1
- Profile: https://github.com/softengrmuhammadnabeel
I'm a front-end dev who's great at CSS and someday going to be great at JavaScript. I'm ready to point out some issues in the front-end industry.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you want to cite 🐸💬, feel free to use this (but only if you loved it 😊)"
title: "Coqui TTS"
abstract: "A deep learning toolkit for Text-to-Speech, battle-tested in research and production"
date-released: 2021-01-01
authors:
- family-names: "Eren"
given-names: "Gölge"
- name: "The Coqui TTS Team"
version: 1.4
doi: 10.5281/zenodo.6334862
license: "MPL-2.0"
url: "https://www.coqui.ai"
repository-code: "https://github.com/coqui-ai/TTS"
keywords:
- machine learning
- deep learning
- artificial intelligence
- text to speech
- TTS
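The CFF metadata above maps directly onto a BibTeX software entry. A possible rendering (the citation key is arbitrary; all field values are taken from the CITATION.cff fields listed above):

```bibtex
@software{coqui_tts,
  author  = {Eren, G{\"o}lge and {The Coqui TTS Team}},
  title   = {{Coqui TTS}},
  version = {1.4},
  doi     = {10.5281/zenodo.6334862},
  license = {MPL-2.0},
  url     = {https://www.coqui.ai},
  date    = {2021-01-01},
}
```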
GitHub Events
Total
- Public event: 1
- Push event: 3
Last Year
- Public event: 1
- Push event: 3
Dependencies
- actions/checkout v2 composite
- actions/checkout v3 composite
- actions/setup-python v2 composite
- actions/setup-python v4 composite
- actions/download-artifact v2 composite
- actions/upload-artifact v2 composite
- docker/build-push-action v2 composite
- docker/login-action v1 composite
- docker/setup-buildx-action v1 composite
- docker/setup-qemu-action v1 composite
- ${BASE} latest build
- ubuntu 22.04 build
- furo *
- linkify-it-py *
- myst-parser ==2.0.0
- sphinx ==7.2.5
- sphinx_copybutton *
- sphinx_inline_tabs *
- black * development
- coverage * development
- isort * development
- nose2 * development
- pylint ==2.10.2 development
- cutlet *
- mecab-python3 ==1.0.6
- unidic-lite ==1.0.8
- bokeh ==1.4.0
- aiohttp >=3.8.1
- anyascii >=0.3.0
- bangla *
- bnnumerizer *
- bnunicodenormalizer *
- coqpit >=0.0.16
- cython >=0.29.30
- einops >=0.6.0
- encodec >=0.1.1
- flask >=2.0.1
- fsspec >=2023.6.0
- g2pkk >=0.1.1
- gruut ==2.2.3
- hangul_romanize *
- inflect >=5.6.0
- jamo *
- jieba *
- librosa >=0.10.0
- matplotlib >=3.7.0
- mutagen ==1.47.0
- nltk *
- num2words *
- numba >=0.57.0
- numba ==0.55.1
- numpy >=1.24.3
- numpy ==1.22.0
- packaging >=23.1
- pandas >=1.4,<2.0
- pypinyin *
- pysbd >=0.3.4
- pyyaml >=6.0
- scikit-learn >=1.3.0
- scipy >=1.11.2
- soundfile >=0.12.0
- spacy >=3
- torch >=2.1
- torchaudio *
- tqdm >=4.64.1
- trainer >=0.0.36
- transformers >=4.33.0
- umap-learn >=0.5.1
- unidecode >=1.3.2