tts
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
mosec
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
mean-opinion-score
Python library for calculating the mean opinion score and 95% confidence interval of the standard deviation of text-to-speech ratings according to Ribeiro et al. (2011).
mel-cepstral-distance
A Python library for computing the Mel-Cepstral Distance (Mel-Cepstral Distortion, MCD) between two inputs. This implementation is based on the method proposed by Robert F. Kubichek in "Mel-Cepstral Distance Measure for Objective Speech Quality Assessment".
zho-tts
Web app, command-line interface and Python library for synthesizing Chinese texts into speech.
english-text-normalization
Command-line interface (CLI) and library to normalize English texts.
https://github.com/myshell-ai/openvoice
Instant voice cloning by MIT and MyShell. Audio foundation model.
https://github.com/balisujohn/tortoise.cpp
A ggml (C++) re-implementation of tortoise-tts
cwq-public-tts
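A large speech-synthesis model that generates speech in a target voice from text. Supports zero-shot generation. Speech foundation model, TTS; if you like it, please give it a star! PS: this project was cloned from Coqui TTS, with some secondary development and modified parameters.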
thorsten-voice
Thorsten-Voice: a free-to-use, offline, high-quality German TTS voice that should be available to every project without licensing struggles.
iraqi-dialect-tts-corpus
A comprehensive dataset for training a Text-to-Speech system focused on the Iraqi dialect. Contains custom-recorded audio samples, phonetic annotations, and text to support TTS model development and synthesis for Iraqi Arabic.
en-tts
Command-line interface and Python library for synthesizing English texts into speech.
pinyin-to-ipa
Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.
nemo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
dl-for-emo-tts
💻 🤖 A summary of our attempts at using Deep Learning approaches for Emotional Text to Speech 🔈
speech_data_ghana_ug
The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani, Daagare, and Ikposo. Each language includes 1,000 hours of audio from indigenous speakers of the language, of which 100 hours are transcribed.
mgvt
Modular architecture for an AI-based 3D Virtual Teacher (Bachelor's thesis 2025)
https://github.com/coqui-ai/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
conrad
Client for the Microsoft Cognitive Services Text to Speech REST API (reboot of the mscstts package)
tacotron-cli
Command-line interface to train Tacotron 2 using .wav <=> .TextGrid pairs.
speech_data_ghana_ug
The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani, Daagare, and Ikposo. Each language includes 1,000 hours of audio from indigenous speakers of the language and 100 hours of transcription.
diffgan-tts
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
comprehensive-e2e-tts
A non-autoregressive end-to-end text-to-speech model (text-to-wav) supporting a family of SOTA unsupervised duration modeling methods. This project grows with the research community, aiming to achieve the ultimate E2E-TTS
comprehensive-transformer-tts
A non-autoregressive Transformer-based text-to-speech model supporting a family of SOTA transformers with supervised and unsupervised duration modeling methods. This project grows with the research community, aiming to achieve the ultimate TTS
cross-speaker-emotion-transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech