Updated 9 months ago

mosec • Rank 20.4 • Science 54%

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine.

Updated 9 months ago

mean-opinion-score • Rank 6.9 • Science 67%

Python library for calculating the mean opinion score and 95% confidence interval of the standard deviation of text-to-speech ratings according to Ribeiro et al. (2011).

Updated 9 months ago

pronunciation-dictionary-utils • Rank 4.9 • Science 67%

Utils to modify pronunciation dictionaries.

Updated 9 months ago

mel-cepstral-distance • Rank 4.7 • Science 67%

A Python library for computing the Mel-Cepstral Distance (Mel-Cepstral Distortion, MCD) between two inputs. This implementation is based on the method proposed by Robert F. Kubichek in "Mel-Cepstral Distance Measure for Objective Speech Quality Assessment".

Updated 9 months ago

zho-tts • Rank 1.9 • Science 67%

Web app, command-line interface and Python library for synthesizing Chinese texts into speech.

Updated 9 months ago

english-text-normalization • Rank 1.8 • Science 67%

Command-line interface (CLI) and library to normalize English texts.

Updated 9 months ago

https://github.com/myshell-ai/openvoice • Rank 13.0 • Science 46%

Instant voice cloning by MIT and MyShell. Audio foundation model.

Updated 9 months ago

vitsserver • Rank 5.6 • Science 44%

🌻 VITS ONNX TTS server designed for fast inference 🔥

Updated 9 months ago

audio_common • Rank 4.2 • Science 44%

A PortAudio-based audio_common with text-to-speech for ROS 2

Updated 9 months ago

cwq-public-tts • Science 36%

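A large speech-synthesis model that generates speech in a target voice from text. Supports zero-shot generation. Large speech model, TTS; if you like it, please give it a star! PS: This project was cloned from Coqui TTS, with some further development and a few modified parameters.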

Updated 9 months ago

thorsten-voice • Science 67%

Thorsten-Voice: A free-to-use, offline, high-quality German TTS voice should be available for every project without any licensing struggles.

Updated 9 months ago

emospeech • Science 54%

tts

Updated 9 months ago

iraqi-dialect-tts-corpus • Science 67%

A comprehensive dataset for training a Text-to-Speech system focused on the Iraqi dialect. Contains custom-recorded audio samples, phonetic annotations, and text to support TTS model development and synthesis for Iraqi Arabic.

Updated 9 months ago

en-tts • Science 67%

Command-line interface and Python library for synthesizing English texts into speech.

Updated 9 months ago

pinyin-to-ipa • Science 67%

Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.

Updated 9 months ago

nemo • Science 13%

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Updated 9 months ago

radtts-uk • Science 67%

High-fidelity speech synthesis for Ukrainian using modern neural networks.

Updated 9 months ago

dl-for-emo-tts • Science 67%

:computer: :robot: A summary on our attempts at using Deep Learning approaches for Emotional Text to Speech :speaker:

Updated 9 months ago

ttsceleb • Science 54%

A TTS app where you can clone the voice of any person you wish.

Updated 9 months ago

speech_data_ghana_ug • Science 57%

The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani, Daagare, and Ikposo. Each language includes 1,000 hours of audio from indigenous speakers of the language, of which 100 hours are transcribed.

Updated 9 months ago

mgvt • Science 67%

Modular architecture for an AI-based 3D Virtual Teacher (Bachelor's thesis 2025)

Updated 9 months ago

NeMo • Science 44%

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Updated 9 months ago

conrad • Science 36%

Client for the Microsoft Cognitive Services Text to Speech REST API (reboot of the mscstts package)

Updated 9 months ago

tts-multilingual • Science 44%

Multilingual text-to-speech support (20+ languages)

Updated 9 months ago

speech-recognition-uk • Science 44%

🇺🇦 Speech Recognition & Synthesis for Ukrainian

Updated 9 months ago

tacotron-cli • Science 67%

Command-line interface to train Tacotron 2 using .wav <=> .TextGrid pairs.

Updated 9 months ago

tts-tortoise-gradio • Science 44%

A Gradio setup for Tortoise TTS.

Updated 9 months ago

speech_data_ghana_ug • Science 44%

The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani, Daagare, and Ikposo. Each language includes 1,000 hours of audio from indigenous speakers of the language and 100 hours of transcription.

Updated 9 months ago

monotonic-alignment-search • Science 44%

Monotonically align text and speech

Updated 9 months ago

comprehensive-e2e-tts • Science 54%

A non-autoregressive end-to-end text-to-speech (text-to-wav) system supporting a family of SOTA unsupervised duration modeling approaches. This project grows with the research community, aiming to achieve the ultimate E2E-TTS.

Updated 9 months ago

comprehensive-transformer-tts • Science 36%

A non-autoregressive Transformer-based text-to-speech system supporting a family of SOTA transformers with supervised and unsupervised duration modeling approaches. This project grows with the research community, aiming to achieve the ultimate TTS.

Updated 9 months ago

cross-speaker-emotion-transfer • Science 36%

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech