Projects | Open Source Science

Updated 11 months ago

tts • Rank 28.2 • Science 64%

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

deep-learning glow-tts hifigan melgan multi-speaker-tts python pytorch speaker-encoder speaker-encodings speech speech-synthesis tacotron text-to-speech tts tts-model vocoder voice-cloning voice-conversion voice-synthesis

Updated 10 months ago

mosec • Rank 20.4 • Science 54%

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

cv deep-learning gpu hacktoberfest jax llm llm-serving machine-learning machine-learning-platform mlops model-serving mxnet nerual-network python pytorch rust tensorflow tts

Updated 11 months ago

mean-opinion-score • Rank 6.9 • Science 67%

Python library for calculating the mean opinion score and 95% confidence interval of the standard deviation of text-to-speech ratings according to Ribeiro et al. (2011).

intelligibility mos naturalness speech-synthesis subjective-evaluation text-to-speech tts

Updated 11 months ago

pronunciation-dictionary-utils • Rank 4.9 • Science 67%

Utils to modify pronunciation dictionaries.

dictionary ipa speech-synthesis tts

Updated 11 months ago

mel-cepstral-distance • Rank 4.7 • Science 67%

A Python library for computing the Mel-Cepstral Distance (Mel-Cepstral Distortion, MCD) between two inputs. This implementation is based on the method proposed by Robert F. Kubichek in "Mel-Cepstral Distance Measure for Objective Speech Quality Assessment".

cepstral distance distortion divergence dtw dynamic-time-warping language linguistics mcd mel mfcc objective-evaluation spectrogram spectrum speech-quality speech-synthesis text-to-speech tts voice-cloning

Updated 11 months ago

zho-tts • Rank 1.9 • Science 67%

Web app, command-line interface and Python library for synthesizing Chinese texts into speech.

chinese-tts linguistics speech speech-synthesis tacotron tts waveglow

Updated 10 months ago

english-text-normalization • Rank 1.8 • Science 67%

Command-line interface (CLI) and library to normalize English texts.

nlp preprocessing text-normalization tts

Updated 10 months ago

https://github.com/myshell-ai/openvoice • Rank 13.0 • Science 46%

Instant voice cloning by MIT and MyShell. Audio foundation model.

text-to-speech tts voice-clone zero-shot-tts

Updated 11 months ago

vitsserver • Rank 5.6 • Science 44%

🌻 VITS ONNX TTS server designed for fast inference 🔥

onnx onnxruntime so-vits-svc tts tts-api vits

Updated 11 months ago

audio_common • Rank 4.2 • Science 44%

A PortAudio based audio_common with text to speech for ROS 2

audio espeak pyaudio ros2 text-to-speech tts

Updated 10 months ago

https://github.com/balisujohn/tortoise.cpp • Rank 9.7 • Science 13%

A ggml (C++) re-implementation of tortoise-tts

ggml local speech text text-to-speech to tortoise-tts tts

Updated 11 months ago

diffgan-tts • Science 36%

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

ddpm deep-neural-networks diffgan-tts diffspeech diffusion diffusion-models fastspeech gan generative-model hifi-gan multi-speaker-tts neural-tts non-ar non-autoregressive pytorch single-speaker-tts speech-synthesis text-to-speech tts

Updated 10 months ago

comprehensive-transformer-tts • Science 36%

A Non-Autoregressive Transformer based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS

comprehensive deep-learning fastspeech fastspeech2 hifi-gan mel-gan multi-speaker neural-tts non-ar non-autoregressive pytorch single-speaker sota speech-synthesis supervised text-to-speech transformer tts ultimate-tts unsupervised

Updated 11 months ago

en-tts • Science 67%

Command-line interface and Python library for synthesizing English texts into speech.

english speech speech-synthesis tacotron text-to-speech tts waveglow

Updated 10 months ago

https://github.com/coqui-ai/tts-papers • Science 10%

🐸 collection of TTS papers

coqui-ai deep-learning papers research-paper speech tts

Updated 10 months ago

https://github.com/coqui-ai/open-speech-corpora • Science 23%

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

speech-emotion-recognition speech-processing speech-recognition speech-separation speech-synthesis speech-to-text stt text-to-speech tts voice-activity-detection voice-cloning voice-recognition

Updated 11 months ago

pinyin-to-ipa • Science 67%

Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.

bopomofo chinese cyrillic international-phonetic-alphabet linguistics phonetics pinyin speech-recognition speech-synthesis transcription tts zhuyin

Updated 11 months ago

thorsten-voice • Science 67%

Thorsten-Voice: A free-to-use, offline, high-quality German TTS voice should be available for every project without any licensing hassle.

dataset deutsch german speech-synthesis sprachsynthese thorsten-voice tts

Updated 11 months ago

yoruba-text • Science 26%

Yorùbá language training text for NLP, ASR and TTS tasks

african-languages asr diacritization machine-translation natural-language-processing nlp nlp-datasets training-dataset tts yoruba

Updated 11 months ago

emospeech • Science 54%

tts

Updated 11 months ago

iraqi-dialect-tts-corpus • Science 67%

A comprehensive dataset for training a Text-to-Speech system focused on the Iraqi dialect. Contains custom-recorded audio samples, phonetic annotations, and text to support TTS model development and synthesis for Iraqi Arabic.

arabic-nlp iraqi-dialect text-to-speech tts

Updated 11 months ago

conrad • Science 36%

Client for the Microsoft Cognitive Services Text to Speech REST API (reboot of the mscstts package)

azure r text-to-speech tts

Updated 11 months ago

tts-multilingual • Science 44%

Text-to-speech with multilingual support (20+ languages)

conversational-ai text-to-speech tts tts-api

Updated 11 months ago

monotonic-alignment-search • Science 44%

Monotonically align text and speech

python speech speech-synthesis text-to-speech tts

Updated 11 months ago

comprehensive-e2e-tts • Science 54%

A Non-Autoregressive End-to-End Text-to-Speech (text-to-wav), supporting a family of SOTA unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate E2E-TTS

deep-learning end-to-end fastspeech2 hifi-gan jets multi-speaker neural-tts non-ar non-autoregressive pytorch single-speaker sota speech-synthesis text-to-speech text-to-wav tts ultimate-tts unsupervised

Updated 11 months ago

cross-speaker-emotion-transfer • Science 36%

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

conditional-layer-normalization cross-speaker deep-neural-networks emotion-transfer generative-model global-style-tokens neural-tts non-ar non-autoregressive parallel-tacotron pytorch semi-supervised-learning speech-synthesis text-to-speech tts

Updated 11 months ago

speech-recognition-uk • Science 44%

🇺🇦 Speech Recognition & Synthesis for Ukrainian

speech speech-recognition speech-synthesis speech-to-text text-to-speech tts ukrainian

Updated 11 months ago

tacotron-cli • Science 67%

Command-line interface to train Tacotron 2 using .wav <=> .TextGrid pairs.

linguistics speech speech-synthesis tts

Updated 11 months ago

NeMo • Science 44%

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

asr deeplearning generative-ai large-language-models machine-translation multimodal neural-networks speaker-diariazation speaker-recognition speech-synthesis speech-translation tts

Mathematics (40%)

Updated 11 months ago

cwq-public-tts • Science 36%

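A large speech-synthesis model that generates a target voice timbre from text. Supports zero-shot generation. A large speech model for TTS; if you like it, please give it a star! PS: This project is a clone of Coqui TTS with some secondary development and a few modified parameters.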

llm tts voice

Updated 11 months ago

tts-tortoise-gradio • Science 44%

A Gradio setup for Tortoise TTS.

speech speech-synthesis text-to-speech tortoise-tts tts

Updated 11 months ago

speech_data_ghana_ug • Science 44%

The dataset comprises a 5000-hour speech corpus in Akan, Ewe, Dagbani, Daagare, and Ikposo. Each language includes 1000 hours of audio speech from indigenous speakers of the language and 100 hours of transcription.

data data-science ghana legon llm tts ug ugspeechdata

Updated 11 months ago

ttsceleb • Science 54%

A TTS app where you can clone the voice of any person you wish.

bark beats encodec natural-language-processing pytorch streamlit text-to-speech tts

Updated 11 months ago

speech_data_ghana_ug • Science 57%

The dataset comprises a 5000-hour speech corpus in Akan, Ewe, Dagbani, Daagare, and Ikposo. Each language includes 1000 hours of audio speech from indigenous speakers of the language, of which 100 hours are transcribed.

asr data data-science ghana legon llm ml tts ug ugspeechdata

Updated 11 months ago

dl-for-emo-tts • Science 67%

💻 🤖 A summary of our attempts at using deep learning approaches for emotional text-to-speech 🔊

affective-computing dc-tts deep-learning emotional-tts lj-speech ravdess speech-synthesis tacotron tacotron-models tts

Updated 11 months ago

mgvt • Science 67%

Modular architecture for an AI-based 3D Virtual Teacher (Bachelor's thesis 2025)

ai architecture llm mgvt raef tts verbamanent virtual-teacher

Updated 11 months ago

nemo • Science 13%

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

asr deeplearning generative-ai large-langage-models machine-translation multimodal neural-networks speaker-diariazation speaker-recognition speech-synthesis speech-translation tts

Updated 11 months ago

radtts-uk • Science 67%

High-fidelity speech synthesis for Ukrainian using modern neural networks.

audio sound speech-uk synthesis text-to-speech tts ukrainian vocos wav wave