Nkululeko 1.0: A Python package to predict speaker characteristics with a high-level interface
Nkululeko 1.0: A Python package to predict speaker characteristics with a high-level interface - Published in JOSS (2025)
audiomate
audiomate: A Python package for working with audio datasets - Published in JOSS (2020)
pygamma-agreement
pygamma-agreement: Gamma γ measure for inter/intra-annotator agreement in Python - Published in JOSS (2021)
datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
gladia-torchaudio
Data manipulation and transformation for audio signal processing, powered by PyTorch
tts
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
goodness-of-pronunciation-pipelines-for-oov-problem
Goodness of Pronunciation Pipelines for OOV Removal
zho-tts
Web app, command-line interface and Python library for synthesizing Chinese texts into speech.
grounded-segment-anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
https://github.com/fgnt/padertorch
A collection of common functionality to simplify the design, training and evaluation of machine learning models based on PyTorch, with an emphasis on speech processing.
https://github.com/fgnt/paderbox
Paderbox: A collection of utilities for audio / speech processing
huggingsound
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
speechllm
This repository contains the training, inference, and evaluation code for SpeechLLM models, and details about the model releases on Hugging Face.
https://github.com/bytedance/salmonn
SALMONN family: A suite of advanced multi-modal LLMs
https://github.com/balisujohn/tortoise.cpp
A ggml (C++) re-implementation of tortoise-tts
SpecAugment
An implementation of SpecAugment with TensorFlow & PyTorch, introduced by Google Brain
tacotron-cli
Command-line interface to train Tacotron 2 using .wav <=> .TextGrid pairs.
podcastmix
PodcastMix: A dataset for separating music and speech in podcasts. Code for my Master's thesis in the Sound and Music Computing Master's Program at Universitat Pompeu Fabra
en-tts
Command-line interface and Python library for synthesizing English texts into speech.
https://github.com/audiollms/audiobench
AudioBench: A Universal Benchmark for Audio Large Language Models
cv10-uk-testset-clean
The cleaned Common Voice 10 test set for Ukrainian 🇺🇦, checked by a human
stipa
MATLAB implementation of the Speech Transmission Index for Public Address (STIPA) method for evaluating the speech transmission quality.
asaca-automatic-speech-analysis-for-cognitive-assessment
Transform speech into cognitive assessments with ASACA. Achieve accurate predictions and low error rates using our end-to-end toolkit. 🚀🔧
speech-utility-bioacoustics
On the utility of speech and audio foundation models for marmoset call analysis
asaca-automatic-speech-analysis-for-cognitive-assessment
An automatic system that extracts Praat-like speech features from raw speech WAV files and, at the same time, produces high-quality transcriptions with low WER (<10).