Nkululeko 1.0: A Python package to predict speaker characteristics with a high-level interface
Nkululeko 1.0: A Python package to predict speaker characteristics with a high-level interface - Published in JOSS (2025)
audiomate
audiomate: A Python package for working with audio datasets - Published in JOSS (2020)
pygamma-agreement
pygamma-agreement: Gamma γ measure for inter/intra-annotator agreement in Python - Published in JOSS (2021)
datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
gladia-torchaudio
Data manipulation and transformation for audio signal processing, powered by PyTorch
tts
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
goodness-of-pronunciation-pipelines-for-oov-problem
Goodness of Pronunciation Pipelines for OOV Removal
zho-tts
Web app, command-line interface and Python library for synthesizing Chinese texts into speech.
grounded-segment-anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
https://github.com/fgnt/padertorch
A collection of common functionality to simplify the design, training and evaluation of machine learning models based on PyTorch, with an emphasis on speech processing.
https://github.com/fgnt/paderbox
Paderbox: A collection of utilities for audio / speech processing
huggingsound
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
speechllm
This repository contains the training, inference, and evaluation code for SpeechLLM models, along with details about the model releases on Hugging Face.
https://github.com/bytedance/salmonn
SALMONN family: A suite of advanced multi-modal LLMs
https://github.com/balisujohn/tortoise.cpp
A ggml (C++) re-implementation of tortoise-tts
SpecAugment
An implementation of SpecAugment, introduced by Google Brain, with TensorFlow & PyTorch
podcastmix
PodcastMix: A dataset for separating music and speech in podcasts. Code for my Master's thesis in the Sound and Music Computing Master's program at Universitat Pompeu Fabra
https://github.com/audiollms/audiobench
AudioBench: A Universal Benchmark for Audio Large Language Models
asaca-automatic-speech-analysis-for-cognitive-assessment
Transform speech into cognitive assessments with ASACA. Achieve accurate predictions and low error rates using our end-to-end toolkit. 🚀🔧
asaca-automatic-speech-analysis-for-cognitive-assessment
An automatic system that extracts Praat-like speech features from raw speech WAV files and, at the same time, produces high-quality transcriptions with low WER (<10).
en-tts
Command-line interface and Python library for synthesizing English texts into speech.
stipa
MATLAB implementation of the Speech Transmission Index for Public Address (STIPA) method for evaluating the speech transmission quality.
speech-utility-bioacoustics
On the utility of speech and audio foundation models for marmoset call analysis
cv10-uk-testset-clean
The cleaned Common Voice 10 test set for Ukrainian, checked by a human 🇺🇦
tacotron-cli
Command-line interface to train Tacotron 2 using .wav <=> .TextGrid pairs.