Updated 4 months ago

Nkululeko 1.0: A Python package to predict speaker characteristics with a high-level interface • Rank 14.2 • Science 95%

Nkululeko 1.0: A Python package to predict speaker characteristics with a high-level interface - Published in JOSS (2025)

Scientific Software
Updated 6 months ago

audiomate — Peer-reviewed • Rank 12.8 • Science 95%

audiomate: A Python package for working with audio datasets - Published in JOSS (2020)

Artificial Intelligence and Machine Learning (40%)
Scientific Software · Peer-reviewed
Updated 6 months ago

pygamma-agreement — Peer-reviewed • Rank 11.7 • Science 93%

pygamma-agreement: Gamma γ measure for inter/intra-annotator agreement in Python - Published in JOSS (2021)

Mathematics
Scientific Software · Peer-reviewed
Updated 6 months ago

datasets • Rank 34.4 • Science 64%

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

Updated 6 months ago

gladia-torchaudio • Rank 30.4 • Science 64%

Data manipulation and transformation for audio signal processing, powered by PyTorch

Updated 6 months ago

zho-tts • Rank 1.9 • Science 67%

Web app, command-line interface and Python library for synthesizing Chinese texts into speech.

Updated 6 months ago

grounded-segment-anything • Rank 13.8 • Science 54%

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Updated 4 months ago

https://github.com/fgnt/padertorch • Rank 13.5 • Science 54%

A collection of common functionality to simplify the design, training and evaluation of machine learning models based on PyTorch, with an emphasis on speech processing.

Updated 4 months ago

https://github.com/fgnt/paderbox • Rank 15.6 • Science 44%

Paderbox: A collection of utilities for audio / speech processing

Updated 6 months ago

huggingsound • Rank 13.3 • Science 44%

HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools

Updated 6 months ago

pypar • Rank 12.4 • Science 44%

Phoneme alignment representation compatible with multiple forced aligners

Updated 6 months ago

psola • Rank 12.3 • Science 44%

Pitch-shifting and time-stretching with TD-PSOLA

Updated 6 months ago

pyfoal • Rank 10.7 • Science 44%

Python forced alignment

Updated 6 months ago

speechllm • Rank 4.6 • Science 44%

This repository contains the training, inference, and evaluation code for SpeechLLM models, along with details about the model releases on Hugging Face.

Updated 6 months ago

Phonetics • Rank 4.4 • Science 36%

A collection of functions to analyze phonetic data

Updated 6 months ago

SpecAugment • Rank 8.5 • Science 10%

An implementation of SpecAugment, introduced by Google Brain, in TensorFlow & PyTorch

Updated 6 months ago

podcastmix • Science 44%

PodcastMix: A dataset for separating music and speech in podcasts. Code from my Master's thesis in the Sound and Music Computing Master's program, Universitat Pompeu Fabra

Updated 6 months ago

aasp • Science 67%

Application to classify speech prosody

Updated 6 months ago

monotonic-alignment-search • Science 44%

Monotonically align text and speech

Updated 6 months ago

lhotse • Science 54%

Tools for handling multimodal data in machine learning projects.

Updated 5 months ago

https://github.com/audiollms/audiobench • Science 36%

AudioBench: A Universal Benchmark for Audio Large Language Models

Updated 6 months ago

asaca-automatic-speech-analysis-for-cognitive-assessment • Science 44%

An automatic system that extracts Praat-like speech features from raw speech WAV files while also producing high-quality, low-WER (<10) transcriptions.

Updated 6 months ago

tts-tortoise-gradio • Science 44%

A Gradio setup for Tortoise TTS.

Updated 6 months ago

en-tts • Science 67%

Command-line interface and Python library for synthesizing English texts into speech.

Updated 5 months ago

https://github.com/csteinmetz1/ai-audio-startups • Science 13%

Community list of startups working with AI in audio and music technology

Updated 6 months ago

stipa • Science 44%

MATLAB implementation of the Speech Transmission Index for Public Address (STIPA) method for evaluating speech transmission quality.

Updated 6 months ago

speech-utility-bioacoustics • Science 67%

On the utility of speech and audio foundation models for marmoset call analysis

Updated 6 months ago

cv10-uk-testset-clean • Science 44%

The Ukrainian test set of Common Voice 10, cleaned and checked by a human 🇺🇦

Updated 6 months ago

speech-recognition-uk • Science 44%

🇺🇦 Speech Recognition & Synthesis for Ukrainian

Updated 6 months ago

tacotron-cli • Science 67%

Command-line interface to train Tacotron 2 using .wav <=> .TextGrid pairs.