Projects | Open Source Science

Updated 8 months ago

Nkululeko 1.0: A Python package to predict speaker characteristics with a high-level interface • Rank 14.2 • Science 95%

Nkululeko 1.0: A Python package to predict speaker characteristics with a high-level interface - Published in JOSS (2025)

machine-learning pytorch speech

Scientific Software

Updated 10 months ago

audiomate — Peer-reviewed • Rank 12.8 • Science 95%

audiomate: A Python package for working with audio datasets - Published in JOSS (2020)

audio audio-datasets corpus-tools data-loader dataset-creation dataset-filtering dataset-manager music noise speech speech-recognition

Artificial Intelligence and Machine Learning (40%)

Scientific Software · Peer-reviewed

Updated 10 months ago

pygamma-agreement — Peer-reviewed • Rank 11.7 • Science 93%

pygamma-agreement: Gamma γ measure for inter/intra-annotator agreement in Python - Published in JOSS (2021)

agreement annotation-tool annotations gamma-agreement natural-language-processing speech

Mathematics

Scientific Software · Peer-reviewed

Updated 10 months ago

datasets • Rank 34.4 • Science 64%

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

ai artificial-intelligence computer-vision dataset-hub datasets deep-learning llm machine-learning natural-language-processing nlp numpy pandas pytorch speech tensorflow

Updated 10 months ago

gladia-torchaudio • Rank 30.4 • Science 64%

Data manipulation and transformation for audio signal processing, powered by PyTorch

audio audio-processing io machine-learning python pytorch speech

Updated 10 months ago

tts • Rank 28.2 • Science 64%

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

deep-learning glow-tts hifigan melgan multi-speaker-tts python pytorch speaker-encoder speaker-encodings speech speech-synthesis tacotron text-to-speech tts tts-model vocoder voice-cloning voice-conversion voice-synthesis

Updated 10 months ago

goodness-of-pronunciation-pipelines-for-oov-problem • Rank 3.0 • Science 67%

Goodness of Pronunciation Pipelines for OOV Removal

asr hidden-markov-model kaldi kaldi-asr lexicon-based oov speech speech-recognition

Updated 10 months ago

silero-vad • Rank 25.2 • Science 44%

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

onnx onnx-runtime onnxruntime pytorch speech speech-processing vad voice-activity-detection voice-commands voice-control voice-detection voice-recognition
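
Silero VAD uses a pre-trained neural model. The basic job of any voice activity detector is to label each short audio frame as speech or non-speech. The sketch below shows that framing with a much simpler, swapped-in technique: a plain energy threshold. It is not Silero's method. The function names, the 30 ms / 10 ms framing and the -40 dB threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence


def frame_energies_db(samples: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Mean power of each frame in dB (a tiny floor avoids log10(0) on pure silence)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def energy_vad(samples: Sequence[float], sample_rate: int,
               frame_ms: float = 30.0, hop_ms: float = 10.0,
               threshold_db: float = -40.0) -> List[bool]:
    """Hypothetical energy-threshold VAD: True for frames louder than threshold_db."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame and hop must each span at least one sample")
    return [e > threshold_db for e in frame_energies_db(samples, frame_len, hop)]
```

A neural detector such as Silero VAD exists because energy alone cannot separate speech from loud non-speech noise.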

Updated 10 months ago

zho-tts • Rank 1.9 • Science 67%

Web app, command-line interface and Python library for synthesizing Chinese texts into speech.

chinese-tts linguistics speech speech-synthesis tacotron tts waveglow

Updated 10 months ago

grounded-segment-anything • Rank 13.8 • Science 54%

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

3d-whole-body-pose-estimation automatic-labeling-system caption data-generation image-editing open-vocabulary-detection open-vocabulary-segmentation speech

Updated 9 months ago

https://github.com/fgnt/padertorch • Rank 13.5 • Science 54%

A collection of common functionality to simplify the design, training and evaluation of machine learning models based on pytorch with an emphasis on speech processing.

audio pytorch speech

Updated 9 months ago

https://github.com/fgnt/paderbox • Rank 15.6 • Science 44%

Paderbox: A collection of utilities for audio / speech processing

audio speech toolbox

Updated 10 months ago

huggingsound • Rank 13.3 • Science 44%

HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools

asr audio automatic-speech-recognition speech speech-recognition speech-to-text transformers

Updated 10 months ago

pypar • Rank 12.4 • Science 44%

Phoneme alignment representation compatible with multiple forced aligners

alignment phoneme speech

Updated 10 months ago

psola • Rank 12.3 • Science 44%

Pitch-shifting and time-stretching with TD-PSOLA

pitch-shifting psola speech tdpsola time-stretching

Updated 10 months ago

pyfoal • Rank 10.7 • Science 44%

Python forced alignment

alignment phoneme speech

Updated 10 months ago

@stdlib/datasets-cmudict • Rank 7.2 • Science 44%

The Carnegie Mellon Pronouncing Dictionary (CMUdict).

data dataset datasets dictionary en english javascript language nlp node node-js nodejs pronounciation speech spelling stdlib words
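
CMUdict is distributed as a plain-text file. Each entry is a word followed by its ARPAbet phonemes, and the digits 0, 1 and 2 mark vowel stress. Alternate pronunciations carry a "(1)", "(2)" suffix, and lines starting with ";;;" are comments. The sketch below parses that layout in Python. It is not the @stdlib JavaScript API, and the function names are hypothetical. It also assumes that trailing "#" annotations, present in some CMUdict releases, should be discarded.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches the "(1)", "(2)", ... suffix that marks alternate pronunciations.
_VARIANT = re.compile(r"\(\d+\)$")


def parse_cmudict(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Map each lower-cased word to all of its phoneme sequences."""
    pron: Dict[str, List[List[str]]] = defaultdict(list)
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing annotations (assumed format)
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        pron[_VARIANT.sub("", word).lower()].append(phones)
    return dict(pron)


def strip_stress(phones: List[str]) -> List[str]:
    """Remove the 0/1/2 stress digits from vowel phonemes."""
    return [p.rstrip("012") for p in phones]
```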

Updated 10 months ago

speechllm • Rank 4.6 • Science 44%

This repository contains the training, inference, and evaluation code for SpeechLLM models, plus details about the model releases on Hugging Face.

conversational-ai llm multi-modal-llms multi-modality speech

Updated 10 months ago

speech-to-intent-dataset • Rank 4.6 • Science 44%

Dataset Release for Intent Classification from Speech

dataset intent-classification speech spoken-language-understanding task-oriented-dialog-systems voice-ai

Updated 10 months ago

https://github.com/bytedance/salmonn • Rank 9.0 • Science 36%

SALMONN family: A suite of advanced multi-modal LLMs

audio audio-processing audio-visual-understanding bytedance iclr2024 icml-2024 large-language-models multi-modal music research speech speech-recognition tsinghua-university video video-understanding

Updated 10 months ago

Phonetics • Rank 4.4 • Science 36%

A collection of functions to analyze phonetic data

acoustics language linguistics phonetics speech

Updated 10 months ago

https://github.com/balisujohn/tortoise.cpp • Rank 9.7 • Science 13%

A ggml (C++) re-implementation of tortoise-tts

ggml local speech text text-to-speech to tortoise-tts tts

Updated 10 months ago

SpecAugment • Rank 8.5 • Science 10%

An implementation of SpecAugment, introduced by Google Brain, in TensorFlow and PyTorch

data-augmentation python pytorch specaugment speech speech-recognition tensorflow
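
SpecAugment augments a spectrogram directly, mainly by masking random bands of consecutive frequency channels and random blocks of consecutive time frames. The sketch below implements only those two masking steps in pure Python and leaves out the time-warping step. It fills masked cells with zero, which assumes the features are mean-normalized. The function names and default mask widths are illustrative assumptions, not this repository's API.

```python
import random
from typing import List, Optional

# A spectrogram as a list of frequency rows, each holding one value per time frame.
Spectrogram = List[List[float]]


def freq_mask(spec: Spectrogram, max_width: int, rng: random.Random) -> Spectrogram:
    """Zero a random band of consecutive frequency rows (width drawn from 0..max_width)."""
    n_freq = len(spec)
    width = min(rng.randint(0, max_width), n_freq)
    start = rng.randint(0, n_freq - width)  # the mask always fits inside the spectrogram
    out = [row[:] for row in spec]  # copy so the caller's spectrogram is untouched
    for f in range(start, start + width):
        out[f] = [0.0] * len(out[f])
    return out


def time_mask(spec: Spectrogram, max_width: int, rng: random.Random) -> Spectrogram:
    """Zero a random block of consecutive time frames across every frequency row."""
    n_time = len(spec[0])
    width = min(rng.randint(0, max_width), n_time)
    start = rng.randint(0, n_time - width)
    out = [row[:] for row in spec]
    for row in out:
        for t in range(start, start + width):
            row[t] = 0.0
    return out


def spec_augment(spec: Spectrogram, max_freq_width: int = 2, max_time_width: int = 3,
                 n_freq_masks: int = 1, n_time_masks: int = 1,
                 seed: Optional[int] = None) -> Spectrogram:
    """Apply frequency masks, then time masks (time warping is omitted)."""
    rng = random.Random(seed)
    out = spec
    for _ in range(n_freq_masks):
        out = freq_mask(out, max_freq_width, rng)
    for _ in range(n_time_masks):
        out = time_mask(out, max_time_width, rng)
    return out
```

The masks are applied on the fly during training, so each epoch sees a differently masked copy of the same utterance.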

Updated 10 months ago

en-tts • Science 67%

Command-line interface and Python library for synthesizing English texts into speech.

english speech speech-synthesis tacotron text-to-speech tts waveglow

Updated 10 months ago

https://github.com/audiollms/audiobench • Science 36%

AudioBench: A Universal Benchmark for Audio Large Language Models

audio-scene-understanding speech speech-question-answering speech-recognition

Updated 10 months ago

speech-recognition-uk • Science 44%

🇺🇦 Speech Recognition & Synthesis for Ukrainian

speech speech-recognition speech-synthesis speech-to-text text-to-speech tts ukrainian

Updated 10 months ago

https://github.com/coqui-ai/tts-papers • Science 10%

🐸 A collection of TTS papers

coqui-ai deep-learning papers research-paper speech tts

Updated 10 months ago

speech-utility-bioacoustics • Science 67%

On the utility of speech and audio foundation models for marmoset call analysis

audio bio-acoustics representation-learning self-supervised-learning speech

Updated 10 months ago

aasp • Science 67%

An application to classify speech prosody

classification docker prosody python speech

Updated 10 months ago

tts-tortoise-gradio • Science 44%

A Gradio setup for Tortoise TTS.

speech speech-synthesis text-to-speech tortoise-tts tts

Updated 10 months ago

monotonic-alignment-search • Science 44%

Monotonically align text and speech

python speech speech-synthesis text-to-speech tts
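
Monotonic alignment search, as popularized by Glow-TTS, assigns every acoustic frame to exactly one text token under three constraints. Tokens are used in order, none is skipped, and the total log-likelihood is maximized. The sketch below shows the dynamic program in pure Python, assuming a precomputed token-by-frame log-likelihood matrix. It is a plain-Python illustration, not this package's API, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def monotonic_alignment_search(log_p: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each frame, the index of the text token it is aligned to.

    log_p[i][j] is the log-likelihood of frame j under text token i.
    """
    n_text, n_mel = len(log_p), len(log_p[0])
    if n_mel < n_text:
        raise ValueError("need at least one frame per text token")
    neg_inf = -math.inf
    # q[i][j]: best score of a monotonic path that puts frame j on token i.
    q = [[neg_inf] * n_mel for _ in range(n_text)]
    q[0][0] = log_p[0][0]
    for j in range(1, n_mel):
        for i in range(min(j + 1, n_text)):
            stay = q[i][j - 1]                               # previous frame on same token
            advance = q[i - 1][j - 1] if i > 0 else neg_inf  # previous frame on prior token
            q[i][j] = max(stay, advance) + log_p[i][j]
    # Backtrack from the last token at the last frame.
    path = [0] * n_mel
    i = n_text - 1
    for j in range(n_mel - 1, -1, -1):
        path[j] = i
        if j > 0 and i > 0 and (i == j or q[i - 1][j - 1] > q[i][j - 1]):
            i -= 1
    return path
```

Counting how many frames each token receives gives the token durations. Glow-TTS uses those durations as targets for its duration predictor.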

Updated 10 months ago

lhotse • Science 54%

Tools for handling multimodal data in machine learning projects.

ai audio data deep-learning kaldi machine-learning python pytorch speech speech-recognition

Updated 10 months ago

https://github.com/csteinmetz1/ai-audio-startups • Science 13%

Community list of startups working with AI in audio and music technology

audio list music speech startups

Updated 10 months ago

asaca-automatic-speech-analysis-for-cognitive-assessment • Science 26%

Transform speech into cognitive assessments with ASACA. Achieve accurate predictions and low error rates using our end-to-end toolkit. 🚀🔧

ai classification deep-learning feature-engineering feature-extraction machine-learning multimodal praat python python-script shap speech speech-analysis speech-and-language-processing speech-to-text training wav2vec2 wav2vec2ctc

Updated 10 months ago

podcastmix • Science 44%

PodcastMix: A dataset for separating music and speech in podcasts. Code from my Master's thesis in the Sound and Music Computing Master's program at Universitat Pompeu Fabra

deep-learning music podcast podcasts source-separation sourceseparation speech

Updated 10 months ago

stipa • Science 44%

MATLAB implementation of the Speech Transmission Index for Public Address (STIPA) method for evaluating speech transmission quality.

audio evaluation intelligibility psychoacoustics speech speech-transmission-index stipa

Updated 10 months ago

tempo • Science 44%

Speech tempo measurement in Praat

praat praat-scripts praatscript prosodic-analysis prosody speech tempo utrecht-university
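
Two common speech tempo measures are speech rate and articulation rate. Speech rate is syllables per second of total speaking time, pauses included. Articulation rate is syllables per second with the pause time removed. The sketch below computes both in Python, assuming the syllable count and pause intervals (in seconds) are already known, for example from a TextGrid. It is not this project's Praat script, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def tempo_measures(n_syllables: int, total_duration: float,
                   pauses: List[Tuple[float, float]]) -> Dict[str, float]:
    """Speech rate (pauses included) and articulation rate (pauses excluded), in syll/s."""
    pause_time = sum(end - start for start, end in pauses)
    phonation_time = total_duration - pause_time
    if total_duration <= 0 or phonation_time <= 0:
        raise ValueError("durations must be positive after removing pauses")
    return {
        "speech_rate": n_syllables / total_duration,
        "articulation_rate": n_syllables / phonation_time,
    }
```

For example, 20 syllables over 5.0 s with two 0.5 s pauses gives a speech rate of 4.0 and an articulation rate of 5.0 syllables per second.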

Updated 10 months ago

asaca-automatic-speech-analysis-for-cognitive-assessment • Science 44%

An automatic system that extracts Praat-like speech features from raw speech WAV files while also producing high-quality transcriptions with a low word error rate (WER < 10).

ai classification deep-learning feature-engineering feature-extraction machine-learning multimodal natural-language-processing praat python python-script shap speech speech-analysis speech-and-language-processing speech-to-text training wav2vec2 wav2vec2ctc

Updated 10 months ago

cv10-uk-testset-clean • Science 44%

A cleaned Ukrainian Common Voice 10 test set, checked by a human 🇺🇦

asr automatic-speech-recognition speech speech-recognition speech-to-text ukrainian

Updated 10 months ago

tacotron-cli • Science 67%

Command-line interface to train Tacotron 2 using .wav <=> .TextGrid pairs.

linguistics speech speech-synthesis tts