Software Design and User Interface of ESPnet-SE++
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing - Published in JOSS (2023)
SpeechPy - A Library for Speech Processing and Recognition
SpeechPy - A Library for Speech Processing and Recognition - Published in JOSS (2018)
audiomate
audiomate: A Python package for working with audio datasets - Published in JOSS (2020)
transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal domains, for both inference and training.
goodness-of-pronunciation-pipelines-for-oov-problem
Goodness of Pronunciation Pipelines for OOV Removal
huggingsound
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
aniemore
Emotion recognition from audio and text files (Russian language only)
stt
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
whisper_ros
Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2
https://github.com/coqui-ai/stt-model-manager
Coqui STT Model Manager - install, manage and try out Coqui STT models from the Model Zoo
https://github.com/bytedance/salmonn
SALMONN family: A suite of advanced multi-modal LLMs
https://github.com/bagustris/speech-recognition-course
Material for learning speech recognition, based on Microsoft teaching material on EdX
https://github.com/amanvirparhar/chaplin
A real-time silent speech recognition tool.
https://github.com/astorfi/lip-reading-deeplearning
🔓 Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
https://github.com/ccoreilly/vosk-browser
A speech recognition library running in the browser thanks to a WebAssembly build of Vosk
https://github.com/bagustris/id
Iban-based Kaldi recipe for an Indonesian speech corpus, presented at ASJ Spring 2019.
https://github.com/fl33tw00d/whisper-turbo
Cross-Platform, GPU Accelerated Whisper 🏎️
SpecAugment
An implementation of SpecAugment (introduced by Google Brain) with TensorFlow & PyTorch
https://github.com/dcavar/elan2split
Split ELAN Annotation Files and corresponding speech files into a corpus format for common ASR and Forced Aligners
https://github.com/ccoreilly/catalan-speech-recognition-benchmark
A benchmark of speech recognition solutions for the Catalan language
speechbrain
A PyTorch-based Speech Toolkit
pinyin-to-ipa
Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.
https://github.com/coqui-ai/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
https://github.com/bagustris/book-ser
Code for the book: Speech Emotion Recognition: Theory and Practice
balena
BALanced Execution through Natural Activation: a human-computer interaction methodology for running code.
https://github.com/awslabs/speech-representations
Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020)
https://github.com/audiollms/audiobench
AudioBench: A Universal Benchmark for Audio Large Language Models
https://github.com/awslabs/mlm-scoring
Python library & examples for Masked Language Model Scoring (ACL 2020)
cv10-uk-testset-clean
The cleaned Common Voice 10 test set for Ukrainian 🇺🇦, checked by a human
allophant
A multilingual phoneme recognizer capable of generalizing zero-shot to unseen phoneme inventories.
https://github.com/breandan/hello-robot
A speech interface for controlling our robot.
https://github.com/alexeyev/ysk-minimal-tgbot
Minimal Telegram + Yandex Speech Kit speech-to-text bot.