Projects | Open Source Science

Scientific Software

Updated 11 months ago

Software Design and User Interface of ESPnet-SE++ — Peer-reviewed • Rank 25.3 • Science 95%

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing - Published in JOSS (2023)

chainer deep-learning end-to-end kaldi machine-translation pytorch singing-voice-synthesis speaker-diarization speech-enhancement speech-recognition speech-separation speech-synthesis speech-translation spoken-language-understanding text-to-speech voice-conversion

Earth and Environmental Sciences (40%) Biology (40%)

Scientific Software

Updated 11 months ago

SpeechPy - A Library for Speech Processing and Recognition — Peer-reviewed • Rank 16.6 • Science 93%

SpeechPy - A Library for Speech Processing and Recognition - Published in JOSS (2018)

feature-extraction python speech-recognition speechpy

Mathematics

Scientific Software

Updated 11 months ago

audiomate — Peer-reviewed • Rank 12.8 • Science 95%

audiomate: A Python package for working with audio datasets - Published in JOSS (2020)

audio audio-datasets corpus-tools data-loader dataset-creation dataset-filtering dataset-manager music noise speech speech-recognition

Artificial Intelligence and Machine Learning (40%)

Updated 11 months ago

transformers • Rank 38.7 • Science 64%

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

audio deep-learning deepseek gemma glm hacktoberfest llm machine-learning model-hub natural-language-processing nlp pretrained-models python pytorch pytorch-transformers qwen speech-recognition transformer vlm

Updated 11 months ago

goodness-of-pronunciation-pipelines-for-oov-problem • Rank 3.0 • Science 67%

Goodness of Pronunciation Pipelines for OOV Removal

asr hidden-markov-model kaldi kaldi-asr lexicon-based oov speech speech-recognition

Updated 11 months ago

huggingsound • Rank 13.3 • Science 44%

HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools

asr audio automatic-speech-recognition speech speech-recognition speech-to-text transformers

Updated 11 months ago

aniemore • Rank 11.4 • Science 44%

Emotion recognition from audio and text files (Russian language only)

artificial-intelligence deep-learning emotion-recognition machine-learning package python russian-language speech-recognition speech-to-text text-classification voice-classfication

Updated 10 months ago

stt • Rank 21.4 • Science 33%

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

asr automatic-speech-recognition deep-learning speech-recognition speech-recognition-api speech-recognizer speech-to-text stt tensorflow voice-recognition

Updated 11 months ago

whisper_ros • Rank 6.0 • Science 44%

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

asr automatic-speech-recognition ggml ros2 speech-recognition speech-to-text vad voice-activity-detection whisper whisper-cpp

Updated 10 months ago

https://github.com/coqui-ai/stt-model-manager • Rank 12.1 • Science 36%

Coqui STT Model Manager - install, manage and try out Coqui STT models from the Model Zoo

coqui-ai flask python react speech-recognition stt websocket

Updated 10 months ago

https://github.com/bytedance/salmonn • Rank 9.0 • Science 36%

SALMONN family: A suite of advanced multi-modal LLMs

audio audio-processing audio-visual-understanding bytedance iclr2024 icml-2024 large-language-models multi-modal music research speech speech-recognition tsinghua-university video video-understanding

Updated 10 months ago

https://github.com/bagustris/speech-recognition-course • Rank 1.4 • Science 39%

Material for learning speech recognition, based on Microsoft teaching material on EdX

speech-processing speech-recognition speech-to-text

Updated 10 months ago

https://github.com/amanvirparhar/chaplin • Rank 6.3 • Science 26%

A real-time silent speech recognition tool.

auto-avsr avsr llm ollama speech-recognition speech-to-text vsr

Updated 10 months ago

https://github.com/astorfi/lip-reading-deeplearning • Rank 8.2 • Science 23%

🔓 Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

3d-convolutional-network computer-vision deep-learning speech-recognition tensorflow

Updated 10 months ago

https://github.com/ccoreilly/vosk-browser • Rank 17.8 • Science 13%

A speech recognition library running in the browser thanks to a WebAssembly build of Vosk

asr kaldi speech-recognition speech-to-text stt typescript vosk wasm webassembly

Updated 10 months ago

https://github.com/bagustris/id • Rank 1.9 • Science 26%

Iban-based Kaldi recipe for an Indonesian speech corpus, presented at ASJ Spring 2019.

bahasa-indonesia kaldi-asr speech-recognition

Updated 10 months ago

https://github.com/fl33tw00d/whisper-turbo • Rank 13.3 • Science 13%

Cross-Platform, GPU Accelerated Whisper 🏎️

audio machine-learning rust speech-recognition webgpu whisper windows

Updated 10 months ago

SpecAugment • Rank 8.5 • Science 10%

An implementation of SpecAugment with TensorFlow & PyTorch, introduced by Google Brain

data-augmentation python pytorch specaugment speech speech-recognition tensorflow

Updated 9 months ago

https://github.com/dcavar/elan2split • Rank 3.0 • Science 13%

Split ELAN Annotation Files and corresponding speech files into a corpus format for common ASR and Forced Aligners

cpp11 elan forced-alignment sox speech-corpus speech-recognition xerxes xml

Updated 10 months ago

https://github.com/ccoreilly/catalan-speech-recognition-benchmark • Rank 2.1 • Science 13%

A benchmark of speech recognition solutions for the Catalan language

asr asr-model catala catalan catalan-language deepspeech speech-recognition speech-to-text vosk

Updated 11 months ago

speech-recognition • Science 36%

Develop speech recognition models with TensorFlow 2

deepspeech listen-attend-and-spell speech-recognition tensorflow tensorflow2

Updated 11 months ago

cv10-uk-testset-clean • Science 44%

The Common Voice 10 test set for Ukrainian, cleaned and checked by a human 🇺🇦

asr automatic-speech-recognition speech speech-recognition speech-to-text ukrainian

Updated 10 months ago

https://github.com/alexeyev/ysk-minimal-tgbot • Science 13%

Minimal Telegram + Yandex Speech Kit speech-to-text bot.

speech-recognition speech-to-text telegram-bot yandex-speech-kit

Updated 10 months ago

https://github.com/bagustris/book-ser • Science 26%

Code for the book: Speech Emotion Recognition: Theory and Practice

speech-recognition

Updated 11 months ago

speechbrain • Science 64%

A PyTorch-based Speech Toolkit

asr audio audio-processing deep-learning huggingface language-model pytorch speaker-diarization speaker-recognition speaker-verification speech-enhancement speech-processing speech-recognition speech-separation speech-to-text speech-toolkit speechrecognition spoken-language-understanding transformers voice-recognition

Updated 11 months ago

asr-wav2vec-finetune • Science 54%

⚡ Finetune Wav2vec 2.0 For Speech Recognition

asr finetuning huggingface-transformers pytorch speech-recognition speech-to-text vietnames-speech-recoging vietnamese wav2vec2

Updated 11 months ago

allophant • Science 44%

A multilingual phoneme recognizer capable of generalizing zero-shot to unseen phoneme inventories.

cross-lingual machine-learning multilingual neural-networks phoneme-recognition speech-recognition zero-shot

Updated 11 months ago

asr-wav2vec-finetune • Science 67%

⚡ Finetune Wav2vec 2.0 For Speech Recognition

asr finetune-wav2vec huggingface pytorch speech-recognition speech-to-text vietnamese-speech-recognition wav2vec2

Updated 10 months ago

https://github.com/awslabs/speech-representations • Science 10%

Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020)

deep-learning nlp speech-recognition

Updated 11 months ago

lhotse • Science 54%

Tools for handling multimodal data in machine learning projects.

ai audio data deep-learning kaldi machine-learning python pytorch speech speech-recognition

Updated 11 months ago

speech-recognition-uk • Science 44%

🇺🇦 Speech Recognition & Synthesis for Ukrainian

speech speech-recognition speech-synthesis speech-to-text text-to-speech tts ukrainian

Updated 10 months ago

https://github.com/breandan/hello-robot • Science 13%

A speech interface for controlling our robot.

robotics speech-recognition

Updated 11 months ago

balena • Science 44%

BALanced Execution through Natural Activation: a human-computer interaction methodology for running code.

execution python3 sentence-similarity sentence-transformers speech-recognition speech-to-function speech-to-text terminal transformers wav2vec2

Updated 11 months ago

multilingual-asr • Science 54%

Multilingual Speech Recognition for Indonesian Languages

asr indonesian machine-learning nlp speech-recognition

Updated 11 months ago

pinyin-to-ipa • Science 67%

Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.

bopomofo chinese cyrillic international-phonetic-alphabet linguistics phonetics pinyin speech-recognition speech-synthesis transcription tts zhuyin

Updated 10 months ago

https://github.com/audiollms/audiobench • Science 36%

AudioBench: A Universal Benchmark for Audio Large Language Models

audio-scene-understanding speech speech-question-answering speech-recognition

Updated 10 months ago

https://github.com/coqui-ai/open-speech-corpora • Science 23%

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

speech-emotion-recognition speech-processing speech-recognition speech-separation speech-synthesis speech-to-text stt text-to-speech tts voice-activity-detection voice-cloning voice-recognition

Updated 10 months ago

https://github.com/awslabs/mlm-scoring • Science 10%

Python library & examples for Masked Language Model Scoring (ACL 2020)

bert language-model mxnet nlp pytorch speech-recognition xlm