Soundata
Soundata: Reproducible use of audio datasets - Published in JOSS (2024)
Spafe
Spafe: Simplified python audio features extraction - Published in JOSS (2023)
audiomate
audiomate: A Python package for working with audio datasets - Published in JOSS (2020)
s(ound)lab
s(ound)lab: An easy-to-learn Python package for designing and running psychoacoustic experiments - Published in JOSS (2021)
libsoni
libsoni: A Python Toolbox for Sonifying Music Annotations and Feature Representations - Published in JOSS (2024)
transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
gladia-torchaudio
Data manipulation and transformation for audio signal processing, powered by PyTorch
pyaudioanalysis
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
musicaiz
A Python framework for symbolic music generation, evaluation and analysis
aac-metrics
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
https://github.com/fgnt/padertorch
A collection of common functionality to simplify the design, training and evaluation of machine learning models based on PyTorch, with an emphasis on speech processing.
dawdreamer
Digital Audio Workstation with Python; VST instruments/effects, parameter automation, FAUST, JAX, Warp Markers, and JUCE processors
https://github.com/fgnt/nara_wpe
Different implementations of "Weighted Prediction Error" for speech dereverberation
https://github.com/fgnt/paderbox
Paderbox: A collection of utilities for audio / speech processing
huggingsound
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
https://github.com/csteinmetz1/auraloss
Collection of audio-focused loss functions in PyTorch
https://github.com/bytedance/salmonn
SALMONN family: A suite of advanced multi-modal LLMs
mosqito
MoSQITo is a unified and modular development framework for key sound quality metrics, favoring reproducible science and efficient shared scripting within the community of engineers, teachers and researchers.
https://github.com/dbraun/dac-jax
JAX Implementations of Descript Audio Codec and EnCodec
https://github.com/dbraun/abletonparsing
Parse an Ableton ASD clip file (warp markers and more) in Python
conformer
Implementation of the convolutional module from the Conformer paper, for use in Transformers
https://github.com/csteinmetz1/automix-toolkit
Models and datasets for training deep learning automatic mixing models
https://github.com/fl33tw00d/whisper-turbo
Cross-Platform, GPU Accelerated Whisper 🏎️
https://github.com/audiolabs/trackswitch.js
A Versatile Web-Based Audio Player for Presenting Scientific Results
https://github.com/dbraun/td-faust
FAUST (Functional Audio Stream) for TouchDesigner
bmt
Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
https://github.com/dbraun/chuckdesigner
ChucK audio integration with TouchDesigner
stipa
MATLAB implementation of the Speech Transmission Index for Public Address (STIPA) method for evaluating speech transmission quality.
open-in-mpv
Host-side of the extension to open any link or page URL in mpv via the browser context menu.
https://github.com/birdnet-team/birdnet-stm32
Code for training and deployment of a tiny acoustic model for the STM32N6
2d3mf
Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"
https://github.com/carlosholivan/audiogenerationdiffusion
State of the Art in Audio Generation with Diffusion Models
https://github.com/baggepinnen/lazywavfiles.jl
Lazily treat wav (audio) files as arrays. Arrays can be distributed over many wav files.
beep
Beep is a single, 12-line, 358-character AppleScript that plays a system notification sound in quick succession, once for each unit of the current hour's numeric value. (Think grandfather clock.)
https://github.com/aaltoml/nonstationary-audio-gp
End-to-End Probabilistic Inference for Nonstationary Audio Analysis
birdeep_birdsongdetector_neuralnetworks
Repository for the neural networks and models created for the BIRDeep project
quantumaudio
A Python package for building Quantum Representations of Digital Audio. Developed by Moth.
https://github.com/alexanderlerch/aca-slides
Slides and Code for "An Introduction to Audio Content Analysis," also taught at Georgia Tech as MUSI-6201. This introductory course on Music Information Retrieval is based on the textbook "An Introduction to Audio Content Analysis", Wiley 2012/2022
https://github.com/alexisvassquez/ai_spotibot_player
AudioMIX is an open-source, AI-driven music production software designed to empower independent artists and DJs with mood-based audio analysis, LED integration, and creative autonomy. Spotibot was its original name.
https://github.com/aveek-saha/duskplayer
A minimal music player built on Electron.
https://github.com/bbye98/minim
A collection of music service (iTunes, Qobuz, Spotify, TIDAL) APIs for media information retrieval and semi-automated music tagging.
laughter-detection-icsi
An ML pipeline for training a laughter detection model on the ICSI corpus
https://github.com/bkraad47/fat_llama
fat_llama is a Python package for upscaling audio files to FLAC or WAV format. It uses CUDA-accelerated computation to upsample audio and add missing frequencies through FFT-based processing, with the aim of producing richer and more detailed audio.
https://github.com/carlosholivan/audiolm-google-torch
Implementation of the AudioLM model by Google in PyTorch
anira
An architecture for neural network inference in real-time audio applications
nsynth-midi-renderer
Sample based concatenative synthesizer for the NSynth dataset. Render any MIDI (.mid) sequence with the notes of NSynth.
eeg-cardio-audio-sleep
Project to study sound stimuli that are synchronous, asynchronous, or isochronous with the heartbeat during sleep.
snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
seansaudiodb
The personal version of my audio collection database. Not intended for public use. See the other version that is intended for public use: https://github.com/seanpm2001/AudiBass_Manager
speech-utility-bioacoustics
On the utility of speech and audio foundation models for marmoset call analysis
https://github.com/bluepixeldev/aftertone
A lightweight Unity plugin for audio pooling. It simplifies playing short, one-off sound effects such as gunshots, collisions, or UI sounds by reusing a pool of prepared audio sources.
speechbrain
A PyTorch-based Speech Toolkit
ltfat
Official development repository of the Large Time Frequency Analysis Toolbox
iossystemsounds
Sounds found in and extracted from System/Library/Audio/UISounds in multiple formats.
https://github.com/neodsp/neolab
neolab is a framework to load, analyze and manipulate audio data