Soundata
Soundata: Reproducible use of audio datasets - Published in JOSS (2024)
Spafe
Spafe: Simplified python audio features extraction - Published in JOSS (2023)
audiomate
audiomate: A Python package for working with audio datasets - Published in JOSS (2020)
s(ound)lab
s(ound)lab: An easy to learn Python package for designing and running psychoacoustic experiments. - Published in JOSS (2021)
libsoni
libsoni: A Python Toolbox for Sonifying Music Annotations and Feature Representations - Published in JOSS (2024)
transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
gladia-torchaudio
Data manipulation and transformation for audio signal processing, powered by PyTorch
pyaudioanalysis
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
musicaiz
A Python framework for symbolic music generation, evaluation and analysis
aac-metrics
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
https://github.com/fgnt/padertorch
A collection of common functionality to simplify the design, training and evaluation of machine learning models based on PyTorch, with an emphasis on speech processing.
dawdreamer
Digital Audio Workstation with Python; VST instruments/effects, parameter automation, FAUST, JAX, Warp Markers, and JUCE processors
https://github.com/fgnt/nara_wpe
Different implementations of "Weighted Prediction Error" for speech dereverberation
https://github.com/fgnt/paderbox
Paderbox: A collection of utilities for audio / speech processing
huggingsound
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
https://github.com/csteinmetz1/auraloss
Collection of audio-focused loss functions in PyTorch
https://github.com/bytedance/salmonn
SALMONN family: A suite of advanced multi-modal LLMs
mosqito
MoSQITo is a unified and modular development framework of key sound quality metrics, favoring reproducible science and efficient shared scripting among the engineering, teaching and research communities.
https://github.com/dbraun/dac-jax
JAX Implementations of Descript Audio Codec and EnCodec
https://github.com/dbraun/abletonparsing
Parse an Ableton ASD clip file (warp markers and more) in Python
conformer
Implementation of the convolutional module from the Conformer paper, for use in Transformers
https://github.com/csteinmetz1/automix-toolkit
Models and datasets for training deep learning automatic mixing models
https://github.com/fl33tw00d/whisper-turbo
Cross-Platform, GPU Accelerated Whisper 🏎️
https://github.com/audiolabs/trackswitch.js
A Versatile Web-Based Audio Player for Presenting Scientific Results
https://github.com/dbraun/td-faust
FAUST (Functional Audio Stream) for TouchDesigner
https://github.com/bkraad47/fat_llama
fat_llama is a Python package for upscaling audio files to FLAC or WAV formats using advanced audio processing techniques. It utilizes CUDA-accelerated calculations to enhance audio quality by upsampling and adding missing frequencies through FFT, resulting in richer and more detailed audio.
https://github.com/aveek-saha/duskplayer
A minimal music player built on Electron.
https://github.com/bbye98/minim
A collection of music service (iTunes, Qobuz, Spotify, TIDAL) APIs for media information retrieval and semi-automated music tagging.
https://github.com/birdnet-team/birdnet-stm32
Code for training and deployment of a tiny acoustic model for the STM32N6
nsynth-midi-renderer
Sample-based concatenative synthesizer for the NSynth dataset. Render any MIDI (.mid) sequence with the notes of NSynth.
stipa
MATLAB implementation of the Speech Transmission Index for Public Address (STIPA) method for evaluating speech transmission quality.
speechbrain
A PyTorch-based Speech Toolkit
iossystemsounds
Sounds found in and extracted from System/Library/Audio/UISounds in multiple formats.
https://github.com/carlosholivan/audiolm-google-torch
Implementation of the AudioLM model by Google in PyTorch
https://github.com/neodsp/neolab
neolab is a framework to load, analyze and manipulate audio data
birdeep_birdsongdetector_neuralnetworks
Repository for the neural networks and models created for the BIRDeep project
https://github.com/bluepixeldev/aftertone
A lightweight Unity plugin for audio pooling. It simplifies playing short, throwaway sound effects such as gunshots, collisions or UI sounds by reusing a pool of prepared audio sources.
quantumaudio
A Python package for building Quantum Representations of Digital Audio. Developed by Moth.
https://github.com/alexanderlerch/aca-slides
Slides and code for "An Introduction to Audio Content Analysis", also taught at Georgia Tech as MUSI-6201. This introductory course on Music Information Retrieval is based on the textbook "An Introduction to Audio Content Analysis", Wiley 2012/2022.
https://github.com/alexisvassquez/ai_spotibot_player
AudioMIX is an open-source, AI-driven music production software designed to empower independent artists and DJs with mood-based audio analysis, LED integration, and creative autonomy. Spotibot was its original name.
https://github.com/baggepinnen/lazywavfiles.jl
Lazily treat wav (audio) files as arrays. Arrays can be distributed over many wav files.
snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
bmt
Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
https://github.com/dbraun/chuckdesigner
ChucK audio integration with TouchDesigner
eeg-cardio-audio-sleep
Project to study sound stimuli that are synchronous, asynchronous or isochronous with the heartbeat during sleep.
open-in-mpv
Host-side of the extension to open any link or page URL in mpv via the browser context menu.
ltfat
Official development repository of the Large Time Frequency Analysis Toolbox
laughter-detection-icsi
An ML pipeline for training a laughter detection model on the ICSI corpus
seansaudiodb
The personal version of my audio collection database. Not intended for public use. See the other version that is intended for public use: https://github.com/seanpm2001/AudiBass_Manager
https://github.com/carlosholivan/audiogenerationdiffusion
State-of-the-art of Audio Generation with Diffusion Models
speech-utility-bioacoustics
On the utility of speech and audio foundation models for marmoset call analysis
anira
An architecture for neural network inference in real-time audio applications
2d3mf
Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"
https://github.com/aaltoml/nonstationary-audio-gp
End-to-End Probabilistic Inference for Nonstationary Audio Analysis
beep
Beep is a single, 12-line, 358-character AppleScript that plays a system notification sound in quick succession, once for each count of the current hour. (Think grandfather clock.)