panotti

A multi-channel neural network audio classifier using Keras

https://github.com/drscotthawley/panotti

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

audio-classification convolutional-neural-networks keras music-tagging neural-network tensorflow
Last synced: 6 months ago

Repository

A multi-channel neural network audio classifier using Keras

Basic Info
  • Host: GitHub
  • Owner: drscotthawley
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 1.39 MB
Statistics
  • Stars: 270
  • Watchers: 14
  • Forks: 69
  • Open Issues: 17
  • Releases: 1
Topics
audio-classification convolutional-neural-networks keras music-tagging neural-network tensorflow
Created almost 9 years ago · Last pushed over 4 years ago
Metadata Files
Readme License Citation

README.md


Panotti: A Convolutional Neural Network classifier for multichannel audio waveforms

Panotti image (Image of large-eared Panotti people, Wikipedia)

This is a version of the audio-classifier-keras-cnn repo (which is a hack of @keunwoochoi's compact_cnn code). The difference with Panotti is that it has been generalized beyond mono audio to include stereo or even more "channels," and it has undergone many refinements.

NOTE: The majority of issues people seem to have in using this utility stem from inconsistencies in their audio datasets, to the point where I hesitate to delve into such reports. I suggest trying the binaural audio example to see whether the same problems arise. -SH

Installation

UPDATE June 9, 2020: There is an updated version of Panotti that works with TensorFlow 2, currently in the panotti branch called 'tf2'. I'm not ready to merge that branch with master until Vibrary is also updated for TF2.

Preface: Requirements

Probably Mac OS X or Linux. (Windows users: I have no experience to offer you.) Not everything is required; here's an overview:

  • Required:
    • Python 3.5
    • numpy
    • keras
    • tensorflow
    • librosa
    • matplotlib
    • h5py
  • Optional:
    • sox ("Sound eXchange": command-line utility for examples/binaural. Install via "apt-get install sox")
    • pygame (for examples/headgames.py)
    • For sorting-hat: flask, kivy, kivy-garden

...the requirements.txt method will try to install both required and optional packages.

Installation:

git clone https://github.com/drscotthawley/panotti.git

cd panotti

pip install -r requirements.txt

Demo

I'm not shipping this with any audio but you can generate some for the 'fake binaural' example (requires sox):

cd examples
./binaural_setup.sh
cd binaural
../../preprocess_data.py --dur=2 --clean
../../train_network.py

Quick Start

  • Make a folder called Samples/ and inside it create sub-folders with the names of each category you want to train on. Place your audio files in these sub-folders accordingly.
  • run python preprocess_data.py
  • run python train_network.py
  • run python eval_network.py - This applies the trained network to the testing dataset and gives you accuracy reports.

Data Preparation

Data organization:

Sound files should go into a directory called Samples/, located in the directory from which you run the scripts. Within Samples/, you should have subdirectories that divide up the various classes.

Example: for the IDMT-SMT-Audio-Effects database, using their monophonic guitar audio clips...

$ ls -F Samples/
Chorus/  Distortion/  EQ/  FeedbackDelay/  Flanger/   NoFX/  Overdrive/  Phaser/  Reverb/  SlapbackDelay/
Tremolo/  Vibrato/
$

(Within each subdirectory of Samples, there are loads of .wav or .mp3 files that correspond to each of those classes.)

"Is there any sample data that comes with this repo?" Not the data itself, but check out the examples/ directory. ;-)

Data augmentation & preprocessing:

(Optional) Augmentation:

The "augmentation" will vary the speed, pitch, dynamics, etc. of the sound files ("data") to try to "bootstrap" some extra data with which to train. If you want to augment, then you'll run it as

$ python augment_data.py <N> Samples/*/*

where N is how many augmented copies of each file you want it to create. It will place all of these in the Samples/ directory with "_augX" appended to each filename (where X numbers the augmented copies). Augmentation assumes that all data files have the same length & sample rate.
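
For illustration only, here is a minimal sketch (not the repo's actual augment_data.py) of the kind of speed/pitch variation described above, using librosa. The filename, function name and the amounts of stretch/shift are made up, and the sketch works on mono audio for simplicity.

# Illustrative sketch of speed/pitch augmentation -- not the repo's augment_data.py.
import librosa
import soundfile as sf

def make_augmented_copies(path, n_copies=2):
    y, sr = librosa.load(path, sr=None)                                       # mono, original sample rate
    for i in range(1, n_copies + 1):
        y_aug = librosa.effects.time_stretch(y=y, rate=1.0 + 0.05 * i)        # vary speed
        y_aug = librosa.effects.pitch_shift(y=y_aug, sr=sr, n_steps=0.5 * i)  # vary pitch
        out_path = path.replace('.wav', '_aug%d.wav' % i)                     # mirrors the "_augX" naming
        sf.write(out_path, y_aug, sr)

make_augmented_copies('Samples/Chorus/example.wav', n_copies=2)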

(Required) Preprocessing:

When you preprocess, the data-loading will go much faster (e.g., 100 times faster) the next time you try to train the network. So, preprocess.

Preprocessing will pad shorter files with silence to match the length of the longest file, and pad the channel count to match the file with the most channels. It will then generate mel-spectrograms of all data files, and create a "new version" of Samples/ called Preproc/.

It will do an 80-20 split of the dataset, so within Preproc/ will be the subdirectories Train/ and Test/. These will have the same subdirectory names as Samples/, but all the .wav and .mp3 files will have ".npy" on the end now. Datafiles will be randomly assigned to Train/ or Test/, and there they shall remain.

To do the preprocessing you just run

$ python preprocess_data.py
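
To give a sense of what that step produces, here's a rough sketch of turning one audio file into a mel-spectrogram saved as .npy. The real preprocess_data.py also handles the padding, channel matching and Train/Test split described above; the function name, paths and n_mels value here are illustrative, not the repo's defaults.

# Rough sketch of the core step: audio file -> mel-spectrogram -> .npy.
# The real preprocess_data.py also pads files, matches channel counts and does
# the Train/Test split; parameters and paths here are illustrative only.
import os
import numpy as np
import librosa

def audio_to_melgram(in_path, out_path, n_mels=96):
    y, sr = librosa.load(in_path, sr=None)                           # mono for simplicity
    melgram = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    melgram = librosa.power_to_db(melgram, ref=np.max)               # log-scaled mel power
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    np.save(out_path, melgram)

audio_to_melgram('Samples/Chorus/example.wav', 'Preproc/Train/Chorus/example.wav.npy')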

Training & Evaluating the Network

$ python train_network.py

That's all you need. (I should add command-line arguments to adjust the layer size and number of layers...later.)

It will perform an 80-20 split of training vs. testing data, and give you some validation scores along the way.

It's set to run for 2000 epochs; feel free to shorten that or just ^C out at some point. It automatically does checkpointing by saving (and loading) the network weights via a file called weights.hdf5, so you can interrupt & resume the training if you need to.
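
That checkpoint/resume pattern looks roughly like the Keras sketch below. The toy model and random arrays stand in for the real CNN and the preprocessed spectrograms, and the epoch count is shortened; the actual architecture lives in the repo's training code.

# Sketch of the checkpoint/resume pattern described above; the toy model and
# random data stand in for the real CNN and the preprocessed spectrograms.
import os
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint

weights_file = 'weights.hdf5'

model = Sequential([Dense(12, activation='softmax', input_shape=(96,))])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

if os.path.exists(weights_file):                  # resume an interrupted run
    model.load_weights(weights_file)

X = np.random.rand(100, 96)                       # stand-in features
Y = np.eye(12)[np.random.randint(0, 12, 100)]     # one-hot stand-in labels

checkpointer = ModelCheckpoint(filepath=weights_file, save_best_only=True)
model.fit(X, Y, validation_split=0.2, epochs=5, callbacks=[checkpointer])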

After training, more diagnostics -- ROC curves, AUC -- can be obtained by running

$ python eval_network.py

(Changing the batch_size variable between training and evaluation may not be a good idea. It will probably screw up the Batch Normalization...but maybe you'll get lucky.)
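
For a sense of what those diagnostics involve, here's a self-contained sketch of ROC/AUC computation with scikit-learn (which is in requirements.txt). The dummy arrays stand in for one-hot test labels and the network's softmax outputs; eval_network.py may differ in detail.

# Illustrative ROC/AUC diagnostics with scikit-learn; the dummy arrays stand in
# for one-hot test labels and softmax outputs on the Preproc/Test/ set.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

Y_test = np.eye(12)[np.random.randint(0, 12, 200)]    # one-hot ground truth (dummy)
Y_pred = np.random.rand(200, 12)                      # class scores (dummy)
Y_pred /= Y_pred.sum(axis=1, keepdims=True)

print("macro-average AUC:", roc_auc_score(Y_test, Y_pred, average='macro'))

fpr, tpr, _ = roc_curve(Y_test[:, 0], Y_pred[:, 0])   # ROC curve for class 0
print("ROC points for class 0:", len(fpr))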

Results

On the IDMT Audio Effects Database using the 20,000 monophonic guitar samples across 12 effects classes, this code achieved 99.7% accuracy and an AUC of 0.9999. Specifically, 11 mistakes were made out of about 4000 testing examples; 6 of those were for the 'Phaser' effect, 3 were for EQ, a couple elsewhere, and most of the classes had zero mistakes. (No augmentation was used.)

This accuracy is comparable to the original 2010 study by Stein et al., who used a Support Vector Machine.

This was achieved by running for 10 hours on our workstation with an NVIDIA GTX1080 GPU.

Extra Tricks

  • We have multi-GPU training. The saving & loading means we get warning messages from Keras; ignore those. Compiling both the parallel model and its serial counterpart breaks things, so we leave the serial one uncompiled, and that's the one we save. I regard this problem as a 'bug' in the Keras multi-gpu protocols.
  • Speaking of saving & loading, we encode the names of the output classes in the weights.hdf5 file using an HDF5 attribute 'class_names' (see the sketch after this list).
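
As an illustration of that second point, here's a minimal h5py sketch of writing and reading back a 'class_names' attribute on weights.hdf5. The class names listed are just an illustrative subset, and the exact encoding used by the repo's save/load code may differ slightly.

# Minimal h5py sketch of the 'class_names' attribute mentioned above; the exact
# encoding used by the repo's save/load code may differ slightly.
import h5py
import numpy as np

class_names = ['Chorus', 'Distortion', 'NoFX']               # illustrative subset
with h5py.File('weights.hdf5', 'a') as f:                    # attach names to the weights file
    f.attrs['class_names'] = np.array(class_names, dtype='S')

with h5py.File('weights.hdf5', 'r') as f:                    # read them back later
    names = [n.decode() for n in f.attrs['class_names']]
    print(names)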


-- @drscotthawley

Owner

  • Name: Scott H. Hawley
  • Login: drscotthawley
  • Kind: user
  • Location: Nashville, TN
  • Company: Belmont University, Physics

Physics prof, musician, code tinkerer. Specialty: ML + Musical Audio

Citation (CITATION.cff)

cff-version: '1.1.0'
date-released: 06-04-2018
version: 1.0.0
message: 'Please cite the following works when using this software.'
authors:
  - family-names: 'Hawley'
    given-names: 'Scott H.'
doi: '10.5281/zenodo.1275605'
identifiers:
  - type: 'doi'
    value: '10.5281/zenodo.1275605'
  - type: 'url'
    value: 'https://github.com/drscotthawley/panotti'
title: 'Panotti: A Convolutional Neural Network Classifier for Multichannel Audio Waveforms'
url: 'https://github.com/drscotthawley/panotti'

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 264
  • Total Committers: 2
  • Avg Commits per committer: 132.0
  • Development Distribution Score (DDS): 0.015
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Scott Hawley (s****y@b****u): 260 commits
  • Kaspar Emanuel (k****l@g****m): 4 commits

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 64
  • Total pull requests: 54
  • Average time to close issues: 4 months
  • Average time to close pull requests: 21 days
  • Total issue authors: 23
  • Total pull request authors: 6
  • Average comments per issue: 2.13
  • Average comments per pull request: 0.26
  • Merged pull requests: 46
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ngragaei (8)
  • globalik (3)
  • gllort (2)
  • kasbah (2)
  • sarbjit-longia (2)
  • Navdevl (1)
  • banasu (1)
  • MrPaddi (1)
  • EarlTheCurl (1)
  • AlexMikhalev (1)
  • eupston (1)
  • manbharae (1)
  • FelixAbrahamsson (1)
  • Hassan770347 (1)
  • ErfolgreichCharismatisch (1)
Pull Request Authors
  • drscotthawley (22)
  • kasbah (2)
  • vepkenez (2)
  • Lelo123 (1)
  • beyondacm (1)
  • lucas-noyau-itdev (1)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
proxy.golang.org: github.com/drscotthawley/panotti
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
  • Dependent packages count: 5.5%
  • Average: 5.7%
  • Dependent repos count: 5.8%
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • Keras >=2.1.3
  • Pillow >=5.0.0
  • audioread >=2.1.4
  • cython ==0.27.3
  • h5py >=2.7.0
  • imageio >=2.2.0
  • librosa >=0.5.1
  • matplotlib >=2.0.0
  • pandas >=0.21.1
  • pypandoc >=1.4
  • scikit-image >=0.14.2
  • scikit_learn >=0.19.1
  • scipy >=1.0.0
  • tensorflow >=1.5.0
  • winshell >=0.6
setup.py pypi
  • h5py *
  • keras *
  • librosa *