compare-seg-models
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: NickleDave
- License: bsd-3-clause
- Language: Jupyter Notebook
- Default Branch: master
- Size: 6.2 MB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Comparison of Neural Networks for Segmentation of Vocalizations
Poster presented at the Southern Data Science Conference 2018.

seg-nets package
This repository contains scripts to reproduce results, as well as the seg_nets package.
The package contains the network and various utility functions used by the scripts.
installation
It's probably easiest to use Anaconda. First set up a conda environment, clone the repo, and activate the environment:
```
$ conda create -n seg-nets numpy scipy joblib tensorflow-gpu ipython jupyter
$ git clone https://github.com/NickleDave/tf_syllable_segmentation_annotation davids_fork_of_tf_sylseg
$ source activate seg-nets
```
usage
There are 3 main scripts that are run consecutively: make_data.py, learn_curve.py, and summary.py. Each script accepts a config.ini file; you use the same config.ini file with every script, but you change some of its options after running each one.
1. Make data sets
You will make data sets for training, validation, and testing with the make_data.py script.
Before you run the script you need to create a config.ini file. You can adapt the
template_config.ini file that's in this repository.
In the config file, set values for the following options in the [DATA] section:
```ini
[DATA]
labelset = iabcdefghjk  # set of labels, str, int, or a
data_dir = /home/user/data/subdir/  # directory with audio files
# durations of training, validation, and test sets in seconds
total_train_set_duration = 400
train_set_durs = 5, 15, 30, 45, 60, 75, 90, 105
validation_set_duration = 100
test_set_duration = 400
skip_files_with_labels_not_in_labelset = Yes
```
For more about what each of these options means, see README_config.md.
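To make the [DATA] options above concrete, here is a minimal sketch of how they could be read and sanity-checked with Python's standard configparser module. The helper name `parse_data_section`, the assumption that labelset is a string of single-character labels, and the check that the largest training subset fits inside the total training duration are all illustrative assumptions, not necessarily what make_data.py does.

```python
import configparser


def parse_data_section(config_text):
    """Hypothetical helper: parse and sanity-check the [DATA] section."""
    # inline_comment_prefixes strips "value  # comment" tails from values
    config = configparser.ConfigParser(inline_comment_prefixes=("#",))
    config.read_string(config_text)
    data = config["DATA"]

    # Assumption: each character of labelset is one label
    labelset = list(data["labelset"])
    train_set_durs = [int(dur) for dur in data["train_set_durs"].split(",")]
    total_train = float(data["total_train_set_duration"])

    # Every training subset is drawn from the full training set,
    # so no subset can be longer than the full training set
    if max(train_set_durs) > total_train:
        raise ValueError("largest train_set_durs value exceeds total_train_set_duration")

    return {
        "labelset": labelset,
        "data_dir": data["data_dir"],
        "train_set_durs": train_set_durs,
        "total_train_set_duration": total_train,
        "validation_set_duration": float(data["validation_set_duration"]),
        "test_set_duration": float(data["test_set_duration"]),
        # getboolean accepts Yes/No, True/False, on/off, 1/0
        "skip_files_with_labels_not_in_labelset": data.getboolean(
            "skip_files_with_labels_not_in_labelset"
        ),
    }
```

Checking durations up front means a typo such as a 4000 s subset against a 400 s training set fails immediately, rather than partway through data generation.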
After writing the config file, run make_data.py at the command line with the config file specified:
```
(seg-nets) $ python ./seg-nets/make_data.py config_03218_bird0.ini
```
2. Generate learning curves
After making the data sets, generate the data for learning curves
with the learn_curve.py script.
A learning curve is a plot where the x-axis is the size of the training set
(in this case, duration in seconds) and the y-axis is error, accuracy, or some similar metric.
The script grabs random subsets of training data of a fixed size (specified by the
train_set_durs option in the config file) and uses each subset to train the network.
Each trained model is then saved, and its ability to generalize is estimated by measuring error on
a test set, using the summary.py script (below).
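The subset-drawing step described above can be sketched as follows, assuming the training data is a set of files with known durations and that a subset is built by shuffling the files and adding them until the target duration is reached. The function names and the `file_durs` input layout are hypothetical; learn_curve.py may sample differently.

```python
import random


def sample_train_subset(file_durs, target_dur, rng):
    """Return a random list of filenames totalling at least target_dur seconds.

    file_durs: dict mapping filename -> duration in seconds (hypothetical layout).
    """
    filenames = list(file_durs)
    rng.shuffle(filenames)
    subset, total = [], 0.0
    for fname in filenames:
        if total >= target_dur:
            break
        subset.append(fname)
        total += file_durs[fname]
    if total < target_dur:
        raise ValueError(f"only {total} s of training data, need {target_dur} s")
    return subset


def make_learning_curve_subsets(file_durs, train_set_durs, replicates, seed=0):
    """Draw one independent random subset per (duration, replicate) pair."""
    rng = random.Random(seed)  # seeded so the subsets can be reproduced
    return {
        (dur, rep): sample_train_subset(file_durs, dur, rng)
        for dur in train_set_durs
        for rep in range(replicates)
    }
```

Because whole files are added, a subset can overshoot its target by up to one file's duration; splitting files would give exact durations at the cost of cutting songs mid-way.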
Before running the learn_curve.py script you again need to modify some
options in the config.ini file.
```ini
[TRAIN]
train_data_path = /home/user/data/subdir/subsubdir1/spects/train_data_dict
val_data_path = /home/user/data/subdir/subsubdir1/spects/val_data_dict
test_data_path = /home/user/data/subdir/subsubdir1/spects/test_data_dict
use_train_subsets_from_previous_run = No
previous_run_path = /home/user/data/subdir/results_
normalize_spectrograms = Yes
n_max_iter = 18000
val_error_step = 150
checkpoint_step = 600
save_only_single_checkpoint_file = True
patience = None
replicates = 5
```
Most importantly, you should change train_data_path to wherever train_data_dict was saved;
the path should include the filename. Do the same for val_data_path and test_data_path.
You'll also want to change the first results_dir option under the [OUTPUT] section to
wherever you want to save all the output (checkpoint files, copies of training data, etc.).
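Since the most common mistake here is a stale data path, a short pre-flight check can help. The sketch below parses the [TRAIN] section with configparser and lists any data paths that don't exist. The helper names, and the assumption that `patience = None` means "no early stopping", are illustrative and not taken from learn_curve.py.

```python
import configparser
import os


def parse_optional_int(value):
    # Assumption: the literal string "None" means no value (e.g. no early stopping)
    return None if value.strip() == "None" else int(value)


def parse_train_section(config_text):
    """Hypothetical helper: read the [TRAIN] options into typed Python values."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    train = config["TRAIN"]
    return {
        "train_data_path": train["train_data_path"],
        "val_data_path": train["val_data_path"],
        "test_data_path": train["test_data_path"],
        "use_train_subsets_from_previous_run": train.getboolean(
            "use_train_subsets_from_previous_run"),
        "previous_run_path": train.get("previous_run_path"),
        "normalize_spectrograms": train.getboolean("normalize_spectrograms"),
        "n_max_iter": train.getint("n_max_iter"),
        "val_error_step": train.getint("val_error_step"),
        "checkpoint_step": train.getint("checkpoint_step"),
        "save_only_single_checkpoint_file": train.getboolean(
            "save_only_single_checkpoint_file"),
        "patience": parse_optional_int(train["patience"]),
        "replicates": train.getint("replicates"),
    }


def missing_data_paths(train_cfg):
    """Return the train/val/test data paths that do not exist on disk."""
    keys = ("train_data_path", "val_data_path", "test_data_path")
    return [train_cfg[k] for k in keys if not os.path.exists(train_cfg[k])]
```

Running `missing_data_paths(parse_train_section(open("config.ini").read()))` before a long training run catches a wrong path in seconds instead of after the job has been queued.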
After modifying the config file, run learn_curve.py at the command line with the config file specified:
```
(seg-nets) $ CUDA_VISIBLE_DEVICES=0 python ./seg-nets/learn_curve.py config_03218_bird0.ini
```
(Note it is not required to specify which GPU to use with CUDA_VISIBLE_DEVICES.)
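If you prefer to launch runs from Python (for example, from a notebook), the same GPU selection can be done by setting CUDA_VISIBLE_DEVICES in the child process's environment. This is a minimal sketch; the helper name `build_learn_curve_command` and its defaults are hypothetical, and the script path simply mirrors the command above.

```python
import os
import sys


def build_learn_curve_command(config_path, gpu=None,
                              script="./seg-nets/learn_curve.py"):
    """Return (argv, env) for running learn_curve.py, optionally on one GPU."""
    env = dict(os.environ)  # copy, so the parent process is left unchanged
    if gpu is not None:
        # Only the listed device is visible to TensorFlow in the child process
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    argv = [sys.executable, script, config_path]
    return argv, env
```

Pass the result to `subprocess.run(argv, env=env, check=True)`; leaving `gpu=None` inherits whatever devices the current environment exposes, just like omitting the variable on the command line.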
3. Generate summary of results
After this script finishes, you must change
```
(seg-nets) $ CUDA_VISIBLE_DEVICES=0 python ./seg-nets/summary.py config_03218_bird0.ini
```
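To show how per-replicate test errors become the points of a learning curve, here is a minimal sketch. It assumes the error is a frame (time bin) error rate, which is one common choice for segmentation but not confirmed as the metric summary.py reports, and that errors are grouped by training set duration. Function names and the input layout are hypothetical.

```python
from statistics import mean, stdev


def frame_error(y_true, y_pred):
    """Fraction of time bins whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label vectors must have the same length")
    if not y_true:
        raise ValueError("label vectors must not be empty")
    n_wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return n_wrong / len(y_true)


def learning_curve_points(errors_by_dur):
    """Collapse per-replicate errors into (duration, mean, std) tuples.

    errors_by_dur: dict mapping training set duration (s) -> list of test
    errors, one per replicate (hypothetical layout).
    """
    points = []
    for dur in sorted(errors_by_dur):
        errs = errors_by_dur[dur]
        spread = stdev(errs) if len(errs) > 1 else 0.0
        points.append((dur, mean(errs), spread))
    return points
```

Each tuple gives one x-position (training duration), the mean error across replicates, and a spread for error bars.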
Using your own spectrograms
Use the matutils functions on the .mat form of the data to make a data_dict that the main.py script can use. One function, convert_train_keys_to_txt, makes a .txt file that contains the .mat filenames in train_keys.mat. The other function, make_data_from_matlab_spects, uses that training_filenames.txt file to create a Python dictionary containing the spectrograms, the labeled timebin vectors, and some associated metadata. This dictionary has the same format as the dictionary the main.py script uses when cnnbilstm.utils generates the data directly from the .cbin files.
```
/Users/david/compare-seg-models $ source activate learncurve
(learncurve) /Users/david/compare-seg-models $ ipython
In [1]: import cnnbilstm
In [2]: cd directory_with_mat_and_train_keys
In [3]: cnnbilstm.matutils.convert_train_keys_to_txt('.', 'training_filenames')
In [4]: cnnbilstm.matutils.make_data_from_matlab_spects('.', 'training_filenames', 'train_data_dict')
```
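The dictionary described above pairs each spectrogram with a labeled timebin vector: one integer label per spectrogram time bin. Below is a minimal sketch of how segment annotations (onsets, offsets, labels) could be turned into such a vector. The function name, the argument layout, the use of 0 for unlabeled bins, and the convention that a segment covers bins floor(onset / dt) through floor(offset / dt) inclusive are all assumptions, not the cnnbilstm implementation.

```python
def segments_to_timebins(onsets, offsets, labels, n_timebins, timebin_dur,
                         labelmap, unlabeled=0):
    """Return a list with one integer label per spectrogram time bin.

    onsets, offsets: segment boundaries in seconds (hypothetical input format).
    labels: one label per segment, e.g. a string such as "iabc".
    labelmap: dict mapping each label to an integer class.
    unlabeled: integer used for bins outside every segment (silent gaps).
    """
    vec = [unlabeled] * n_timebins
    for onset, offset, label in zip(onsets, offsets, labels):
        start = int(onset // timebin_dur)
        # Clip so a segment running past the last bin cannot index out of range
        stop = min(int(offset // timebin_dur), n_timebins - 1)
        for i in range(start, stop + 1):
            vec[i] = labelmap[label]
    return vec
```

For example, with 0.5 s bins, segments "a" from 1.0 to 2.0 s and "b" from 3.0 to 3.5 s in a 10-bin spectrogram give `[0, 0, 1, 1, 1, 0, 2, 2, 0, 0]` when `labelmap = {"a": 1, "b": 2}`.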
Owner
- Name: David Nicholson
- Login: NickleDave
- Kind: user
- Location: Charm City
- Company: @vocalpy
- Website: https://nicholdav.info/
- Repositories: 15
- Profile: https://github.com/NickleDave
ML, AI; behavior + cog + neuro. Open + inclusive science. Pythonista. He/him (they's ok too). Speaks Spanglish and dances salsa and bachata part-time.
Citation (CITATION.cff)
```yaml
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Comparison of Neural Networks for Segmentation of
  Vocalizations
message: >-
  code associated with presentation at 2018 Southern
  Data Science conference
type: software
authors:
  - given-names: David
    family-names: Nicholson
    email: nicholdav@gmail.com
    orcid: 'https://orcid.org/0000-0002-4261-4719'
    affiliation: Emory University
license: BSD-3-Clause
```
Committers
Last synced: 12 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| David Nicholson | n****v@g****m | 57 |
| NickleDave | n****e | 8 |
Issues and Pull Requests
Last synced: 12 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0