Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: NickleDave
  • License: bsd-3-clause
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 6.2 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created about 8 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation

README.md

Comparison of Neural Networks for Segmentation of Vocalizations

Poster presented at Southern Data Science Conference 2018.

poster image

seg-nets package

This repository contains scripts to reproduce results, as well as the seg_nets package. The package contains the network and various utility functions used by the scripts.

installation

It's probably easiest to use Anaconda. First set up a conda environment, clone the repo, and activate the environment:

    $ conda create -n seg-nets numpy scipy joblib tensorflow-gpu ipython jupyter
    $ git clone https://github.com/NickleDave/tf_syllable_segmentation_annotation davids_fork_of_tf_sylseg
    $ source activate seg-nets

usage

There are 3 main scripts that are run consecutively. The scripts accept a config.ini file; you will use the same config.ini file with each script but you will make changes to it after running each script.

1. Make data sets

You will make data sets for training, validation, and testing with the make_data.py script.
Before you run the script you need to create a config.ini file. You can adapt the template_config.ini file that's in this repository. In the config file, set values for the following options in the [DATA] section:

```ini
[DATA]
labelset = iabcdefghjk  # set of labels, str, int, or a
data_dir = /home/user/data/subdir/  # directory with audio files

# durations of training, validation, and test sets in seconds
total_train_set_duration = 400
train_set_durs = 5, 15, 30, 45, 60, 75, 90, 105
validation_set_duration = 100
test_set_duration = 400
skip_files_with_labels_not_in_labelset = Yes
```

For more about what each of these options means, see README_config.md.
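As a rough illustration of how options like these can be read from Python, here is a minimal sketch using the standard library's configparser. It is not the parsing code used by make_data.py. It assumes that labelset is given as one character per label and that comments may follow a value on the same line; the CONFIG_TEXT string is a shortened, hypothetical stand-in for a real config.ini.

```python
import configparser

# Shortened, hypothetical stand-in for the [DATA] section of a config.ini file
CONFIG_TEXT = """
[DATA]
labelset = iabcdefghjk
data_dir = /home/user/data/subdir/  # directory with audio files
train_set_durs = 5, 15, 30, 45, 60, 75, 90, 105
"""

# inline_comment_prefixes lets "# ..." comments follow a value on the same line
config = configparser.ConfigParser(inline_comment_prefixes=("#",))
config.read_string(CONFIG_TEXT)
data = config["DATA"]

# Assumption: each character of labelset is one label
labelset = list(data["labelset"])
data_dir = data["data_dir"]
# Comma-separated durations (in seconds) become a list of ints
train_set_durs = [int(dur) for dur in data["train_set_durs"].split(",")]

print(labelset, data_dir, train_set_durs)
```

To read a file on disk instead, `config.read("config.ini")` can replace the `read_string` call.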

After writing the config file, run make_data.py at the command line with the config file specified:

    (seg-nets) $ python ./seg-nets/make_data.py config_03218_bird0.ini

2. Generate learning curves

After making the data sets, you generate the data for learning curves, using the learn_curve.py script. A learning curve is a plot where the x-axis is the size of the training set (in this case, duration in seconds) and the y-axis is error, accuracy, or some similar metric. The script grabs random subsets of training data of fixed sizes (specified by the train_set_durs option in the config file) and uses each subset to train the network. Each trained model is then saved, and its ability to generalize is estimated by measuring error on a test set, using the summary.py script (below).
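The subsetting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the logic in learn_curve.py: the file names and the 5-second file durations are hypothetical, and sample_subset is a helper invented for this example. It draws one random subset per training set duration and per replicate.

```python
import random


def sample_subset(file_durations, target_dur, rng):
    """Randomly pick files until their summed duration reaches target_dur (seconds)."""
    files = list(file_durations.items())
    rng.shuffle(files)
    subset, total = [], 0.0
    for name, dur in files:
        if total >= target_dur:
            break
        subset.append(name)
        total += dur
    if total < target_dur:
        raise ValueError("not enough data for the requested duration")
    return subset, total


# Hypothetical training files, each 5 seconds long (400 s in total)
file_durations = {f"song_{i:03d}.wav": 5.0 for i in range(80)}
train_set_durs = [5, 15, 30, 45, 60, 75, 90, 105]  # seconds, as in the config
replicates = 5

rng = random.Random(0)  # fixed seed so the subsets are reproducible
subsets = {}
for dur in train_set_durs:
    for rep in range(replicates):
        subsets[(dur, rep)] = sample_subset(file_durations, dur, rng)
```

Training one model per entry in `subsets` and plotting test error against duration would then give one learning curve per replicate.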

Before running the learn_curve.py script you again need to modify some options in the config.ini file, this time in the [TRAIN] section:

    [TRAIN]
    train_data_path = /home/user/data/subdir/subsubdir1/spects/train_data_dict
    val_data_path = /home/user/data/subdir/subsubdir1/spects/val_data_dict
    test_data_path = /home/user/data/subdir/subsubdir1/spects/test_data_dict
    use_train_subsets_from_previous_run = No
    previous_run_path = /home/user/data/subdir/results_
    normalize_spectrograms = Yes
    n_max_iter = 18000
    val_error_step = 150
    checkpoint_step = 600
    save_only_single_checkpoint_file = True
    patience = None
    replicates = 5

Most importantly, you should change train_data_path to wherever 'train_data_dict' got saved; the path should include the filename. Do the same for val_data_path and test_data_path.

You'll also want to change the first results_dir option under the [OUTPUT] section to wherever you want to save all the output (checkpoint files, copies of training data, etc.).
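To make the [TRAIN] options more concrete, the sketch below reads a few of them with configparser and derives how often validation and checkpointing would happen. It rests on assumptions not confirmed by this README: that val_error_step and checkpoint_step are intervals measured in training iterations, and that the literal string None means "no early stopping". It is not the code used by learn_curve.py.

```python
import configparser

# Subset of the [TRAIN] options shown above
CONFIG_TEXT = """
[TRAIN]
normalize_spectrograms = Yes
n_max_iter = 18000
val_error_step = 150
checkpoint_step = 600
save_only_single_checkpoint_file = True
patience = None
replicates = 5
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
train = config["TRAIN"]

# getboolean accepts Yes/No and True/False (case-insensitive)
normalize = train.getboolean("normalize_spectrograms")
single_checkpoint = train.getboolean("save_only_single_checkpoint_file")
n_max_iter = train.getint("n_max_iter")
val_error_step = train.getint("val_error_step")
checkpoint_step = train.getint("checkpoint_step")
replicates = train.getint("replicates")

# configparser has no notion of None, so handle the literal string explicitly
patience_raw = train.get("patience")
patience = None if patience_raw == "None" else int(patience_raw)

# Assumption: both *_step options are intervals in training iterations
n_val_checks = n_max_iter // val_error_step
n_checkpoints = n_max_iter // checkpoint_step
print(n_val_checks, n_checkpoints)
```

Under those assumptions, each training run would measure validation error 120 times and reach 30 checkpoint steps.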

After modifying the config file, run learn_curve.py at the command line with the config file specified:

    (seg-nets) $ CUDA_VISIBLE_DEVICES=0 python ./seg-nets/learn_curve.py config_03218_bird0.ini

(Note it is not required to specify which GPU to use with CUDA_VISIBLE_DEVICES.)

3. Generate summary of results

After this script finishes, you must change

(seg-nets) $ CUDA_VISIBLE_DEVICES=0 python ./seg-nets/summary.py config_03218_bird0.ini

Using your own spectrograms

Use the matutils functions on the .mat form of the data to make a datadict that the main.py function can use. One function, converttrainkeystotxt, makes a .txt file that contains the .mat filenames in trainkeys.mat. The other function, makedatafrommatlabspects, uses that trainingfilenames.txt file to create a Python dictionary containing the spectrograms and labeled timebin vectors, and some associated metadata. This dictionary has the same format as the dictionary the main.py function uses when cnnbilstm.utils generates the data directly from the .cbin files.

```
/Users/david/compare-seg-models $ activate learncurve
(learncurve) /Users/david/compare-seg-models $ ipython
[0] import cnnbilstm
[1] cd directorywithmatandtrainkeys
[2] cnnbilstm.matutils.converttrainkeystotxt('.', 'trainingfilenames')
[3] cnnbilstm.matutils.makedatafrommatlabspects('.', 'trainingfilenames', 'traindata_dict')
```

Owner

  • Name: David Nicholson
  • Login: NickleDave
  • Kind: user
  • Location: Charm City
  • Company: @vocalpy

ML, AI; behavior + cog + neuro. Open + inclusive science. Pythonista. He/him (they's ok too). Habla espanglish y baila salsa y bachata a medio tiempo.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Comparison of Neural Networks for Segmentation of
  Vocalizations
message: >-
  code associated with presentation at 2018 Southern
  Data Science conference
type: software
authors:
  - given-names: David
    family-names: Nicholson
    email: nicholdav@gmail.com
    orcid: 'https://orcid.org/0000-0002-4261-4719'
    affiliation: Emory University
license: BSD-3-Clause

Committers

Last synced: 12 months ago

All Time
  • Total Commits: 65
  • Total Committers: 2
  • Avg Commits per committer: 32.5
  • Development Distribution Score (DDS): 0.123
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
David Nicholson n****v@g****m 57
NickleDave n****e 8

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

setup.py pypi