insect-detection-remote-sensing-mdpi
https://github.com/bmw-lab-msu/insect-detection-remote-sensing-mdpi
Repository
Basic Info
- Host: GitHub
- Owner: BMW-lab-MSU
- License: bsd-3-clause
- Language: MATLAB
- Default Branch: main
- Size: 97.7 MB
This software is associated with the publication titled "Comparison of Supervised Learning and Changepoint Detection for Insect Detection in Lidar Data" by T. C. Vannoy, N. B. Sweeney, J. A. Shaw, and B. M. Whitaker.
The associated dataset can be found at https://zenodo.org/doi/10.5281/zenodo.10055762.
Data setup
You can specify the locations of the data and results folders in the beehiveDataSetup.m script. By default, the scripts use the following relative path setup, where code is the folder containing this repository:
```text
├── code
├── data
│   ├── combined
│   ├── preprocessed
│   ├── raw
│   │   ├── 2022-06-23
│   │   ├── 2022-06-24
│   │   ├── 2022-07-28
│   │   └── 2022-07-29
│   ├── testing
│   ├── training
│   └── validation
└── results
    ├── changepoint-results
    │   └── runtimes
    ├── testing
    └── training
        ├── classifiers
        ├── data-sampling
        ├── default-params
        └── hyperparameter-tuning
```
In the above setup, the dataset goes into the raw folder. The dataset can be downloaded from the Zenodo archive.
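If you use the default relative layout, you may want to create the empty folders up front. The following is a minimal sketch that is not part of the repository: it creates the tree above with MATLAB's mkdir, assuming you run it from the folder that contains code (i.e., the parent of this repository). The folder names are copied from the tree; nothing else is assumed about what the scripts expect.

```matlab
% Sketch (not part of the repository): create the default data/results layout.
% Assumption: the current folder is the one that contains "code".
% Folders are listed parent-first, so each mkdir call creates a single level.
folders = {
    'data'
    fullfile('data', 'combined')
    fullfile('data', 'preprocessed')
    fullfile('data', 'raw')
    fullfile('data', 'raw', '2022-06-23')
    fullfile('data', 'raw', '2022-06-24')
    fullfile('data', 'raw', '2022-07-28')
    fullfile('data', 'raw', '2022-07-29')
    fullfile('data', 'testing')
    fullfile('data', 'training')
    fullfile('data', 'validation')
    'results'
    fullfile('results', 'changepoint-results')
    fullfile('results', 'changepoint-results', 'runtimes')
    fullfile('results', 'testing')
    fullfile('results', 'training')
    fullfile('results', 'training', 'classifiers')
    fullfile('results', 'training', 'data-sampling')
    fullfile('results', 'training', 'default-params')
    fullfile('results', 'training', 'hyperparameter-tuning')
};

% Skip folders that already exist so the script can be re-run safely.
for k = 1:numel(folders)
    if ~exist(folders{k}, 'dir')
        mkdir(folders{k});
    end
end
```

Re-running the sketch is harmless because existing folders are skipped.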
Running the code
> [!IMPORTANT]
> In general, you need to call pathSetup.m first before running anything, as that script adds all the folders in this repo to your MATLAB path. Additionally, if you use the default relative data path setup described above, you must run all of your code from the root of this repository, not the subfolders; if you specify full paths in beehiveDataSetup.m, then you can run the code from anywhere.

> [!TIP]
> This code is designed to run on a computing cluster. If you have access to a computing cluster that uses slurm, you can update and use the scripts in the slurm folder. The code will still run perfectly fine on a normal desktop computer; it will just take longer.
The folder icons (📁) after the headings link to the relevant folder in the repository.
There are three main portions of code, listed below. You have to run the data preparation first, but the supervised learning and changepoint detection sections are independent of each other.
Data preparation 📁
- (Optional) Convert the CSV label files into .mat files using convertAllLabels.m; this has already been done in the archived dataset.
- Combine the individual raw data files into larger groups that can then be split into training and testing sets; this is done with combineDataForTrainingTesting.m.
- Preprocess the data using preprocess.m.
- Split the preprocessed data into the training, validation, and testing sets using spiltData.m.
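Because the raw files are first combined into groups, the split can assign whole groups to each set rather than individual samples. The sketch below only illustrates that idea; the number of groups (20) and the 60/20/20 train/validation/test fractions are hypothetical placeholders, and the repository's own splitting script may use different proportions or a different procedure.

```matlab
% Sketch of a group-wise split (hypothetical numbers, not the repository's code).
nGroups = 20;               % hypothetical number of combined groups
fractions = [0.6 0.2 0.2];  % hypothetical train/validation/test fractions

% Shuffle group indices, then carve off contiguous blocks for each set,
% so every group lands in exactly one set.
order = randperm(nGroups);
nTrain = round(fractions(1) * nGroups);
nVal = round(fractions(2) * nGroups);

trainGroups = sort(order(1:nTrain));
valGroups = sort(order(nTrain+1:nTrain+nVal));
testGroups = sort(order(nTrain+nVal+1:end));
```

Splitting by group keeps samples from the same group out of more than one set.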
Using slurm
If you have access to a slurm cluster, these steps can be done by running [`prepare-data.sh`](slurm/prepare-data.sh).
Supervised learning 📁
Feature extraction 📁
Before training any of the feature-based algorithms, we need to precompute the features:
- training data: precomputeTrainingFeatures.m
- validation data: precomputeValidationFeatures.m
- testing data: precomputeTestingFeatures.m
Using slurm
If you have access to a slurm cluster, these steps can be done by running [`precompute-features.sh`](slurm/precompute-features.sh).
Training 📁
Using slurm
If you have access to a slurm cluster, all the training and testing can be launched using [`run-row-methods.sh`](slurm/run-row-methods.sh) and [`run-image-methods.sh`](slurm/run-image-methods.sh).
Data sampling parameter tuning (row methods only)
For the row-based methods, we first need to create the grid search parameters using createDataSamplingGrid.m. Once that is done, we can perform the grid search.
For the feature-based methods, call the evalSamplingGridRowFeatureMethod function with the grid search index, e.g.
```matlab
evalSamplingGridRowFeatureMethod(@AdaBoost,1,UseParallel=true,UseGPU=true)
```
For the deep learning methods, call the evalSamplingGridRowDataMethod function with the grid search index, e.g.
```matlab
evalSamplingGridRowDataMethod(@CNN1d,1,UseParallel=true,UseGPU=true)
```
This grid search was designed to run in parallel on a computing cluster, specifically using slurm job arrays. See the samplingGridSearch*.slurm scripts for full details of the function calls for each of the classifiers. In particular, for the neural networks with more than one hidden layer, we have to pass the network parameters (e.g., layer sizes) into the evalSamplingGrid* functions.
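With slurm job arrays, each array task evaluates one grid point. The sketch below illustrates how a linear grid index, like the second argument of evalSamplingGridRowFeatureMethod, could select one parameter combination. The two parameter names and their values are hypothetical placeholders (the real grid comes from createDataSamplingGrid.m and may differ); SLURM_ARRAY_TASK_ID is the environment variable slurm sets for each array task.

```matlab
% Hypothetical 4-by-4 sampling grid (16 points); the real grid is created by
% createDataSamplingGrid.m and its parameters may differ.
undersamplingRatios = [0.1 0.25 0.5 1];  % hypothetical values
augmentationFactors = [0 1 2 4];         % hypothetical values

% Under slurm, each job-array task reads its index from the environment;
% fall back to 1 when running outside slurm.
taskId = getenv('SLURM_ARRAY_TASK_ID');
if isempty(taskId)
    gridIdx = 1;
else
    gridIdx = str2double(taskId);
end

% Map the linear grid index to one (row, column) combination of the grid.
gridSize = [numel(undersamplingRatios), numel(augmentationFactors)];
[iU, iA] = ind2sub(gridSize, gridIdx);

% Hypothetical field names for the selected sampling parameters.
params.UndersamplingRatio = undersamplingRatios(iU);
params.AugmentationFactor = augmentationFactors(iA);
```

Because ind2sub uses column-major ordering, the first parameter varies fastest as the index increases.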
If you don't have access to a computing cluster, you can run the grid search in a for loop:
```matlab
% Evaluate every point of the 16-point sampling grid sequentially
for gridIdx = 1:16
    evalSamplingGridRowFeatureMethod(@AdaBoost,gridIdx,UseParallel=true,UseGPU=true)
end
```
You could also use a parallel for loop, which may or may not be faster than running each iteration with parallel feature extraction and training (UseParallel=true):
```matlab
% Parallelize across grid points instead of within each evaluation
parfor gridIdx = 1:16
    evalSamplingGridRowFeatureMethod(@AdaBoost,gridIdx,UseParallel=false,UseGPU=true)
end
```
Once the grid search for an algorithm is done, run the selectBestSamplingParams function to save the sampling parameters that resulted in the best MCC value, e.g.:
```matlab
selectBestSamplingParams("AdaBoost")
```
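The Matthews correlation coefficient (MCC) used to pick the best sampling parameters is computed from the binary confusion matrix as MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is not the repository's implementation (selectBestSamplingParams may compute or load MCC differently); it evaluates the formula for three made-up grid points and selects the index with the largest MCC.

```matlab
% Sketch (not the repository's implementation): MCC for a binary
% insect/no-insect classifier from confusion-matrix counts.
% Works element-wise, so each vector entry can be one grid point.
mcc = @(tp, tn, fp, fn) (tp .* tn - fp .* fn) ./ ...
    sqrt((tp + fp) .* (tp + fn) .* (tn + fp) .* (tn + fn));

% Made-up counts for three grid points (illustrative only, not results).
tp = [40 55 50];
tn = [900 880 910];
fp = [10 30 5];
fn = [20 5 10];

gridMCC = mcc(tp, tn, fp, fn);

% The best grid point is the one with the largest MCC.
[bestMCC, bestIdx] = max(gridMCC);
```

If any denominator term is zero, this formula yields NaN for that grid point; max ignores NaN values by default.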
Train default 2D CNNs (image methods only) 📁
For the image-based methods (2D CNNs), we need to train networks with the default hyperparameters. This is because the default hyperparameters might perform better than the parameters found during tuning, and thus we would prefer to use the default parameters for the final training.
The default 2D CNNs can be trained with the trainCNN2dManualParams function. See the trainDefaultCNN2d*.slurm scripts for the relevant function call for each 2D CNN.
For example, here's the code used to train the default 3-layer 2D CNN:
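```matlab
% Three convolutional layers: one row of filter dimensions and one filter count per layer
p.FilterSize = [16,2; 16,2; 16,2];
p.Nfilters = [20,20,20];
trainCNN2dManualParams(@CNN2d,UseGPU=true,ClassifierParams=p)
```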
Model hyperparameter tuning 📁
Create hyperparameter search values
First, we need to create mat files that contain the model's hyperparameter search values. Each model has a separate function to create its associated hyperparameter search values, except AdaBoost and RUSBoost, which use the same search values.
Examples:
```matlab
% For AdaBoost and RUSBoost
createBoostTreesHyperparamSearchRange
% For the 1-layer 2D CNN
createCNN2d1LayerHyperparamSearchRange
```
See the hyperparameter-tuning folder for the functions that create the hyperparameter search values.
Tune hyperparameters
There are three different hyperparameter tuning functions, one for each of the algorithm types:
- feature engineering methods: tuneHyperparamsRowFeatureMethod
- 1D CNNs: tuneHyperparamsRowDataMethod
- 2D CNNs: tuneHyperparamsImageMethod
Examples:
```matlab
tuneHyperparamsRowFeatureMethod("StatsNeuralNetwork1Layer",@StatsNeuralNetwork,UseParallel=true);
tuneHyperparamsRowDataMethod("CNN1d1Layer",@CNN1d,UseGPU=true,UseParallel=true);
tuneHyperparamsImageMethod("CNN2d3Layer",@CNN2d,UseGPU=true);
```
See the tuneHyperparams*.slurm scripts in the slurm folder for the function calls for each method.
Final training 📁
After hyperparameter tuning is done, the algorithms need to be trained one final time on the entire training/validation set.
Similar to the hyperparameter tuning, there are three different training functions, one for each of the algorithm types:
- feature engineering methods: trainRowFeatureMethod
- 1D CNNs: trainRowDataMethod
- 2D CNNs: trainImageMethod
Each of these functions takes the classifier name as a string.
Examples:
```matlab
trainRowFeatureMethod("AdaBoost");
trainRowDataMethod("CNN1d5Layer");
trainImageMethod("CNN2d3Layer");
```
Testing 📁
Using slurm
Again, if you have access to a slurm cluster, all the training and testing code can be launched using [`run-row-methods.sh`](slurm/run-row-methods.sh) and [`run-image-methods.sh`](slurm/run-image-methods.sh).
Similar to the hyperparameter tuning, there are three different testing functions, one for each of the algorithm types:
- feature engineering methods: testRowFeatureMethod
- 1D CNNs: testRowDataMethod
- 2D CNNs: testImageMethod
Each of these functions takes the classifier name as a string.
Examples:
```matlab
testRowFeatureMethod("AdaBoost");
testRowDataMethod("CNN1d5Layer");
testImageMethod("CNN2d3Layer");
```
See the train*.slurm scripts in the slurm folder for the function calls for each method.
Changepoint detection 📁
There are two different changepoint detection methods (gfpop and MATLAB's findchangepts); for each method, there are three different procedures: analyzing the rows, analyzing the columns, or analyzing both the rows and columns.
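To make the "rows" procedure concrete, the sketch below finds a single mean-shift changepoint in each row of a small made-up matrix by choosing the split that minimizes the total squared error of the two segments. This is a deliberately simple stand-in, not gfpop or findchangepts (both can find multiple changepoints and use their own cost functions and penalties); the matrix and the one-changepoint-per-row assumption are illustrative only.

```matlab
% Made-up 3-by-50 "image": each row steps from 0 to 5 at a different sample.
img = zeros(3, 50);
img(1, 21:end) = 5;
img(2, 31:end) = 5;
img(3, 11:end) = 5;

% For each row, try every split point k (new segment starts at k) and keep
% the one with the smallest within-segment squared error.
changeIdx = zeros(size(img, 1), 1);
for r = 1:size(img, 1)
    x = img(r, :);
    n = numel(x);
    cost = inf(1, n);
    for k = 2:n
        left = x(1:k-1);
        right = x(k:n);
        cost(k) = sum((left - mean(left)).^2) + sum((right - mean(right)).^2);
    end
    [~, changeIdx(r)] = min(cost);
end
```

The "columns" procedure would apply the same search along each column instead, for example to the transposed matrix img.'.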
Using slurm
If you have access to a slurm cluster, you can run all the changepoint algorithms with [`run-changepoint-methods.sh`](slurm/run-changepoint-methods.sh). The `gfpop` mex wrapper still needs to be [compiled](#gfpop) before launching the slurm jobs.
MATLAB findchangepts
findchangepts requires MATLAB's Signal Processing Toolbox. Before running these algorithms, you must complete the data preparation steps.
The row, column, and "both" algorithms are run with the following scripts:
- rows: matlabChptsRows
- columns: matlabChptsCols
- both: matlabChptsBoth
gfpop 📁
Before running the gfpop algorithm, you must compile the mex file, gfpop_mex.cpp. This can be done by going to the changepoint-detection/gfpop folder and running mex:
```bash
cd changepoint-detection/gfpop
mex gfpop_mex.cpp
```
Once the mex file is compiled, you can run the row, column, and "both" algorithms:
- rows: gfpopRows
- columns: gfpopCols
- both: gfpopBoth
Analyzing results
Once all the scripts have been run, the results can be analyzed and collected by running changepointAnalysis.m.
Figures 📁
For information on recreating the figures, please see the figures folder.
Citation
You can cite this code as follows:
```bibtex
@software{trevor_vannoy_2023_10055810,
  author    = {Trevor C. Vannoy and
               Nathaniel B. Sweeney and
               Bradley M. Whitaker},
  title     = {{BMW-lab-MSU/insect-detection-remote-sensing-mdpi}},
  month     = oct,
  year      = 2023,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.10055809},
  url       = {https://zenodo.org/doi/10.5281/zenodo.10055809}
}
```
License
All code in this repository, except the gfpop code, is licensed under the BSD 3-Clause License. gfpop is licensed under the MIT License.
Owner
- Name: BMW Lab @ MSU
- Login: BMW-lab-MSU
- Kind: organization
- Location: Montana State University
- Repositories: 7
- Profile: https://github.com/BMW-lab-MSU