https://github.com/denajgibbon/vietnam-gunshots
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity: 13.0%)
Last synced: 5 months ago
Repository
Basic Info
- Host: GitHub
- Owner: DenaJGibbon
- Language: R
- Default Branch: main
- Size: 808 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created over 2 years ago
· Last pushed over 1 year ago
Metadata Files
Readme
README.Rmd
---
title: "Gunshot Detection with AlexNet, VGG16, and ResNet18 using 'torch for R'"
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
This repository contains code for training and evaluating gunshot detection models using the AlexNet, VGG16, and ResNet18 convolutional neural network architectures. The models are pretrained on the ImageNet dataset and fine-tuned for the specific task of gunshot detection. The code uses the 'luz' package in R for deep learning with torch.
This analysis uses the 'gibbonNetR' package, which can be accessed here: https://github.com/DenaJGibbon/gibbonNetR.
The full dataset, including the .wav files that correspond to the images used in the present analysis, can be found on Zenodo.
## Dataset
The dataset used for training, validation, and testing is obtained from sound clips of annotated PAM data, and includes labeled images of gunshots and noise. The images are preprocessed using various transformations, such as resizing, color jitter, and normalization, to prepare them for input into the model.
```{r, echo=FALSE, message=FALSE}
library(dplyr)
library(torchvision)
library(torch)
batch_size <- 20
# Location of spectrogram images for training
input.data.path <- 'data/imagesvietnam/'
# Create a dataset of images:
train_ds <- image_folder_dataset(
  file.path(input.data.path, 'train'),  # Path to the image directory
  transform = . %>%
    torchvision::transform_to_tensor() %>%
    torchvision::transform_resize(size = c(224, 224)) %>%
    # torchvision::transform_color_jitter() %>%
    torchvision::transform_normalize(
      mean = c(0.485, 0.456, 0.406),  # Channel means for normalization
      std = c(0.229, 0.224, 0.225)    # Channel standard deviations for normalization
    )
)
# Create a dataloader from the dataset:
# - This helps in efficiently loading and batching the data.
# - The batch size is user specified, with shuffling enabled and the last incomplete batch is dropped.
train_dl <- dataloader(train_ds, batch_size = batch_size, shuffle = TRUE, drop_last = TRUE)
# Extract the next batch from the dataloader
batch <- train_dl$.iter()$.next()
# Extract the labels for the batch and determine class names
labels <- as_array(batch[[2]])
class_names <- ifelse(labels == 2, 'Noise', 'Gunshot')
# Convert the batch tensor of images to an array and process them:
# - The image tensor is permuted to change the dimension order.
# - The pixel values of the images are normalized.
images <- as_array(batch[[1]]) %>% aperm(perm = c(1, 3, 4, 2))
mean <- c(0.485, 0.456, 0.406)
std <- c(0.229, 0.224, 0.225)
# Undo the channel-wise normalization (channels are the 4th dimension after aperm)
images <- sweep(images, 4, std, "*")
images <- sweep(images, 4, mean, "+")
images <- images * 255
# Clip the pixel values to lie within [0, 255]
images[images > 255] <- 255
images[images < 0] <- 0
# Set the plotting parameters for a 4x6 grid
par(mfcol = c(4,6), mar = rep(1, 4))
# Visualize the images:
# - Use `purrr` functions to handle arrays.
# - Set the name of each image based on its class.
# - Convert each image to a raster format for plotting.
# - Finally, iterate over each image, plotting it and setting its title.
images %>%
purrr::array_tree(1) %>%
purrr::set_names(class_names) %>%
purrr::map(as.raster, max = 255) %>%
purrr::iwalk(~{plot(.x); title(.y)})
```
## Training
The training process involves fine-tuning the pretrained AlexNet, VGG16, and ResNet18 models by training new classifier layers. The models are set up with a binary cross-entropy loss function and the Adam optimizer. Training can be run for any number of epochs (e.g., 1-5 or 20), with early stopping enabled to prevent overfitting.
The trained model is saved for future use.
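For readers curious what this setup looks like in raw luz/torch code, here is a minimal, hypothetical sketch of fine-tuning a single architecture (ResNet18 only; label preprocessing is omitted). `gibbonNetR::train_CNN_binary()` wraps logic along these lines, though its actual internals may differ:

```{r, eval=FALSE}
library(torch)
library(torchvision)
library(luz)

# Hypothetical sketch (not the gibbonNetR internals verbatim): replace the
# classifier head of a pretrained ResNet18 with a single-logit output,
# freeze the feature extractor, and train with binary cross-entropy + Adam.
net <- nn_module(
  initialize = function() {
    self$model <- model_resnet18(pretrained = TRUE)
    # Freeze the pretrained feature extractor
    for (par in self$model$parameters) par$requires_grad_(FALSE)
    # Replace the final fully connected layer with a single-logit output
    self$model$fc <- nn_linear(self$model$fc$in_features, 1)
  },
  forward = function(x) self$model(x)
)

fitted <- net %>%
  setup(
    loss = nn_bce_with_logits_loss(),  # binary cross-entropy on raw logits
    optimizer = optim_adam
  ) %>%
  set_opt_hparams(lr = 0.001) %>%
  fit(train_dl, epochs = 1)  # train_dl from the dataloader created above

# Save the fitted model for later use
luz_save(fitted, "resnet18_gunshot_sketch.pt")
```

Note that `nn_bce_with_logits_loss()` expects float targets of the same shape as the logits, so in practice the folder labels from `image_folder_dataset()` would also need a target transform to 0/1 floats.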
```{r, echo=FALSE, message=FALSE, warning=FALSE}
# Load necessary packages and functions
devtools::load_all("/Users/denaclink/Desktop/RStudioProjects/gibbonNetR")
```
```{r, echo=TRUE, message=FALSE,warning=FALSE}
# Location of spectrogram images for training
input.data.path <- 'data/imagesvietnamunbalanced/'
# Location of spectrogram images for testing
test_data_path <- '/Users/denaclink/Desktop/RStudioProjects/Vietnam-Gunshots/data/imagesbelize/train/'
# Training data folder short
trainingfolder.short <- 'imagesvietnamunbalanced'
# Number of epochs to include
epoch.iterations <- c(1)
# Train the models specifying different architectures
architectures <- c('alexnet')
freeze.param <- c(FALSE)
for (a in seq_along(architectures)) {
  for (b in seq_along(freeze.param)) {
    gibbonNetR::train_CNN_binary(
      input.data.path = input.data.path,
      noise.weight = 0.25,
      architecture = architectures[a],
      save.model = TRUE,
      learning_rate = 0.001,
      test.data = test_data_path,
      unfreeze.param = freeze.param[b],  # FALSE means the pretrained features are frozen
      epoch.iterations = epoch.iterations,
      list.thresholds = seq(0, 1, .1),
      early.stop = "yes",
      output.base.path = "model_output_testonbelize/",
      trainingfolder = trainingfolder.short,
      positive.class = "gunshot",
      negative.class = "noise"
    )
  }
}
```
## Evaluation
To evaluate the performance of the trained model, the test dataset is used. The images in the test dataset are preprocessed in the same way as during training. The model predicts the probability of an image belonging to the positive class (gunshot) and calculates the F1 score as the evaluation metric.
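For reference, the F1 score is the harmonic mean of precision and recall. A small base-R illustration, using made-up counts (not results from this model):

```{r, eval=FALSE}
# Illustration with made-up counts (not results from this model):
tp <- 90   # true positives: gunshots correctly detected
fp <- 10   # false positives: noise labeled as gunshot
fn <- 20   # false negatives: gunshots missed

precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
f1 <- 2 * precision * recall / (precision + recall)
round(f1, 3)
#> 0.857
```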
```{r, echo=TRUE, message=FALSE, warning=FALSE}
performancetables.dir.multi.testbelize <- '/Users/denaclink/Desktop/RStudioProjects/Vietnam-Gunshots/model_output_testonbelize/_imagesvietnamunbalanced_binary_unfrozen_FALSE_/performance_tables'

PerformanceOutputmulti.testbelize <- gibbonNetR::get_best_performance(
  performancetables.dir = performancetables.dir.multi.testbelize,
  class = 'gunshot',
  model.type = "binary",
  Thresh.val = 0.1
)

PerformanceOutputmulti.testbelize$f1_plot
PerformanceOutputmulti.testbelize$best_f1$F1
PerformanceOutputmulti.testbelize$best_auc$AUC
```
## Usage
To use the trained model for predictions without the 'gibbonNetR' package, load the saved model using the 'luz_load' function. Then, preprocess the input image(s) in the same way as during training and pass them to the model for prediction. The output will be the predicted class label or probability.
```{r}
# Load libraries
library(luz)
library(torch)
library(torchvision)
library(torchdatasets)
library(stringr)
library(caret)
# Load the saved model
modelAlexnetGunshot <- luz_load("_train_1_modelAlexnet.pt")
# Set the input directory for the test images
test.input <- 'data/images/test'
# Get the list of image files
imageFileShort <- list.files(test.input, recursive = TRUE, full.names = FALSE)
# Identify just the class from folder structure
ActualClass <- str_split_fixed(imageFileShort, pattern = '/', n = 2)[, 1]
# Create a dataset from the image folder
test_ds <- image_folder_dataset(
  file.path(test.input),
  transform = . %>%
    torchvision::transform_to_tensor() %>%                 # Convert images to tensors
    torchvision::transform_resize(size = c(224, 224)) %>%  # Resize images to 224x224 pixels
    torchvision::transform_normalize(mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225)),  # Normalize the image tensor
  target_transform = function(x) as.double(x) - 1  # Transform the target variable to be zero-based
)
# Get the number of test files
nfiles <- test_ds$.length()
# Create a dataloader for the test dataset with a specified batch size
test_dl <- dataloader(test_ds, batch_size = nfiles)
# Use the Alexnet model to predict the test images
alexnetPred <- predict(modelAlexnetGunshot, test_dl)
# Apply the sigmoid function to the predicted values
alexnetProb <- torch_sigmoid(alexnetPred)
# Move the probabilities to the CPU, convert to an R array, and subtract from 1
alexnetProb <- 1 - as_array(alexnetProb$cpu())
# Classify the images as "gunshot" or "noise" based on a probability threshold
alexnetPredictedClass <- ifelse((alexnetProb) > 0.85, "gunshot", "noise")
# Create a confusion matrix
AlexnetPerf <- caret::confusionMatrix(as.factor(alexnetPredictedClass),as.factor(ActualClass), mode='everything')
print(AlexnetPerf$table)
```
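The 0.85 cutoff above is one possible choice. A quick sweep over thresholds, reusing `alexnetProb` and `ActualClass` from the chunk above, shows how the cutoff affects overall accuracy on the test set:

```{r, eval=FALSE}
# Sweep a range of probability thresholds to see how the choice of
# cutoff (0.85 above) affects overall accuracy on the test set.
thresholds <- seq(0.1, 0.9, by = 0.1)
accuracy <- sapply(thresholds, function(thresh) {
  predicted <- ifelse(alexnetProb > thresh, "gunshot", "noise")
  mean(predicted == ActualClass)
})
data.frame(threshold = thresholds, accuracy = accuracy)
```

In practice the threshold would be tuned on validation data (as `get_best_performance()` does via its performance tables), not on the test set.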
## Acknowledgments
This project was inspired by the work of Keydana (2023) on image classification with transfer learning. Special thanks to the 'torch' package authors for providing the framework for deep learning in R with PyTorch.
## References
Keydana, Sigrid. Deep Learning and Scientific Computing with R torch. CRC Press, 2023. https://skeydan.github.io/Deep-Learning-and-Scientific-Computing-with-R-torch/
Owner
- Name: Dena J. Clink
- Login: DenaJGibbon
- Kind: user
- Company: K. Lisa Yang Center for Conservation Bioacoustics
- Website: www.denaclink.com
- Twitter: BorneanGibbons
- Repositories: 4
- Profile: https://github.com/DenaJGibbon
I am a biological anthropologist, bioacoustician, and avid R user. I use innovative bioacoustics techniques to answer evolutionary questions.