https://github.com/denajgibbon/vietnam-gunshots
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity: 13.0%)
Last synced: 5 months ago
Repository
Basic Info
- Host: GitHub
- Owner: DenaJGibbon
- Language: R
- Default Branch: main
- Size: 808 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created over 2 years ago
· Last pushed over 1 year ago
Metadata Files
Readme
README.Rmd
---
title: "Gunshot Detection with AlexNet, VGG16, and ResNet18 using 'torch for R'"
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
This repository contains code for training and evaluating gunshot detection models using the AlexNet, VGG16, and ResNet18 convolutional neural network architectures. The models are pretrained on the ImageNet dataset and fine-tuned for the specific task of gunshot detection. The code uses the 'luz' package in R for deep learning with torch.
This analysis uses the 'gibbonNetR' package, which can be accessed here: https://github.com/DenaJGibbon/gibbonNetR.
The full dataset, including the .wav files that correspond to the images used in the present analysis, can be found on Zenodo.
## Dataset
The dataset used for training, validation, and testing is obtained from sound clips of annotated PAM data, and includes labeled images of gunshots and noise. The images are preprocessed using various transformations, such as resizing, color jitter, and normalization, to prepare them for input into the model.
```{r, echo=FALSE, message=FALSE}
library(dplyr)
library(torchvision)
library(torch)
batch_size <- 20
# Location of spectrogram images for training
input.data.path <- 'data/imagesvietnam/'
# Create a dataset of images:
train_ds <- image_folder_dataset(
  file.path(input.data.path, 'train'),  # Path to the image directory
  transform = . %>%
    torchvision::transform_to_tensor() %>%
    torchvision::transform_resize(size = c(224, 224)) %>%
    # torchvision::transform_color_jitter() %>%
    torchvision::transform_normalize(
      mean = c(0.485, 0.456, 0.406),  # Channel means for normalization
      std = c(0.229, 0.224, 0.225)    # Channel standard deviations for normalization
    )
)
# Create a dataloader from the dataset:
# - This helps in efficiently loading and batching the data.
# - The batch size is user specified, with shuffling enabled and the last incomplete batch is dropped.
train_dl <- dataloader(train_ds, batch_size = batch_size, shuffle = TRUE, drop_last = TRUE)
# Extract the next batch from the dataloader
batch <- train_dl$.iter()$.next()
# Extract the labels for the batch and determine class names
labels <- as_array(batch[[2]])
class_names <- ifelse(labels == 2, 'Noise', 'Gunshot')
# Convert the batch tensor of images to an array and process them:
# - The image tensor is permuted to change the dimension order.
# - The pixel values of the images are normalized.
images <- as_array(batch[[1]]) %>% aperm(perm = c(1, 3, 4, 2))
mean <- c(0.485, 0.456, 0.406)
std <- c(0.229, 0.224, 0.225)
# Undo the channel-wise normalization (channels are the 4th dimension after aperm)
images <- sweep(images, 4, std, "*")
images <- sweep(images, 4, mean, "+")
images <- images * 255
# Clip the pixel values to lie within [0, 255]
images[images > 255] <- 255
images[images < 0] <- 0
# Set the plotting parameters for a 4x6 grid
par(mfcol = c(4,6), mar = rep(1, 4))
# Visualize the images:
# - Use `purrr` functions to handle arrays.
# - Set the name of each image based on its class.
# - Convert each image to a raster format for plotting.
# - Finally, iterate over each image, plotting it and setting its title.
images %>%
purrr::array_tree(1) %>%
purrr::set_names(class_names) %>%
purrr::map(as.raster, max = 255) %>%
purrr::iwalk(~{plot(.x); title(.y)})
```
## Training
The training process involves fine-tuning the pretrained AlexNet, VGG16, and ResNet18 models by training new classifier layers. The models are set up with a binary cross-entropy loss function and the Adam optimizer. Training can be run for any number of epochs (e.g., 1-5 or 20), with early stopping enabled to prevent overfitting.
The trained model is saved for future use.
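For readers curious what this setup looks like in raw luz/torch code, here is a minimal, hypothetical sketch of fine-tuning a single architecture (ResNet18 only; label preprocessing is omitted). `gibbonNetR::train_CNN_binary()` wraps logic along these lines, though its actual internals may differ:

```{r, eval=FALSE}
library(torch)
library(torchvision)
library(luz)

# Hypothetical sketch (not the gibbonNetR internals verbatim): replace the
# classifier head of a pretrained ResNet18 with a single-logit output,
# freeze the feature extractor, and train with binary cross-entropy + Adam.
net <- nn_module(
  initialize = function() {
    self$model <- model_resnet18(pretrained = TRUE)
    # Freeze the pretrained feature extractor
    for (par in self$model$parameters) par$requires_grad_(FALSE)
    # Replace the final fully connected layer with a single-logit output
    self$model$fc <- nn_linear(self$model$fc$in_features, 1)
  },
  forward = function(x) self$model(x)
)

fitted <- net %>%
  setup(
    loss = nn_bce_with_logits_loss(),  # binary cross-entropy on raw logits
    optimizer = optim_adam
  ) %>%
  set_opt_hparams(lr = 0.001) %>%
  fit(train_dl, epochs = 1)  # train_dl from the dataloader created above

# Save the fitted model for later use
luz_save(fitted, "resnet18_gunshot_sketch.pt")
```

Note that `nn_bce_with_logits_loss()` expects float targets of the same shape as the logits, so in practice the folder labels from `image_folder_dataset()` would also need a target transform to 0/1 floats.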
```{r, echo=FALSE, message=FALSE, warning=FALSE}
# Load necessary packages and functions
devtools::load_all("/Users/denaclink/Desktop/RStudioProjects/gibbonNetR")
```
```{r, echo=TRUE, message=FALSE,warning=FALSE}
# Location of spectrogram images for training
input.data.path <- 'data/imagesvietnamunbalanced/'
# Location of spectrogram images for testing
test_data_path <- '/Users/denaclink/Desktop/RStudioProjects/Vietnam-Gunshots/data/imagesbelize/train/'
# Training data folder short
trainingfolder.short <- 'imagesvietnamunbalanced'
# Number of epochs to include
epoch.iterations <- c(1)
# Train the models specifying different architectures
architectures <- c('alexnet')
freeze.param <- c(FALSE)
for (a in seq_along(architectures)) {
  for (b in seq_along(freeze.param)) {
    gibbonNetR::train_CNN_binary(
      input.data.path = input.data.path,
      noise.weight = 0.25,
      architecture = architectures[a],
      save.model = TRUE,
      learning_rate = 0.001,
      test.data = test_data_path,
      unfreeze.param = freeze.param[b],  # FALSE means the pretrained features are frozen
      epoch.iterations = epoch.iterations,
      list.thresholds = seq(0, 1, .1),
      early.stop = "yes",
      output.base.path = "model_output_testonbelize/",
      trainingfolder = trainingfolder.short,
      positive.class = "gunshot",
      negative.class = "noise"
    )
  }
}
```
## Evaluation
To evaluate the performance of the trained model, the test dataset is used. The images in the test dataset are preprocessed in the same way as during training. The model predicts the probability of an image belonging to the positive class (gunshot) and calculates the F1 score as the evaluation metric.
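For reference, the F1 score is the harmonic mean of precision and recall. A small base-R illustration, using made-up counts (not results from this model):

```{r, eval=FALSE}
# Illustration with made-up counts (not results from this model):
tp <- 90   # true positives: gunshots correctly detected
fp <- 10   # false positives: noise labeled as gunshot
fn <- 20   # false negatives: gunshots missed

precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
f1 <- 2 * precision * recall / (precision + recall)
round(f1, 3)
#> 0.857
```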
```{r, echo=TRUE, message=FALSE, warning=FALSE}
performancetables.dir.multi.testbelize <- '/Users/denaclink/Desktop/RStudioProjects/Vietnam-Gunshots/model_output_testonbelize/_imagesvietnamunbalanced_binary_unfrozen_FALSE_/performance_tables'

PerformanceOutputmulti.testbelize <- gibbonNetR::get_best_performance(
  performancetables.dir = performancetables.dir.multi.testbelize,
  class = 'gunshot',
  model.type = "binary",
  Thresh.val = 0.1
)

PerformanceOutputmulti.testbelize$f1_plot
PerformanceOutputmulti.testbelize$best_f1$F1
PerformanceOutputmulti.testbelize$best_auc$AUC
```
## Usage
To use the trained model for predictions without the 'gibbonNetR' package, load the saved model using the 'luz_load' function. Then, preprocess the input image(s) in the same way as during training and pass them to the model for prediction. The output will be the predicted class label or probability.
```{r}
# Load libraries
library(luz)
library(torch)
library(torchvision)
library(torchdatasets)
library(stringr)
library(caret)
# Load the saved model
modelAlexnetGunshot <- luz_load("_train_1_modelAlexnet.pt")
# Set the input directory for the test images
test.input <- 'data/images/test'
# Get the list of image files
imageFileShort <- list.files(test.input, recursive = TRUE, full.names = FALSE)
# Identify just the class from folder structure
ActualClass <- str_split_fixed(imageFileShort, pattern = '/', n = 2)[, 1]
# Create a dataset from the image folder
test_ds <- image_folder_dataset(
  file.path(test.input),
  transform = . %>%
    torchvision::transform_to_tensor() %>%                 # Convert images to tensors
    torchvision::transform_resize(size = c(224, 224)) %>%  # Resize images to 224x224 pixels
    torchvision::transform_normalize(mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225)),  # Normalize the image tensor
  target_transform = function(x) as.double(x) - 1  # Transform the target variable to be zero-based
)
# Get the number of test files
nfiles <- test_ds$.length()
# Create a dataloader for the test dataset with a specified batch size
test_dl <- dataloader(test_ds, batch_size = nfiles)
# Use the Alexnet model to predict the test images
alexnetPred <- predict(modelAlexnetGunshot, test_dl)
# Apply the sigmoid function to the predicted values
alexnetProb <- torch_sigmoid(alexnetPred)
# Move the probabilities to the CPU, convert to an R array, and subtract from 1
alexnetProb <- 1 - as_array(alexnetProb$cpu())
# Classify the images as "gunshot" or "noise" based on a probability threshold
alexnetPredictedClass <- ifelse((alexnetProb) > 0.85, "gunshot", "noise")
# Create a confusion matrix
AlexnetPerf <- caret::confusionMatrix(as.factor(alexnetPredictedClass),as.factor(ActualClass), mode='everything')
print(AlexnetPerf$table)
```
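The 0.85 cutoff above is one possible choice. A quick sweep over thresholds, reusing `alexnetProb` and `ActualClass` from the chunk above, shows how the cutoff affects overall accuracy on the test set:

```{r, eval=FALSE}
# Sweep a range of probability thresholds to see how the choice of
# cutoff (0.85 above) affects overall accuracy on the test set.
thresholds <- seq(0.1, 0.9, by = 0.1)
accuracy <- sapply(thresholds, function(thresh) {
  predicted <- ifelse(alexnetProb > thresh, "gunshot", "noise")
  mean(predicted == ActualClass)
})
data.frame(threshold = thresholds, accuracy = accuracy)
```

In practice the threshold would be tuned on validation data (as `get_best_performance()` does via its performance tables), not on the test set.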
## Acknowledgments
This project was inspired by the work of Keydana (2023) on image classification with transfer learning. Special thanks to the 'torch' package authors for providing the framework for deep learning in R with PyTorch.
## References
Keydana, Sigrid. Deep Learning and Scientific Computing with R torch. CRC Press, 2023. https://skeydan.github.io/Deep-Learning-and-Scientific-Computing-with-R-torch/
Owner
- Name: Dena J. Clink
- Login: DenaJGibbon
- Kind: user
- Company: K. Lisa Yang Center for Conservation Bioacoustics
- Website: www.denaclink.com
- Twitter: BorneanGibbons
- Repositories: 4
- Profile: https://github.com/DenaJGibbon
I am a biological anthropologist, bioacoustician, and avid R user. I use innovative bioacoustics techniques to answer evolutionary questions.