https://github.com/beegass/yolov1-real-time-obj-detection
Real-time object detection using YoloV1 in PyTorch on video and webcam feed.
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
○codemeta.json file
-
○.zenodo.json file
-
○DOI references
-
✓Academic publication links
Links to: arxiv.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (12.2%) to scientific vocabulary
Repository
Real-time object detection using YoloV1 in PyTorch on video and webcam feed.
Basic Info
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
https://github.com/BeeGass/yolov1-real-time-obj-detection/blob/main/
# Yolo: You Only Look Once Real-Time Object Detection (V1) **General:**
This repo contains a reimplementation of the original Yolo: [You Only Look Once: Unified, Real-Time Object Detection](https://arxiv.org/abs/1506.02640) paper by Joseph Redmon using PyTorch. A short demo of our detection system can be seen in Fig. 1. The full demonstration can be found [here](https://www.youtube.com/watch?v=Q30_ScFp8us).**Example Dictionary Structure**
![]()
Fig.1 - Real-time inference using YoloV1. **Getting started:**View dictionary structure
``` . application # Real time inference tools __init__.py yolov1_watches_you.py # YoloV1 inference on webcam yolov1_watches_youtube.py # YoloV1 inference on an .mp4 video file in `video/` cpts # Weights as checkpoint .cpt files vgg19bn_adj_lr_yolov1.cpt # Pretrained YoloV1 utilizing Vgg19 backbone resnet18_adj_lr_yolov1.cpt # Pretrained YoloV1 utilizing Resnet18 backbone resnet50_adj_lr_yolov1.cpt # Pretrained YoloV1 utilizing Resnet50 backbone figures # Figures and graphs .... loss # Custom PyTorch loss __init__.py yolov1_loss.py models # Pytorch models __init__.py darknet.py yolov1net_darknet.py # Original YoloV1 backbone (not supported: no backbone weights available) yolov1net_resnet18.py # Resnet18 pre-trained backbone yolov1net_resnet50.py # Resnet50 pre-trained backbone yolov1net_vgg19bn.py # Vgg19 with batchnormalization pre-trained backbone results # Result textfiles .... train # Training files __init__.py train_darknet.py train_yolov1.py utils # Tools and utilities __init__.py custom_transform.py # Image transformation/augmentation darknet_utils.py dataset.py figs.py. # Create figures generate_csv.py # Create training and testing csv files get_data.sh # Fetch data and assign into appropriate folder structure get_data_macos.sh get_inference_speed.py # Get inference speed iou_map_tester.py # mAP tester voc_label.py yolov1_utils.py video youtube_video.mp4 # .mp4 video from youtube yolov1_watches_youtube.mp4 # Result of `yolov1_watches_youtube.py` requierments.txt # Python libraries setup.py terminal.ipynb # If you want to run experiments on google collab LICENSE README.md ```
In order to get started first `cd` into the `./yolov1-real-time-obj-detection` dictionary and run the following lines: ``` virtualenv -p python3 venv source venv/bin/activate pip install -e . ``` Depending on what libraries you may already have, you may wish to `pip install -r requirements.txt`. To train from scratch, the PASCAL VOC 2007 and 2012 data-set is required. You can either manually download the data from the [PASCAL VOC homepage](http://host.robots.ox.ac.uk/pascal/VOC/) or simply call the following shell file: `utils/get_data.sh`, which will automatically download and sort the data into the approriate folders and format for training. If you are on mac use the `utils/get_data_macos.sh` file. You may need to ensure that the shell file is executable by calling `chmod +x get_data.sh` and then executing it `./get_data.sh`. Note that the for the PASCAL VOC 2012 data-set, test data is only available on the PASCAL test server and therefore not publicly available for download. **Training:**
To train the model simply call `python train/train_yolov1.py` from terminal. Select one of the supported pre-trained models to be initalised as a backbone for training by setting one of the following backbone tags to `True` and all others to `False`: 1) `use_vgg19bn_backbone`, 2) `use_resnet18_backbone`, 3) and `use_resnet50_backbone` and 4) `use_original_darknet_backbone`. Note that as there are no pretrained weights available for the darknet weights in pytorch, the original backbone is currently not supported. If anyone has such weights or the GPU load available to train these from scratch on ImageNet, please feel free to contact me. The darknet training files for ImageNet data have been included in this repo for this purpose and should only requiere some small adjustments. **Results**
Loss and mean average precision (mAP) values are computed after every epoch and can be seen from the console. To obtain results regarding inference speed, call `python utils/inference_speed.py`. After training and obtaining the inference speed, plots can be created by calling `python utils/figs.py`, which are stored in the `figures/` folder. The results for training and test loss in addition to mAP values can be seen in Fig.2 for Vgg19 with batch normalisation, in Fig.3 for Resnet18 and Fig.4 for Resnet50. adjustments.
A model comparison between test mAP and inference speeed can be seen in Fig.5 and Fig.6 respectively. See Table.1 for exact mAP, FPS values per model.
![]()
Fig.2 - Vgg19bn Yolov1: Training, test loss and mAP as a function of epochs.
![]()
Fig.3 - Resnet18 Yolov1: Training, test loss and mAP as a function of epochs.
![]()
Fig.4 - Resnet50 Yolov1: Training, test loss and mAP as a function of epochs.
![]()
Fig.4 - Tiny Yolov118 Resnet: Training, test loss and mAP as a function of epochs.
![]()
Fig.5 - Test mAP across different YoloV1 backbones. **Real time object detection (GPU)**
![]()
Fig.6 - Inference speed across different YoloV1 backbones.
In order to run YoloV1 in real-time on a video or webcam in real-time, please if not trained from scratch download one of the pretrained weights from the Table 1. Make sure that at least one of the pretrained checkpoint `.cpt` files is within the checkpoints `cpts` folder. If you want to do real time inference on a video, move the video file (preferably .mp4) into the `./video` folder. Then specify both 1) which pre-trained model to use and 2) path to the video in `application/yolov1_watches_youtube.py` by setting the appropriate tag to `True`. This will open up a window and perform object detection in real time. If you wish to perform object detection on a webcam call the `application/yolov1_watches_you.py`, which will open up a window of your camera stream and perform object detecton. **Real time object detection (CPU)**
To run real-time object detection from your webcam feed on CPU only machines, in `application/yolov1_watches_you.py` change:
`checkpoint = torch.load(path_cpt_file)` to `checkpoint = torch.load(path_cpt_file, map_location=torch.device('cpu'))`. **Pretrained weights**
Backbone | Train mAP | Test mAP | FPS | Link | | :--- | :---: | :---: | :---: | ---: | | Vgg19bn | 66.12% | 44.01% | 233 | [Link](https://drive.google.com/file/d/1-5vqoN2QxRqvFQ_KBZxD2q3hi5dBwcmq/view?usp=sharing) | | Resnet18 | 68.39% | 44.29% | 212 | [Link](https://drive.google.com/file/d/1VsDFNMDYBWSy9qFGooMVNo5SFVyYToT0/view?usp=sharing) | | Resnet50 | 69.51% | 49.94% | 96 | [Link](https://drive.google.com/file/d/1-31xnUeXpkb2AHLr9GDw0wlgn9MmUM13/view?usp=sharing) | | Tiny Yolov1 Resnet18 | 32.22% | 19.62% | > 900 | [Link](https://drive.google.com/file/d/1T709LJujimFkAnMePvtI_LzWCi29eTuD/view?usp=sharing) | | Darknet | - | - | - | - | Download the entire `cpts`folder [here](https://drive.google.com/drive/folders/1GDj3jLBWbruhSQ7Gx01cLdkJFYW7kDwj?usp=sharing). **Python files:**
`yolov1net_backbonename.py` : There are 3 pretrained backbones supported: Vgg19 with batch norm `yolov1net_vgg19bn.py`, Resnet18 `yolov1net_resnet18.py` and Resnet50 `yolov1net_resnet50.py`. While the original darknet backbone is included `yolov1net_darknet.py`, there are no pretrained PyTorch weights available for this backbone. Methods that convert the original darknet weights from [Joseph Redmon's website](https://pjreddie.com/darknet/imagenet/) do not support conversion of this particular backbone. If anyone has such weights or the GPU load available to train these from scratch on ImageNet, please feel free to contact me. The darknet training files for ImageNet data have been included in this repo for this purpose and should only requiere small adjustments. `train_yolov1.py` : performs training and testing procedure, giving progress updates after each epoch for both training and test loss in addition to the mean average precision metric. `yolov1_loss.py` : defines the yolov1 loss as a custom PyTorch nn.module. `custom_transform.py` : defines the dataaugmentations applied to yolov1 as per the original paper by Joseph Redmon. `utils.py` : defines a series of utility functions, such as computation for the intersection over unions, mean average precision, converting bouding box coordinates relative to the cellboxes and from cellboxes the image. `get_data.sh` : downloads the data and assigns them into the approriate folder structure for training and testing and converts the `train.txt` and `text.txt` to a csv using `generate_csv.py` `get_data_macos.sh` : same as above but supported for unix macos systems. `yolov1_watches_you.py` : performs object detection on webcam stream. `yolov1_watches_youtube.py` : performs object detection on a specified video path. **Acknowledgement:**
I would like to thank Michael Lew, Bryan A. Gass, Jonas Eilers, Jakob Walter and Daniel Klassen for their support, time and thoughts.
Owner
- Name: Bryan
- Login: BeeGass
- Kind: user
- Location: Cambridge, MA
- Company: @USArmyResearchLab
- Website: onlygass.dev
- Twitter: BeeAGass
- Repositories: 14
- Profile: https://github.com/BeeGass
Research Engineer interested in SSMs