od-virat
This repository provides the official implementation of "OD-VIRAT: A Large-Scale Benchmark for Object Detection in Realistic Surveillance Environments"
Basic Info
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
OD-VIRAT: A Large-Scale Benchmark for Object Detection in Realistic Surveillance Environments
Hayat Ullah, Abbas Khan, Arslan Munir
Abstract: Realistic human surveillance datasets are crucial for training and evaluating computer vision models under real-world conditions, facilitating the development of robust algorithms for human and human-interacting object detection in complex environments. These datasets need to offer diverse and challenging data to enable a comprehensive assessment of model performance and the creation of more reliable surveillance systems for public safety. To this end, we present two visual object detection benchmarks named OD-VIRAT Large and OD-VIRAT Tiny, aiming at advancing visual understanding tasks in surveillance imagery. The video sequences in both benchmarks cover 10 different scenes of human surveillance recorded from significant height and distance. The proposed benchmarks offer rich annotations of bounding boxes and categories, where OD-VIRAT Large has 8.7 million annotated instances in 599,996 images and OD-VIRAT Tiny has 288,901 annotated instances in 19,860 images. This work also focuses on benchmarking state-of-the-art object detection architectures, including RTMDET, YOLOX, RetinaNet, DETR, and Deformable-DETR, on this object-detection-specific variant of the VIRAT dataset. To the best of our knowledge, this is the first work to examine the performance of these recently published state-of-the-art object detection architectures on realistic surveillance imagery under challenging conditions such as complex backgrounds, occluded objects, and small-scale objects. The proposed benchmarking and experimental settings will provide insights into the performance of the selected object detection models and lay the foundation for developing more efficient and robust object detection architectures.
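The scale of the two benchmarks can be put in perspective by the annotation density implied by the counts quoted in the abstract. A quick check (plain Python, independent of the repository code):

```python
# Average annotated instances per image, from the counts in the abstract.
large_instances, large_images = 8_700_000, 599_996
tiny_instances, tiny_images = 288_901, 19_860

density_large = large_instances / large_images
density_tiny = tiny_instances / tiny_images

print(f"OD-VIRAT Large: {density_large:.1f} instances/image")
print(f"OD-VIRAT Tiny:  {density_tiny:.1f} instances/image")
```

Both benchmarks average roughly 14.5 annotated instances per image, so the Tiny split preserves the annotation density of the Large split at about 3% of its image count.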
Model Complexity vs Accuracy (mAP) trade-off comparison: We evaluate the performance of five mainstream object detection architectures on the OD-VIRAT Tiny dataset and compare the obtained mAP values against model complexity (number of parameters). The Deformable-DETR architecture with a ResNet50 backbone outperforms its counterparts, obtaining the best mAP value.
Table of Contents
* Visualization
* Environment Setup
* Dataset Detail and Data Preparation
* Training
* Evaluation
* Citation
* Acknowledgements
Visualization: Visual Illustration of Each Scene
Environment Setup
Please follow INSTALL.md for preparing the environment and installing the prerequisite packages.
Dataset Detail and Data Preparation
Please follow DATA.md for dataset details and data preparation.
Training
To train a specific model on the OD-VIRAT Tiny dataset, run the following command:
```bash
sbatch --mem=30G --time=40:00:00 --gres=gpu:1 --nodes=1 trainer.sh config
```
- sbatch: Submits the job to the SLURM scheduler.
- --mem=30G: Requests 30 GB of memory for the job.
- --time=40:00:00: Sets a maximum job run time of 40 hours.
- --gres=gpu:1: Requests 1 GPU for the job; the GPU count can be increased (2 or more) based on your computational and memory requirements.
- --nodes=1: Allocates 1 node for the job.
- trainer.sh config: Runs the trainer.sh script with config as an argument.
Or
```bash
--launcher slurm --mem=30G --time=40:00:00 --gres=gpu:1 --nodes=1 trainer.sh config
```
- --launcher: Specifies the job scheduler (i.e., pytorch, slurm, or mpi). In our case, we used slurm for job submission.
- --mem=30G: Requests 30 GB of memory for the job.
- --time=40:00:00: Sets a maximum job run time of 40 hours.
- --gres=gpu:1: Requests 1 GPU for the job; the GPU count can be increased (2 or more) based on your computational and memory requirements.
- --nodes=1: Allocates 1 node for the job.
- trainer.sh config: Runs the trainer.sh script with config as an argument.
For example, to train Deformable-DETR with a ResNet50 backbone on OD-VIRAT Tiny using a single GPU, run the following command:
```bash
sbatch --mem=30G --time=40:00:00 --gres=gpu:1 --nodes=1 trainer.sh configs/deformable_detr/deformable-detr-refine-twostage_r50_16xb2-50e_coco_virat_bs64.py
```
- configs/deformable_detr/deformable-detr-refine-twostage_r50_16xb2-50e_coco_virat_bs64.py: the Python file containing the model specification, data loaders, and training protocols. In this case, the model to be trained is Deformable-DETR (two-stage refinement variant) with a ResNet50 backbone, using a batch size of 64 for 50 epochs on the OD-VIRAT Tiny dataset.
- trainer.sh: a bash script that takes configs/deformable_detr/deformable-detr-refine-twostage_r50_16xb2-50e_coco_virat_bs64.py as an input argument and passes it to tools/train.py, as follows:
```bash
#!/bin/bash
echo $config
eval "$(conda shell.bash hook)"  # Initialize the shell to use Conda
conda info --envs                # List all available conda environments
conda activate 'env_name'
time python tools/train.py $config
```
The `$config` variable contains `deformable-detr-refine-twostage_r50_16xb2-50e_coco_virat_bs64.py`, which serves as the input argument to `tools/train.py`.
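For readers unfamiliar with MMDetection-style configs, a config file of this kind is a plain Python module whose top-level dicts describe the model, data, and training schedule. The sketch below is illustrative only; the field values, paths, and the `_base_` entry are our assumptions, not contents of the repository's actual config file:

```python
# Minimal sketch of an MMDetection-style config module (illustrative only;
# values here are assumptions, not copied from the repository's config).
_base_ = ['../_base_/default_runtime.py']  # hypothetical base config path

model = dict(
    type='DeformableDETR',
    backbone=dict(type='ResNet', depth=50),  # ResNet50 backbone
)

train_dataloader = dict(
    batch_size=64,  # matches the "bs64" suffix in the config name
    dataset=dict(type='CocoDataset', ann_file='annotations/train.json'),  # hypothetical path
)

train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=50)  # "50e" in the config name
```

MMDetection's `tools/train.py` loads such a module, builds the model and data loaders from these dicts, and runs the training loop.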
Evaluation
To test/evaluate a pre-trained model on the test set of OD-VIRAT Tiny dataset, run the following command:
```bash
sbatch --mem=30G --time=01:00:00 --gres=gpu:1 --nodes=1 eval.sh config
```
The rest of the arguments are the same, except that eval.sh takes config (containing the model evaluation configuration) as its input argument. eval.sh evaluates the model using the configurations given in the input file (i.e., evaluation protocols, the path to the test data, evaluation metrics, and the checkpoint of the pre-trained model).
For example, to evaluate the pre-trained Deformable Detr (two-stage refinement variant) on the test set of OD-VIRAT Tiny dataset, run the following command:
```bash
sbatch --mem=30G --time=01:00:00 --gres=gpu:1 --nodes=1 eval.sh configs/deformable_detr/deformable-detr-refine-twostage_r50_16xb2-50e_coco_virat_bs64_eval.py
```
The deformable-detr-refine-twostage_r50_16xb2-50e_coco_virat_bs64_eval.py file provides the configuration described above for evaluating the deformable-detr-refine-twostage_r50 model on the OD-VIRAT Tiny dataset. eval.sh passes it to tools/test.py, as follows:
```bash
#!/bin/bash
echo $config
eval "$(conda shell.bash hook)"  # Initialize the shell to use Conda
conda info --envs                # List all available conda environments
conda activate 'env_name'
time python tools/test.py $config
```
The `$config` variable contains `deformable-detr-refine-twostage_r50_16xb2-50e_coco_virat_bs64_eval.py`, which serves as the input argument to `tools/test.py`.
Model Training Configurations :gear:
| Configuration | RTMDET | YOLOX | RetinaNet | DETR | Deformable-DETR |
| ------------- | :---: | :---: | :---: | :---: | :---: |
| Optimizer | AdamW | SGD | SGD | AdamW | AdamW |
| Base Learning Rate | 0.004 | 0.01 | 0.01 | 0.0001 | 0.0002 |
| Weight Decay | 0.05 | 0.0005 | 0.0001 | 0.0001 | 0.0001 |
| Batch Size | 32/64/128 | 32/64/128 | 32/64/128 | 32/64/128 | 32/64/128 |
| Optimizer Momentum | ✘ | 0.9 | 0.9 | ✘ | ✘ |
| Parameters Scheduler | CosineAnnealingLR | CosineAnnealingLR | CosineAnnealingLR | CosineAnnealingLR | CosineAnnealingLR |
| Training Epochs | 50 | 50 | 50 | 50 | 50 |
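The per-model settings in the table above can be collected into a single dictionary for quick reference (a plain-Python transcription of the table, not code from the repository):

```python
# Training hyperparameters per model, transcribed from the table above.
# momentum=None marks the models whose optimizer (AdamW) takes no momentum entry.
TRAIN_CONFIGS = {
    'RTMDET':          {'optimizer': 'AdamW', 'lr': 0.004,  'weight_decay': 0.05,   'momentum': None},
    'YOLOX':           {'optimizer': 'SGD',   'lr': 0.01,   'weight_decay': 0.0005, 'momentum': 0.9},
    'RetinaNet':       {'optimizer': 'SGD',   'lr': 0.01,   'weight_decay': 0.0001, 'momentum': 0.9},
    'DETR':            {'optimizer': 'AdamW', 'lr': 0.0001, 'weight_decay': 0.0001, 'momentum': None},
    'Deformable-DETR': {'optimizer': 'AdamW', 'lr': 0.0002, 'weight_decay': 0.0001, 'momentum': None},
}

# Settings shared by all five models: batch sizes of 32/64/128,
# a CosineAnnealingLR schedule, and 50 training epochs.
for name, cfg in TRAIN_CONFIGS.items():
    print(f"{name}: {cfg['optimizer']} @ lr={cfg['lr']}")
```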
Models Convergence Visualization
Visual Results
Visual comparative analysis of selected object detection models on five test images. (a) RTMDET, (b) YOLOX, (c) RetinaNet, (d) DETR, and (e) Deformable-DETR.
The obtained quantitative results in terms of mAP, mAP50, mAP75, mAPS, mAPM, and mAPL on test images perturbed with Gaussian Noise, Motion Blur, Snow, and Elastic Transform at five levels of perturbation severity (i.e., s = [1:1:5]).
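As an illustration of the kind of perturbation applied, additive Gaussian noise with an increasing severity level can be sketched in a few lines of NumPy. This is our own sketch; the paper's exact corruption parameters may differ (for instance, if the `imagecorruptions` package listed in the dependencies was used):

```python
import numpy as np

def gaussian_noise(image: np.ndarray, severity: int = 1) -> np.ndarray:
    """Add zero-mean Gaussian noise to a uint8 RGB image; severity in [1, 5].

    The sigma schedule below is our own assumption, chosen so that the noise
    grows with severity; it is not taken from the paper.
    """
    sigmas = [0.04, 0.06, 0.08, 0.09, 0.10]  # noise std on a [0, 1] intensity scale
    sigma = sigmas[severity - 1]
    x = image.astype(np.float64) / 255.0
    noisy = x + np.random.normal(loc=0.0, scale=sigma, size=x.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255.0).astype(np.uint8)

# Perturb a dummy 64x64 RGB image at each severity level s = 1..5.
img = np.full((64, 64, 3), 128, dtype=np.uint8)
perturbed = [gaussian_noise(img, s) for s in range(1, 6)]
```

Evaluating a detector on such perturbed copies of the test set, severity by severity, yields robustness curves like the ones reported above.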
Citation
Will be updated upon publication.
Contact
If you have any questions, feel free to open an issue on this repository or reach out at hullah2024@fau.edu.
Acknowledgements
Our code is based on the MMDetection repository. We thank the authors for releasing their code. If you use our code, please consider citing their work as well.
Owner
- Name: Hayat Ullah
- Login: hayatkhan8660-maker
- Kind: user
- Location: Kansas State University, Manhattan, KS, USA
- Company: Kansas State University
- Repositories: 1
- Profile: https://github.com/hayatkhan8660-maker
PhD | Research Assistant at ISCAAS, Kansas State University. | Applied Computer Vision Engineer
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection Contributors"
title: "OpenMMLab Detection Toolbox and Benchmark"
date-released: 2018-08-22
url: "https://github.com/open-mmlab/mmdetection"
license: Apache-2.0
GitHub Events
Total
- Issues event: 2
- Watch event: 1
- Member event: 1
- Push event: 104
- Pull request event: 4
- Create event: 4
Last Year
- Issues event: 2
- Watch event: 1
- Member event: 1
- Push event: 104
- Pull request event: 4
- Create event: 4
Dependencies
- asynctest *
- cityscapesscripts *
- codecov *
- cython *
- emoji *
- fairscale *
- flake8 *
- imagecorruptions *
- instaboostfast *
- interrogate *
- isort ==4.3.21
- jsonlines *
- kwarray *
- matplotlib *
- memory_profiler *
- mmcv <2.2.0,>=2.0.0rc4
- mmengine <1.0.0,>=0.7.1
- mmpretrain *
- mmtrack *
- motmetrics *
- nltk *
- numpy <1.24.0
- numpy *
- onnx ==1.7.0
- onnxruntime >=1.8.0
- parameterized *
- prettytable *
- protobuf <=3.20.1
- psutil *
- pycocoevalcap *
- pycocotools *
- pytest *
- scikit-learn *
- scipy *
- seaborn *
- shapely *
- six *
- terminaltables *
- tqdm *
- transformers *
- ubelt *
- xdoctest >=0.10.0
- yapf *
- albumentations >=0.3.2
- cython *
- numpy *
- docutils ==0.16.0
- myst-parser *
- sphinx ==4.0.2
- sphinx-copybutton *
- sphinx_markdown_tables *
- sphinx_rtd_theme ==0.5.2
- urllib3 <2.0.0
- mmcv >=2.0.0rc4,<2.2.0
- mmengine >=0.7.1,<1.0.0
- fairscale *
- jsonlines *
- nltk *
- pycocoevalcap *
- transformers *
- cityscapesscripts *
- emoji *
- fairscale *
- imagecorruptions *
- scikit-learn *
- mmcv >=2.0.0rc4,<2.2.0
- mmengine >=0.7.1,<1.0.0
- scipy *
- torch *
- torchvision *
- urllib3 <2.0.0
- matplotlib *
- numpy *
- pycocotools *
- scipy *
- shapely *
- six *
- terminaltables *
- tqdm *
- asynctest * test
- cityscapesscripts * test
- codecov * test
- flake8 * test
- imagecorruptions * test
- instaboostfast * test
- interrogate * test
- isort ==4.3.21 test
- kwarray * test
- memory_profiler * test
- nltk * test
- onnx ==1.7.0 test
- onnxruntime >=1.8.0 test
- parameterized * test
- prettytable * test
- protobuf <=3.20.1 test
- psutil * test
- pytest * test
- transformers * test
- ubelt * test
- xdoctest >=0.10.0 test
- yapf * test
- mmpretrain *
- motmetrics *
- numpy <1.24.0
- scikit-learn *
- seaborn *
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build