yoloft
A code base for the official XS-VID dataset baseline method YOLOFT
Repository
Basic Info
- Host: GitHub
- Owner: gjhhust
- License: agpl-3.0
- Language: Python
- Default Branch: main
- Size: 1.32 MB
Statistics
- Stars: 13
- Watchers: 1
- Forks: 2
- Open Issues: 6
- Releases: 0
Metadata Files
README.md
YOLOFT: An Extremely Small Video Object Detection Benchmark Baseline
:loudspeaker: Introduction
This is the official implementation of the baseline model for the XS-VID benchmark.
[news]: We will soon be releasing XS-VIDv2, incorporating many new videos and scenarios, significantly expanding our dataset! Please stay tuned!
:ferris_wheel: Dependencies
- CUDA 11.7
- Python 3.8
- PyTorch 1.12.1(cu116)
- TorchVision 0.13.1(cu116)
- numpy 1.24.4
:open_file_folder: Datasets
Our work is based on the large-scale extremely small video object detection benchmark XS-VID. Download the dataset from one of the links below.
- [Google drive]: annotations; images(0-3); images(4-5);
- [BaiduNetDisk]: annotations and images;
Choose one download method and download the annotations and all images. Make sure all the split archive files (e.g., images.zip, images.z01, images.z02, etc.) are in the same directory, then extract them with:
```bash
unzip images.zip
unzip annotations.zip
```
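Before extracting, it can help to confirm that no part of the split archive is missing. The following is a minimal sketch, assuming the parts follow the `images.zip`, `images.z01`, `images.z02`, ... naming mentioned above. The function name `missing_parts` is hypothetical and not part of this repository, and only gaps below the highest part number present can be detected.

```python
import re
from pathlib import Path


def missing_parts(directory, stem="images"):
    """Return names of split-archive parts that appear to be missing.

    Assumes parts are named <stem>.zip, <stem>.z01, <stem>.z02, ...
    Only gaps below the highest part number found can be detected.
    """
    directory = Path(directory)
    missing = []
    if not (directory / f"{stem}.zip").is_file():
        missing.append(f"{stem}.zip")

    # Collect the numeric suffixes of the parts that are present.
    pattern = re.compile(rf"^{re.escape(stem)}\.z(\d+)$")
    found = set()
    for path in directory.iterdir():
        match = pattern.match(path.name)
        if match:
            found.add(int(match.group(1)))

    # Every number from 1 up to the largest one found should be present.
    for number in range(1, max(found, default=0) + 1):
        if number not in found:
            missing.append(f"{stem}.z{number:02d}")
    return missing
```

Calling `missing_parts(".")` in the download directory returns an empty list when no gap is found.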
We have released several annotation formats to facilitate subsequent research and use, including COCO, COCOVID, and YOLO.
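The YOLO and COCO formats encode boxes differently: a YOLO label line is `class x_center y_center width height` with values normalized to [0, 1], while a COCO `bbox` is `[x_min, y_min, width, height]` in pixels. The sketch below converts one to the other, assuming the released YOLO annotations follow this usual convention. The function name is hypothetical and the image size must be supplied by the caller.

```python
def yolo_line_to_coco_bbox(line, img_width, img_height):
    """Convert one YOLO label line to (class_id, [x_min, y_min, w, h]) in pixels."""
    fields = line.split()
    class_id = int(fields[0])
    x_c, y_c, w, h = (float(v) for v in fields[1:5])

    # Scale the normalized size to pixels.
    box_w = w * img_width
    box_h = h * img_height

    # Shift from the box center to the top-left corner.
    x_min = x_c * img_width - box_w / 2
    y_min = y_c * img_height - box_h / 2
    return class_id, [x_min, y_min, box_w, box_h]


print(yolo_line_to_coco_bbox("0 0.5 0.5 0.25 0.5", 1024, 1024))
# → (0, [384.0, 256.0, 256.0, 512.0])
```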
🛠️ Install
This repository is built on Ultralytics 8.0.143 and can be installed by running the following commands. Please ensure that all dependencies have been satisfied before setting up the environment.
```
conda create --name yoloft python=3.8
conda activate yoloft
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
git clone https://github.com/gjhhust/YOLOFT
cd YOLOFT
pip install -r requirements.txt
pip install -e .

cd ./ultralytics/nn/modules/ops_dcnv3
python setup.py build install

cd ../altcudacorr_sparse
python setup.py build install
```
:hourglass: Data preparation
To train or test on a custom video dataset, convert its annotations to YOLO format and organize the dataset files as follows:
```
data_root_dir/               # Root data directory
├── test.txt                 # List of test data files, each line contains a relative path to an image file
├── train.txt                # List of training data files, each line contains a relative path to an image file
├── images/                  # Directory containing image files
│   ├── video1/              # Directory for image files of the first video
│   │   ├── 0000000.png      # First frame image file of the first video
│   │   └── 0000001.png      # Second frame image file of the first video
│   ├── video2/              # Directory for image files of the second video
│   │   └── ...              # More image files
│   └── ...                  # More video directories
└── labels/                  # Directory containing label files
    ├── video1/              # Directory for label files of the first video
    │   ├── 0000000.txt      # Label file for the first frame of the first video (matches the image file)
    │   └── 0000001.txt      # Label file for the second frame of the first video (matches the image file)
    ├── video2/              # Directory for label files of the second video
    │   └── ...              # More label files
    └── ...                  # More video directories
```
Note: In YOLO format, each image and its label file must share the same name, in the form frameNumber.png and frameNumber.txt, e.g. "0000001.png" and "0000001.txt".
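A quick way to catch naming mismatches before training is to check that every listed image has a label file at the mirrored path. This is a minimal sketch, assuming each line of `train.txt` is a path relative to `data_root_dir` of the form `images/<video>/<frame>.png`. The function name `find_missing_labels` is hypothetical and not part of this repository.

```python
from pathlib import Path


def find_missing_labels(data_root, list_name="train.txt"):
    """Return image paths from the list file that lack a matching label file.

    Assumes each line is a path relative to data_root of the form
    images/<video>/<frame>.png and that the label lives at
    labels/<video>/<frame>.txt.
    """
    data_root = Path(data_root)
    missing = []
    for line in (data_root / list_name).read_text().splitlines():
        rel_image = line.strip()
        if not rel_image:
            continue  # skip blank lines
        parts = Path(rel_image).parts
        # Swap the leading "images" directory for "labels" and the suffix for .txt.
        rel_label = Path("labels", *parts[1:]).with_suffix(".txt")
        if not (data_root / rel_label).is_file():
            missing.append(rel_image)
    return missing
```

Running it once for `train.txt` and once for `test.txt` should return two empty lists for a correctly organized dataset.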
🚀 Training
One training session
```
python tools/train_yoloft.py
```
To train on multiple GPUs, change the devices setting.
Repeat training several times for one model configuration YAML file
```
python tools/XSVID/yoloft_baseline.py
```
Parameters:
- repeats: number of training repetitions
- modelconfigpath: model configuration YAML file
- pretrainmodel: pretrained weights
- datasetconfigpath: dataset configuration file
- trainingconfig_path: training hyperparameter configuration
When the runs finish, you will have a log file containing the results of all the repeated experiments. Analyze it to find the best result and the location where it was saved:
```
python tools/XSVID/analy_log.py path/to/xxxx.log
```
Comparison test on multiple model configuration YAML files
```
python tools/XSVID/yoloft_conpresion.py
```
Parameters:
- repeats: number of training repetitions for a single model file
- modelconfigdir: directory of model configuration files to be compared
When the runs finish, runs/logs will contain the log files of all comparison experiments and a CSV of all the experimental results, which can be analyzed further to find the best model configuration:
```
python tools/XSVID/analy_csv.py path/to/xxxx.csv
```
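The kind of aggregation such an analysis performs can be sketched as follows. This is not the implementation of `analy_csv.py`: the column names `model` and `AP` are hypothetical, one row per training run is assumed, and the actual CSV written to runs/logs may be laid out differently.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def best_config(csv_text, key="model", metric="AP"):
    """Return (config, mean metric) for the config with the highest mean."""
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores[row[key]].append(float(row[metric]))

    # Average the repeated runs of each configuration, then pick the best.
    averages = {name: mean(values) for name, values in scores.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


# Hypothetical data: two configurations, two repeats each.
example = """model,AP
a.yaml,30.0
a.yaml,32.0
b.yaml,33.0
b.yaml,31.5
"""
print(best_config(example))
```

Averaging over repeats before comparing keeps a single lucky run from deciding which configuration looks best.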
📈 Evaluation
```
python tools/test_yoloft.py
```
When save_json=True, COCO-format evaluation results are output for both training and testing; otherwise only Ultralytics' own evaluation results are output.
To evaluate the performance of other models, you can use the eval tool.
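For reference, the standard COCO detection results format is a JSON list with one entry per detection, holding `image_id`, `category_id`, `bbox` as `[x_min, y_min, width, height]`, and `score`. The sketch below builds such a list from corner-format boxes. Whether the file written with save_json=True has exactly this layout is an assumption, and the function name `to_coco_results` is hypothetical.

```python
import json


def to_coco_results(detections):
    """Convert (image_id, category_id, x1, y1, x2, y2, score) tuples to COCO result dicts."""
    results = []
    for image_id, category_id, x1, y1, x2, y2, score in detections:
        results.append({
            "image_id": image_id,
            "category_id": category_id,
            # COCO stores boxes as [x_min, y_min, width, height].
            "bbox": [x1, y1, x2 - x1, y2 - y1],
            "score": score,
        })
    return results


# One hypothetical detection with corners (10, 20) and (30, 60).
dets = [(1, 0, 10.0, 20.0, 30.0, 60.0, 0.9)]
print(json.dumps(to_coco_results(dets)))
```

The `image_id` and `category_id` values must match the ids used in the ground-truth annotation file for the evaluation to line up.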
Predict videos script
This script processes images or videos by loading data from the specified directories, applying a model for predictions, and saving the results and videos in the desired output directory.
Command-line Arguments
- `image_dir`: (Required) Path to the directory containing images or subdirectories of images. If `--mode` is set to `'one'`, this should be a directory containing all images. If set to `'muti'`, this should be a directory containing subdirectories, where each subdirectory represents a video.
- `checkpoint`: (Required) Path to the model checkpoint that will be used for predictions.
- `--save_dir`: (Required) Path to the directory where prediction results and generated videos will be saved.
- `--mode`: (Required) Mode of operation. Choices are:
  - `'one'`: `image_dir` contains all images to be processed.
  - `'muti'`: `image_dir` contains multiple subdirectories, with each subdirectory corresponding to a different video.
- `--eval_json`: (Optional) Path to the evaluation JSON file for model evaluation.
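The argument list above maps naturally onto `argparse`. The following is a sketch of a parser with the same interface, not the actual code of `tools/predict_yoloft.py`; the help strings and the `build_parser` name are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line interface."""
    parser = argparse.ArgumentParser(description="Sketch of the documented prediction CLI")
    parser.add_argument("image_dir", help="directory of images, or of per-video subdirectories")
    parser.add_argument("checkpoint", help="path to the model checkpoint")
    parser.add_argument("--save_dir", required=True, help="where results and videos are saved")
    parser.add_argument("--mode", required=True, choices=["one", "muti"], help="layout of image_dir")
    parser.add_argument("--eval_json", default=None, help="optional evaluation JSON file")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["/path/to/video_name", "yoloft-L.pt", "--save_dir", "/path/to/save", "--mode", "one"]
)
print(args.image_dir, args.mode, args.eval_json)
```

Restricting `--mode` with `choices` makes a misspelled mode fail at parse time rather than partway through prediction.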
Example Command
Mode: `one`
- Description: Use this mode when all image files are directly inside a single directory.
- File Structure Example:
```
/path/to/video_name/   # Directory to be set as image_dir
├── 000001.png
├── 000002.png
├── 000003.png
└── ...
```
- Usage: Set `mode=one` and `image_dir=/path/to/video_name`.
```bash
python tools/predict_yoloft.py /path/to/video_name yoloft-L.pt --save_dir /path/to/save --mode one
```
Mode: `muti`
- Description: Use this mode when the directory contains multiple subdirectories, each representing a different video.
- File Structure Example:
```
/path/to/videos_dir/   # Directory to be set as image_dir
├── video_name1/       # Subdirectory for the first video
│   ├── 000001.png
│   ├── 000002.png
│   └── ...
├── video_name2/       # Subdirectory for the second video
│   ├── 000001.png
│   ├── 000002.png
│   └── ...
└── ...
```
- Usage: Set `mode=muti` and `image_dir=/path/to/videos_dir`.
```bash
python tools/predict_yoloft.py /path/to/videos_dir yoloft-L.pt --save_dir /path/to/save --mode muti
```
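The difference between the two modes comes down to how frame directories are discovered. The sketch below illustrates this, assuming PNG frames with zero-padded names so that lexical order equals frame order. The `collect_videos` helper is hypothetical and not taken from the prediction script.

```python
from pathlib import Path


def collect_videos(image_dir, mode):
    """Map each video name to its sorted list of frame paths."""
    image_dir = Path(image_dir)
    if mode == "one":
        # image_dir itself holds the frames of a single video.
        video_dirs = [image_dir]
    elif mode == "muti":
        # Each subdirectory of image_dir is treated as one video.
        video_dirs = sorted(p for p in image_dir.iterdir() if p.is_dir())
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {d.name: sorted(d.glob("*.png")) for d in video_dirs}
```

For example, `collect_videos("/path/to/videos_dir", "muti")` would return one entry per subdirectory, each with its frames in order.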
Convert to onnx
```bash
yolo export model=./YOLOFT-L.pt imgsz=1024,1024 format=onnx opset=12
```
:trophy: Result
Result on XS-VID
| Method | Schedule | Backbone | $AP$ | $AP_{50}$ | $AP_{75}$ | $AP_{eS}$ | $AP_{rS}$ | $AP_{gS}$ | Inference(ms) |
|:------------------|:-------------|:-------------|---------:|--------------:|--------------:|--------------:|--------------:|--------------:|-------------------:|
| DFF | 1x | R50 | 9.4 | 15 | 10.2 | 0.0 | 0.3 | 3.0 | 20.0 |
| DFF | 1x | x101 | 9.6 | 16.9 | 9.9 | 0.0 | 0.5 | 4.5 | 25.5 |
| FGFA | 1x | R50 | 7.8 | 18.8 | 5.0 | 1.1 | 2.0 | 6.1 | 151.0 |
| FGFA | 1x | x101 | 12.3 | 18.0 | 14.1 | 0.2 | 1.1 | 6.4 | 181.8 |
| SELSA | 1x | R50 | 13.6 | 18.1 | 15.5 | 0.0 | 2.2 | 8.1 | 88.5 |
| SELSA | 1x | x101 | 13.6 | 18.8 | 15.8 | 0.0 | 1.7 | 8.3 | 110.0 |
| TROI | 1x | R50 | 12.3 | 16.9 | 14.0 | 0.0 | 1.3 | 5.6 | 232.0 |
| TROI | 1x | x101 | 12.8 | 18.5 | 14.7 | 0.0 | 1.3 | 7.6 | 285.7 |
| MEGA | 1x | R101 | 7.8 | 18.8 | 5.0 | 1.1 | 2.0 | 6.1 | - |
| DiffusionVID | 50e | R101 | 10.6 | 24.3 | 8.2 | 2.7 | 5.6 | 9.4 | - |
| TransVOD | 50e | R50 | 21.8 | 39.6 | 21.1 | 8.8 | 13.6 | 20.5 | 136.0 |
| StreamYOLO | 1x | YOLOX | 33.4 | 47.3 | 37.5 | 18.7 | 26.7 | 33.6 | 47.5 |
| FCOS | 1x | R50 | 24.9 | 41.3 | 24.8 | 7.7 | 17.3 | 22.6 | 31.8 |
| ATSS | 1x | R50 | 26.9 | 43.3 | 26.8 | 8.4 | 19.2 | 23.9 | 34.9 |
| YOLOX-S | 50e | YOLOX | 29.1 | 44.0 | 30.4 | 15.0 | 20.0 | 25.6 | 24.0 |
| YOLOX-L | 50e | YOLOX | 31.0 | 44.9 | 33.8 | 17.4 | 21.7 | 25.6 | 37.4 |
| DyHead | 1x | R50 | 23.7 | 39.6 | 22.7 | 7.0 | 15.9 | 20.5 | 98.0 |
| RepPoints | 1x | R50 | 23.7 | 41.7 | 22.8 | 9.1 | 18.6 | 23.9 | 37.8 |
| Deformable-DETR | 1x | R50 | 21.3 | 38.0 | 21.3 | 11.3 | 13.7 | 18.7 | 52.3 |
| Sparse RCNN | 1x | R50 | 21.0 | 34.2 | 21.8 | 9.0 | 13.9 | 17.5 | 41.8 |
| Cascade RPN | 1x | R50 | 27.0 | 44.5 | 26.6 | 13.5 | 19.4 | 22.1 | 45.3 |
| CESCE | 15e | - | 22.6 | 40.1 | 21.5 | 10.3 | 16.2 | 21.3 | 31.0 |
| CFINet | 1x | R50 | 29.5 | 48.8 | 31.0 | 16.6 | 21.8 | 25.1 | 47.1 |
| Yolov8-s | 2x | YOLOv8 | 30.0 | 45.3 | 32.1 | 17.8 | 24.1 | 27.0 | 14.0 |
| Yolov8-L | 2x | YOLOv8 | 33.6 | 48.8 | 36.9 | 21.3 | 27.4 | 32.7 | 26.0 |
| Yolov9-C | 2x | - | 31.6 | 47.0 | 34.3 | 18.4 | 24.6 | 31.2 | 22.0 |
| YOLOFT-S | 2x | YOLOv8 | 32.9 | 49.2 | 36.5 | 21.4 | 26.5 | 34.2 | 16.0 |
| YOLOFT-L | 2x | YOLOv8 | 36.4 | 52.9 | 41.2 | 24.7 | 28.9 | 33.4 | 36.0 |
Result on Visdrone2019 VID(test-dev)
| Method | Schedule | Backbone | $AP$ | $AP_{50}$ | $AP_{75}$ | $AP_{eS}$ | $AP_{rS}$ | $AP_{gS}$ | $AP_{m}$ | $AP_{l}$ |
|--------------------|--------------|--------------|---------|---------------|--------------|---------------|---------------|---------------|--------------|--------------|
| DFF | 1x | R50 | 5.8 | 12.2 | 4.9 | 0.0 | 0.2 | 1.1 | 6.9 | 12.4 |
| DFF | 1x | x101 | 10.3 | 20.8 | 9.1 | 0.0 | 0.1 | 3.4 | 13.6 | 21.8 |
| FGFA | 1x | R50 | 7.5 | 14.5 | 7.1 | 0.0 | 0.2 | 1.5 | 9.6 | 17.0 |
| FGFA | 1x | x101 | 13.6 | 29.2 | 10.5 | 0.0 | 0.9 | 6.3 | 17.8 | 28.5 |
| SELSA | 1x | R50 | 6.7 | 12.7 | 6.4 | 0.0 | 0.2 | 1.2 | 8.6 | 15.0 |
| SELSA | 1x | x101 | 11.8 | 23.0 | 11.1 | 0.0 | 0.5 | 2.7 | 14.3 | 30.2 |
| TROI | 1x | R50 | 7.9 | 15.9 | 7.0 | 0.0 | 0.2 | 1.5 | 10.3 | 16.3 |
| TROI | 1x | x101 | 12.0 | 23.9 | 10.4 | 0.0 | 0.1 | 4.8 | 16.6 | 24.7 |
| TransVOD | 50e | R50 | 9.7 | 21.1 | 8.0 | 1.0 | 3.2 | 4.9 | 11.5 | 23.8 |
| StreamYOLO | 1x | YOLOX | 18.0 | 35.0 | 16.7 | 1.6 | 5.1 | 10.6 | 22.3 | 33.9 |
| FCOS | 1x | R50 | 12.4 | 24.6 | 11.5 | 1.3 | 3.1 | 4.8 | 13.8 | 30.6 |
| ATSS | 1x | R50 | 13.7 | 28.2 | 11.9 | 1.5 | 4.6 | 7.2 | 16.2 | 29.9 |
| YOLOX-S | 50e | YOLOX | 7.8 | 17.0 | 6.4 | 1.6 | 3.5 | 5.6 | 10.4 | 12.8 |
| DyHead | 1x | R50 | 9.3 | 19.3 | 8.0 | 1.4 | 3.5 | 5.0 | 10.7 | 20.7 |
| RepPoints | 1x | R50 | 13.6 | 28.3 | 11.7 | 0.7 | 3.9 | 5.4 | 16.3 | 29.0 |
| Deformable-DETR | 1x | R50 | 9.8 | 20.2 | 8.4 | 2.5 | 3.7 | 5.1 | 11.9 | 19.5 |
| Sparse RCNN | 1x | R50 | 8.1 | 16.6 | 7.1 | 1.0 | 2.9 | 4.5 | 9.5 | 16.0 |
| Cascade RPN | 1x | R50 | 12.5 | 25.0 | 11.3 | 0.9 | 3.9 | 6.2 | 15.1 | 25.3 |
| CESCE | 15e | - | 11.0 | 23.4 | 9.3 | 1.7 | 3.5 | 4.4 | 13.0 | 23.8 |
| CFINet | 1x | R50 | 12.2 | 25.8 | 10.0 | 1.0 | 3.3 | 6.3 | 15.1 | 25.8 |
| Yolov8-s | 2x | YOLOv8 | 13.2 | 26.1 | 12.1 | 3.9 | 5.0 | 10.1 | 16.1 | 22.9 |
| Yolov8-L | 2x | YOLOv8 | 16.0 | 31.2 | 15.2 | 3.6 | 5.1 | 9.9 | 19.7 | 27.3 |
| Yolov9-C | 2x | - | 15.5 | 30.3 | 14.3 | 1.8 | 5.8 | 9.8 | 19.1 | 33.4 |
| YOLOFT-S | 2x | YOLOv8 | 14.8 | 29.4 | 13.6 | 4.4 | 6.1 | 10.8 | 16.4 | 26.2 |
| YOLOFT-L | 2x | YOLOv8 | 15.8 | 31.4 | 14.4 | 4.9 | 6.5 | 11.8 | 19.4 | 25.8 |
📚 Checkpoints
| Model | Params (M) | FLOPs (G) | Inference (ms) | Dataset | Checkpoint |
|----------|------------|-----------|----------------|------------|------------|
| YOLOFT-L | 45.16 | 230.14 | 36 | XS-VID | yoloft-L.pt |
| YOLOFT-S | 53.58 | 13.02 | 16 | XS-VID | yoloft-S.pt |
:e-mail: Contact
If you have any questions about this repo or the XS-VID benchmark, please feel free to contact us at gjh_hust@hust.edu.cn 😉
Owner
- Login: gjhhust
- Kind: user
- Repositories: 11
- Profile: https://github.com/gjhhust
Citation (CITATION.cff)
cff-version: 1.2.0
preferred-citation:
  type: software
  message: If you use this software, please cite it as below.
  authors:
  - family-names: Jocher
    given-names: Glenn
    orcid: "https://orcid.org/0000-0001-5950-6979"
  - family-names: Chaurasia
    given-names: Ayush
    orcid: "https://orcid.org/0000-0002-7603-6750"
  - family-names: Qiu
    given-names: Jing
    orcid: "https://orcid.org/0000-0003-3783-7069"
  title: "YOLO by Ultralytics"
  version: 8.0.0
  # doi: 10.5281/zenodo.3908559 # TODO
  date-released: 2023-1-10
  license: AGPL-3.0
  url: "https://github.com/ultralytics/ultralytics"
GitHub Events
Total
- Issues event: 10
- Watch event: 8
- Issue comment event: 25
- Push event: 14
- Fork event: 1
Last Year
- Issues event: 10
- Watch event: 8
- Issue comment event: 25
- Push event: 14
- Fork event: 1
Dependencies
- matplotlib >=3.2.2
- opencv-python >=4.6.0
- pandas >=1.1.4
- pillow >=7.1.2
- psutil *
- py-cpuinfo *
- pyyaml >=5.3.1
- requests >=2.23.0
- scipy >=1.4.1
- seaborn >=0.11.0
- torch >=1.7.0
- torchvision >=0.8.1
- tqdm >=4.64.0