https://github.com/bestsongc/yolov7_qat


Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.0%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Bestsongc
  • License: gpl-3.0
  • Default Branch: main
  • Size: 104 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of yhwang-hub/yolov7_QAT
Created over 2 years ago · Last pushed over 2 years ago


# Description
![Language](https://img.shields.io/badge/language-c++-brightgreen)
![Language](https://img.shields.io/badge/CUDA-11.1-brightgreen) 
![Language](https://img.shields.io/badge/TensorRT-8.5.1.7-brightgreen)
![Language](https://img.shields.io/badge/OpenCV-4.5.5-brightgreen) 
![Language](https://img.shields.io/badge/ubuntu-16.04-brightorigin)

This is a repository for QAT fine-tuning of YOLOv7 using [TensorRT's pytorch_quantization tool](https://github.com/NVIDIA/TensorRT/tree/main/tools/pytorch-quantization).

| Method | Calibration method | mAP<sup>val</sup> 0.5 | mAP<sup>val</sup> 0.5:0.95 | batch-1 fps (Jetson Orin-X) | batch-16 fps (Jetson Orin-X) | weight |
| ---- | ---- |---- |---- |----|----|-|
| pytorch FP16 | - | 0.6972 | 0.5120 | - | - | [yolov7.pt](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt) |
| pytorch PTQ-INT8 | Histogram(MSE) | 0.6957 | 0.5100 | - | - | [yolov7_ptq.pt](https://drive.google.com/file/d/1AMymKjKMDmhuNSI3jzL6dv_Pc3rdDDj1/view?usp=sharing) [yolov7_ptq_640.onnx](https://drive.google.com/file/d/1kvCV8PxV6RCidehN4Wp78M116oZ_mSTX/view?usp=sharing) |
| pytorch QAT-INT8 | Histogram(MSE) | 0.6961 | 0.5111 | - | - | [yolov7_qat.pt](https://drive.google.com/file/d/16Ylot5AfkjKeCyVlX3ECsuT6VmHULkd-/view?usp=sharing) |
| TensorRT FP16 | - | 0.6973 | 0.5124 | 140 | 168 | [yolov7.onnx](https://drive.google.com/file/d/1R5muSJWVC_BQKml4s4wQQewUXdmQl0Mm/view?usp=sharing) |
| TensorRT PTQ-INT8 | TensorRT built-in EntropyCalibratorV2 | 0.6317 | 0.4573 | 207 | 264 | - |
| TensorRT QAT-INT8 | Histogram(MSE) | 0.6962 | 0.5113 | 207 | 266 | [yolov7_qat_640.onnx](https://drive.google.com/file/d/1qn-p4N3GZojIOvvxkzmPGCQKR6q4ov73/view?usp=sharing) |

- The above table comes from [https://github.com/NVIDIA-AI-IOT/yolo_deepstream/tree/main/yolov7_qat](https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/main/yolov7_qat/README.md)
- Network input resolution: 3x640x640
- Note: trtexec cudaGraph is enabled

# How To QAT Training

## 1.Setup

We suggest using a Docker environment.

Download the Docker image:
```
docker pull longxiaowyh/yolov7:v1
```
Create the Docker container (it must use the image name that was pulled above):
```
nvidia-docker run -itu root:root --name yolov7 --gpus all \
    -v /your_path:/target_path -v /tmp/.X11-unix/:/tmp/.X11-unix/ \
    -e DISPLAY=unix$DISPLAY -e GDK_SCALE -e GDK_DPI_SCALE \
    -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    --shm-size=64g longxiaowyh/yolov7:v1 /bin/bash
```

1. Clone the repository
```
git clone git@github.com:yhwang-hub/yolov7_quantization.git
```
2. Install dependencies
```
pip install pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com
```
3. Prepare the COCO dataset with the following layout
```
.
├── annotations
│   ├── captions_train2017.json
│   ├── captions_val2017.json
│   ├── instances_train2017.json
│   ├── instances_val2017.json
│   ├── person_keypoints_train2017.json
│   └── person_keypoints_val2017.json
├── coco -> coco
├── coco128
│   ├── images
│   ├── labels
│   ├── LICENSE
│   └── README.txt
├── images
│   ├── train2017
│   └── val2017
├── labels
│   ├── train2017
│   ├── train2017.cache
│   └── val2017
├── train2017.cache
├── train2017.txt
├── val2017.cache
└── val2017.txt
```

## 2.Start PTQ

### 2.1 Start sensitive layer analysis
```
python ptq.py --weights ./weights/yolov7s.pt --cocodir /home/wyh/disk/coco/ --batch_size 5 --save_ptq True --eval_origin --eval_ptq --start_ptq False --sensitive True
```
Based on the analysis, modify the `ignore_layers` parameter (a regular expression of layer names) in ptq.py as follows:
```
parser.add_argument("--ignore_layers", type=str, default="model\.105\.m\.(.*)", help="regx")
```

### 2.2 Start PTQ
```
python ptq.py --weights ./weights/yolov7s.pt --cocodir /home/wyh/disk/coco/ --batch_size 5 --save_ptq True --eval_origin --eval_ptq --start_ptq True --sensitive False
```

## 3.Start QAT Training
```
python qat.py --weights ./weights/yolov7s.pt --cocodir /home/wyh/disk/coco/ --batch_size 5 --save_ptq True --save_qat True --eval_origin --eval_ptq --eval_qat
```
This script performs the following steps:

- Insert Q&DQ nodes to get a fake-quant PyTorch model.
  The [PyTorch quantization tool](https://github.com/NVIDIA/TensorRT/tree/main/tools/pytorch-quantization) can insert QDQ nodes automatically. For YOLOv7, however, automatic insertion does not reach the same performance as PTQ. In explicit mode (QAT mode), TensorRT uses the placement of Q/DQ nodes to constrain the precision of each layer, and some automatically inserted Q&DQ nodes cannot be fused with neighbouring layers, which causes extra, unnecessary precision conversions. Our script applies rules and restrictions found for YOLOv7: QDQ nodes are analyzed and placed automatically in a rule-based manner so that their placement is optimal under TensorRT, and all nodes run in INT8 (confirmed with [trt-engine-explorer](https://github.com/NVIDIA/TensorRT/tree/main/tools/experimental/trt-engine-explorer); see [scripts/draw-engine.py](https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/main/yolov7_qat/scripts/draw-engine.py)). For details of this part, see quantization/rules.py. For guidance on Q&DQ insertion, see [Guidance_of_QAT_performance_optimization](https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/main/yolov7_qat/doc/Guidance_of_QAT_performance_optimization.md).
- PTQ calibration
  After inserting Q&DQ nodes, we recommend running PTQ calibration first. In our experiments, Histogram(MSE) is the best PTQ calibration method for YOLOv7. Note: if you are satisfied with the PTQ result, you can skip QAT.
- QAT training
  The calibrated model is then fine-tuned with QAT. Once the accuracy is satisfactory, the weights are saved to files.

See the run_qat.log file for the results. ptq.onnx and qat.onnx are generated in the output directory.
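The Histogram(MSE) calibration mentioned above picks, for each tensor, the clipping range `amax` that minimises the mean squared error between the original values and their quantize/dequantize (Q/DQ) round trip. What follows is a simplified, pure-Python sketch of that idea. It assumes symmetric per-tensor INT8 quantization. It is not pytorch_quantization's `HistogramCalibrator` code, and the function names, bin count, and start bin are illustrative choices.

```python
import math


def fake_quant(x: float, amax: float, num_bits: int = 8) -> float:
    """Quantize then dequantize one value, as a Q/DQ node pair does.

    Symmetric signed quantization with scale = amax / (2**(num_bits-1) - 1).
    Values beyond +/-amax are clamped. Python's round() rounds half to even.
    """
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for INT8
    scale = amax / qmax
    q = max(-qmax, min(qmax, round(x / scale)))    # Q: map onto the integer grid
    return q * scale                               # DQ: map back to float


def abs_histogram(values, num_bins):
    """Histogram of |x| over [0, max|x|]. Returns (counts, bin_width)."""
    top = max(abs(v) for v in values)
    if top == 0:
        raise ValueError("all values are zero; amax is undefined")
    width = top / num_bins
    counts = [0] * num_bins
    for v in values:
        counts[min(int(abs(v) / width), num_bins - 1)] += 1
    return counts, width


def histogram_mse(counts, width, amax):
    """Mean squared fake-quant error, using each bin's center as its value."""
    err = 0.0
    for i, c in enumerate(counts):
        if c:
            center = (i + 0.5) * width
            err += c * (fake_quant(center, amax) - center) ** 2
    return err / sum(counts)


def mse_calibrate(values, num_bins=256, start_bin=32):
    """Try amax at bin edges start_bin..num_bins and keep the lowest-MSE one."""
    counts, width = abs_histogram(values, num_bins)
    best_amax, best_err = None, math.inf
    for stop in range(start_bin, num_bins + 1):    # includes amax = max|x|
        amax = stop * width
        err = histogram_mse(counts, width, amax)
        if err < best_err:
            best_amax, best_err = amax, err
    return best_amax


if __name__ == "__main__":
    import random
    random.seed(0)
    # Hypothetical activations: a dense Gaussian cluster plus two large values.
    acts = [random.gauss(0.0, 1.0) for _ in range(5000)] + [12.0, -15.0]
    print(f"max|x| = {max(abs(a) for a in acts):.2f}, "
          f"MSE amax = {mse_calibrate(acts):.2f}")
```

Because `amax = max|x|` is always one of the candidates, the chosen range never yields a higher histogram MSE than plain max calibration. Whether it actually clips depends on how heavy the tail is relative to the rounding error inside the range.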
# Benchmark
```
# Build the engines
trtexec --onnx=./outdir-no-rule/ptq.onnx --fp16 --int8 --verbose --saveEngine=./outdir-no-rule/yolo_ptq.engine \
    --workspace=1024000 --warmUp=500 --duration=10 --useCudaGraph --useSpinWait --noDataTransfers \
    --exportLayerInfo=./outdir-no-rule/yolov7_ptq_layer.json --profilingVerbosity=detailed \
    --exportProfile=./outdir-no-rule/yolov7_ptq_profile.json
trtexec --onnx=./outdir-no-rule/qat.onnx --fp16 --int8 --verbose --saveEngine=./outdir-no-rule/yolo_qat.engine \
    --workspace=1024000 --warmUp=500 --duration=10 --useCudaGraph --useSpinWait --noDataTransfers \
    --exportLayerInfo=./outdir-no-rule/yolov7_qat_layer.json --profilingVerbosity=detailed \
    --exportProfile=./outdir-no-rule/yolov7_qat_profile.json

# Benchmark the engines
trtexec --loadEngine=./outdir-no-rule/yolo_ptq.engine --batch=1
trtexec --loadEngine=./outdir-no-rule/yolo_qat.engine --batch=1

# Draw the engine graphs
python scripts/draw-engine.py --layer=./outdir-no-rule/yolov7_ptq_layer.json --profile=./outdir-no-rule/yolov7_ptq_profile.json
python scripts/draw-engine.py --layer=./outdir-no-rule/yolov7_qat_layer.json --profile=./outdir-no-rule/yolov7_qat_profile.json
```
RTX 3060 test results (mAP and QPS):

![image](https://github.com/yhwang-hub/yolov7_quantization/blob/main/mAP.jpg)
![image](https://github.com/yhwang-hub/yolov7_quantization/blob/main/qps.png)

# Official YOLOv7

Implementation of paper - [YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors](https://arxiv.org/abs/2207.02696)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/yolov7-trainable-bag-of-freebies-sets-new/real-time-object-detection-on-coco)](https://paperswithcode.com/sota/real-time-object-detection-on-coco?p=yolov7-trainable-bag-of-freebies-sets-new)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/yolov7)
[![arxiv.org](http://img.shields.io/badge/cs.CV-arXiv%3A2207.02696-B31B1B.svg)](https://arxiv.org/abs/2207.02696)

## Web Demo

- Integrated into [Huggingface Spaces](https://huggingface.co/spaces/akhaliq/yolov7) using Gradio. Try out the Web Demo [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/yolov7)

## Performance

MS COCO

| Model | Test Size | AP<sup>test</sup> | AP<sub>50</sub><sup>test</sup> | AP<sub>75</sub><sup>test</sup> | batch 1 fps | batch 32 average time |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: |
| [**YOLOv7**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt) | 640 | **51.4%** | **69.7%** | **55.9%** | 161 *fps* | 2.8 *ms* |
| [**YOLOv7-X**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7x.pt) | 640 | **53.1%** | **71.2%** | **57.8%** | 114 *fps* | 4.3 *ms* |
|  |  |  |  |  |  |  |
| [**YOLOv7-W6**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-w6.pt) | 1280 | **54.9%** | **72.6%** | **60.1%** | 84 *fps* | 7.6 *ms* |
| [**YOLOv7-E6**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6.pt) | 1280 | **56.0%** | **73.5%** | **61.2%** | 56 *fps* | 12.3 *ms* |
| [**YOLOv7-D6**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-d6.pt) | 1280 | **56.6%** | **74.0%** | **61.8%** | 44 *fps* | 15.0 *ms* |
| [**YOLOv7-E6E**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e.pt) | 1280 | **56.8%** | **74.4%** | **62.1%** | 36 *fps* | 18.7 *ms* |

## Installation

Docker environment (recommended)
``` shell
# create the docker container, you can change the shared memory size if you have more.
nvidia-docker run --name yolov7 -it -v your_coco_path/:/coco/ -v your_code_path/:/yolov7 --shm-size=64g nvcr.io/nvidia/pytorch:21.08-py3

# apt install required packages
apt update
apt install -y zip htop screen libgl1-mesa-glx

# pip install required packages
pip install seaborn thop

# go to code folder
cd /yolov7
```
## Testing

[`yolov7.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt) [`yolov7x.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7x.pt) [`yolov7-w6.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-w6.pt) [`yolov7-e6.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6.pt) [`yolov7-d6.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-d6.pt) [`yolov7-e6e.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e.pt)

``` shell
python test.py --data data/coco.yaml --img 640 --batch 32 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt --name yolov7_640_val
```

You will get the results:

```
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.51206
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.69730
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.55521
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.35247
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.55937
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66693
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.38453
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.63765
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.68772
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.53766
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.73549
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.83868
```

To measure accuracy, download [COCO-annotations for Pycocotools](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) to `./coco/annotations/instances_val2017.json`.

## Training

Data preparation

``` shell
bash scripts/get_coco.sh
```

* Download MS COCO dataset images ([train](http://images.cocodataset.org/zips/train2017.zip), [val](http://images.cocodataset.org/zips/val2017.zip), [test](http://images.cocodataset.org/zips/test2017.zip)) and [labels](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/coco2017labels-segments.zip). If you have previously used a different version of YOLO, we strongly recommend that you delete the `train2017.cache` and `val2017.cache` files, and redownload [labels](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/coco2017labels-segments.zip).

Single GPU training

``` shell
# train p5 models
python train.py --workers 8 --device 0 --batch-size 32 --data data/coco.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml

# train p6 models
python train_aux.py --workers 8 --device 0 --batch-size 16 --data data/coco.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6.yaml --weights '' --name yolov7-w6 --hyp data/hyp.scratch.p6.yaml
```

Multiple GPU training

``` shell
# train p5 models
python -m torch.distributed.launch --nproc_per_node 4 --master_port 9527 train.py --workers 8 --device 0,1,2,3 --sync-bn --batch-size 128 --data data/coco.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml

# train p6 models
python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train_aux.py --workers 8 --device 0,1,2,3,4,5,6,7 --sync-bn --batch-size 128 --data data/coco.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6.yaml --weights '' --name yolov7-w6 --hyp data/hyp.scratch.p6.yaml
```

## Transfer learning

[`yolov7_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt) [`yolov7x_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7x_training.pt) [`yolov7-w6_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-w6_training.pt)
[`yolov7-e6_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6_training.pt) [`yolov7-d6_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-d6_training.pt) [`yolov7-e6e_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e_training.pt)

Single GPU finetuning for custom dataset

``` shell
# finetune p5 models
python train.py --workers 8 --device 0 --batch-size 32 --data data/custom.yaml --img 640 640 --cfg cfg/training/yolov7-custom.yaml --weights 'yolov7_training.pt' --name yolov7-custom --hyp data/hyp.scratch.custom.yaml

# finetune p6 models
python train_aux.py --workers 8 --device 0 --batch-size 16 --data data/custom.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6-custom.yaml --weights 'yolov7-w6_training.pt' --name yolov7-w6-custom --hyp data/hyp.scratch.custom.yaml
```

## Re-parameterization

See [reparameterization.ipynb](tools/reparameterization.ipynb)

## Inference

On video:
``` shell
python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source yourvideo.mp4
```

On image:
``` shell
python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg
```

## Export

**Pytorch to CoreML (and inference on MacOS/iOS)**

**Pytorch to ONNX with NMS (and inference)**

```shell
python export.py --weights yolov7-tiny.pt --grid --end2end --simplify \
        --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640
```

**Pytorch to TensorRT with NMS (and inference)**

```shell
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-tiny.pt
python export.py --weights ./yolov7-tiny.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640
git clone https://github.com/Linaom1214/tensorrt-python.git
python ./tensorrt-python/export.py -o yolov7-tiny.onnx -e yolov7-tiny-nms.trt -p fp16
```

**Pytorch to TensorRT another way**
```shell
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-tiny.pt
python export.py --weights yolov7-tiny.pt --grid --include-nms
git clone https://github.com/Linaom1214/tensorrt-python.git
python ./tensorrt-python/export.py -o yolov7-tiny.onnx -e yolov7-tiny-nms.trt -p fp16

# Or use trtexec to convert ONNX to TensorRT engine
/usr/src/tensorrt/bin/trtexec --onnx=yolov7-tiny.onnx --saveEngine=yolov7-tiny-nms.trt --fp16
```
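The export commands above bake non-maximum suppression into the model and set it up with `--conf-thres`, `--iou-thres` and `--topk-all`. The pure-Python sketch below shows what those three thresholds mean, using plain greedy NMS. It is a simplification and an assumption: it is class-agnostic, and it is not the code path used by `export.py` or the TensorRT NMS plugin, whose class handling and tie-breaking may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               conf_thres: float = 0.35, iou_thres: float = 0.65,
               topk: int = 100) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # 1. Drop candidates below conf_thres, then sort by score (descending).
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # 2. Keep a box only if it overlaps no already-kept box above iou_thres.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
            # 3. Stop once topk detections have been collected.
            if len(keep) == topk:
                break
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 5, 5)]
    scores = [0.9, 0.8, 0.7, 0.2]
    print(greedy_nms(boxes, scores))  # prints [0, 2]
```

In this example, box 1 overlaps box 0 with IoU 0.81, which is above 0.65, so it is suppressed. Box 3 never reaches the IoU check because its score (0.2) falls below `conf_thres`.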
Tested with: Python 3.7.13, Pytorch 1.12.0+cu113

## Pose estimation

[`code`](https://github.com/WongKinYiu/yolov7/tree/pose) [`yolov7-w6-pose.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-w6-pose.pt)

See [keypoint.ipynb](https://github.com/WongKinYiu/yolov7/blob/main/tools/keypoint.ipynb).

## Instance segmentation (with NTU)

[`code`](https://github.com/WongKinYiu/yolov7/tree/mask) [`yolov7-mask.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-mask.pt)

See [instance.ipynb](https://github.com/WongKinYiu/yolov7/blob/main/tools/instance.ipynb).

## Instance segmentation

[`code`](https://github.com/WongKinYiu/yolov7/tree/u7/seg) [`yolov7-seg.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-seg.pt)

YOLOv7 for instance segmentation (YOLOR + YOLOv5 + YOLACT)

| Model | Test Size | AP<sup>box</sup> | AP<sub>50</sub><sup>box</sup> | AP<sub>75</sub><sup>box</sup> | AP<sup>mask</sup> | AP<sub>50</sub><sup>mask</sup> | AP<sub>75</sub><sup>mask</sup> |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| **YOLOv7-seg** | 640 | **51.4%** | **69.4%** | **55.8%** | **41.5%** | **65.5%** | **43.7%** |

## Anchor free detection head

[`code`](https://github.com/WongKinYiu/yolov7/tree/u6) [`yolov7-u6.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-u6.pt)

YOLOv7 with decoupled TAL head (YOLOR + YOLOv5 + YOLOv6)

| Model | Test Size | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | AP<sub>75</sub><sup>val</sup> |
| :-- | :-: | :-: | :-: | :-: |
| **YOLOv7-u6** | 640 | **52.6%** | **69.7%** | **57.3%** |

## Citation

```
@article{wang2022yolov7,
  title={{YOLOv7}: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors},
  author={Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark},
  journal={arXiv preprint arXiv:2207.02696},
  year={2022}
}
```

```
@article{wang2022designing,
  title={Designing Network Design Strategies Through Gradient Path Analysis},
  author={Wang, Chien-Yao and Liao, Hong-Yuan Mark and Yeh, I-Hau},
  journal={arXiv preprint arXiv:2211.04800},
  year={2022}
}
```

## Teaser

YOLOv7-semantic & YOLOv7-panoptic & YOLOv7-caption

YOLOv7-semantic & YOLOv7-detection & YOLOv7-depth (with NTUT)

YOLOv7-3d-detection & YOLOv7-lidar & YOLOv7-road (with NTUT)

## Acknowledgements
* [https://github.com/AlexeyAB/darknet](https://github.com/AlexeyAB/darknet)
* [https://github.com/WongKinYiu/yolor](https://github.com/WongKinYiu/yolor)
* [https://github.com/WongKinYiu/PyTorch_YOLOv4](https://github.com/WongKinYiu/PyTorch_YOLOv4)
* [https://github.com/WongKinYiu/ScaledYOLOv4](https://github.com/WongKinYiu/ScaledYOLOv4)
* [https://github.com/Megvii-BaseDetection/YOLOX](https://github.com/Megvii-BaseDetection/YOLOX)
* [https://github.com/ultralytics/yolov3](https://github.com/ultralytics/yolov3)
* [https://github.com/ultralytics/yolov5](https://github.com/ultralytics/yolov5)
* [https://github.com/DingXiaoH/RepVGG](https://github.com/DingXiaoH/RepVGG)
* [https://github.com/JUGGHM/OREPA_CVPR2022](https://github.com/JUGGHM/OREPA_CVPR2022)
* [https://github.com/TexasInstruments/edgeai-yolov5/tree/yolo-pose](https://github.com/TexasInstruments/edgeai-yolov5/tree/yolo-pose)
* [https://github.com/NVIDIA-AI-IOT/yolo_deepstream/tree/main/yolov7_qat](https://github.com/NVIDIA-AI-IOT/yolo_deepstream/tree/main/yolov7_qat)

Owner

  • Name: Bestsongc
  • Login: Bestsongc
  • Kind: user
