sparsevit

[CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

https://github.com/mit-han-lab/sparsevit

Science Score: 72.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    18 of 394 committers (4.6%) from academic institutions
  • Institutional organization owner
    Organization mit-han-lab has institutional domain (hanlab.mit.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary

Keywords from Contributors

self-supervised-learning multimodal vision-transformer swin-transformer resnet pretrained-models beit clip contrastive-learning convnext
Last synced: 6 months ago

Repository

[CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

Basic Info
  • Host: GitHub
  • Owner: mit-han-lab
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 48.5 MB
Statistics
  • Stars: 68
  • Watchers: 3
  • Forks: 6
  • Open Issues: 2
  • Releases: 0
Created over 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

website | paper


Abstract

High-resolution images enable neural networks to learn richer visual representations. However, this improved performance comes at the cost of growing computational complexity, hindering their usage in latency-sensitive applications. As not all pixels are equal, skipping computations for less-important regions offers a simple and effective measure to reduce the computation. This, however, is hard to translate into actual speedup for CNNs since it breaks the regularity of the dense convolution workload. In this paper, we introduce SparseViT, which revisits activation sparsity for recent window-based vision transformers (ViTs). As window attentions are naturally batched over blocks, actual speedup with window activation pruning becomes possible: i.e., ∼50% latency reduction with 60% sparsity. Different layers should be assigned different pruning ratios due to their diverse sensitivities and computational costs. We introduce sparsity-aware adaptation and apply evolutionary search to efficiently find the optimal layerwise sparsity configuration within the vast search space. SparseViT achieves speedups of 1.5×, 1.4×, and 1.3× compared to its dense counterpart in monocular 3D object detection, 2D instance segmentation, and 2D semantic segmentation, respectively, with negligible to no loss of accuracy.
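The core pruning idea can be illustrated with a minimal sketch (plain Python, not the repository's implementation): score each attention window by an importance value such as its mean activation magnitude, then keep only the top fraction of windows, so the pruned windows are skipped as whole blocks. The function name and scoring scheme here are illustrative assumptions.

```python
def prune_windows(window_scores, sparsity):
    """Keep the top (1 - sparsity) fraction of windows by importance score.

    window_scores: per-window importance values (e.g. mean activation
    magnitude, as an illustrative proxy); sparsity: fraction to drop.
    Returns the indices of the kept windows, in original order.
    """
    n_keep = max(1, round(len(window_scores) * (1.0 - sparsity)))
    # Rank windows by score, highest first, then keep the top n_keep.
    ranked = sorted(range(len(window_scores)),
                    key=lambda i: window_scores[i], reverse=True)
    return sorted(ranked[:n_keep])

# Example: 60% sparsity on 10 windows keeps the 4 highest-scoring ones.
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4, 0.5, 0.05]
print(prune_windows(scores, 0.6))  # -> [0, 2, 4, 6]
```

Because windows are dropped wholesale rather than per-pixel, the remaining attention workload stays dense and batched, which is what makes the sparsity translate into real latency savings.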

Prerequisite

Our code is based on mmdetection 2.28.2.

1. mmcv >= 1.3.17 and <= 1.8.0
2. torchpack
3. OpenMPI = 4.0.4 and mpi4py = 3.0.3 (needed for torchpack)

Training Pipeline

Swin Pre-train

bash tools/dist_train.sh configs/swin/mask_rcnn_swin-t-p4-w7_fpn_fp16_ms-crop-3x_coco.py 8

or download a pre-trained checkpoint from https://github.com/open-mmlab/mmdetection/tree/main/configs/swin.

Sparsity-Aware Adaptation

In Sparsity-Aware Adaptation, we randomly sample a layerwise sparsity configuration at each training iteration.

bash tools/dist_train.sh configs/sparsevit/mask_rcnn_sparsevit_saa.py 8
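The per-iteration sampling step can be sketched as follows. This is a simplified illustration: the layer names and candidate ratios below are assumptions for the example, not the repository's actual configuration.

```python
import random

# Candidate sparsity ratios per layer (illustrative values).
CANDIDATE_RATIOS = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
LAYERS = ["backbone.stages.0", "backbone.stages.1",
          "backbone.stages.2", "backbone.stages.3"]

def sample_sparsity_config(rng=random):
    """Draw one random layerwise sparsity configuration, as done once
    per training iteration during sparsity-aware adaptation."""
    return {layer: rng.choice(CANDIDATE_RATIOS) for layer in LAYERS}

cfg = sample_sparsity_config()
print(cfg)  # e.g. {'backbone.stages.0': 0.3, ...}
```

Training under randomly sampled configurations makes the backbone robust to any sparsity setting the later search may pick, so no retraining is needed per candidate.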

Latency-Constrained Evolutionary Search

We run evolutionary search on the Sparsity-Aware Adaptation model to find the optimal sparsity configuration whose latency falls between min_latency and max_latency.

torchpack dist-run -np 8 python tools/search.py configs/sparsevit/mask_rcnn_sparsevit.py [checkpoint_path] --max [max_latency] --min [min_latency]

For example, the dense model with 672x672 input resolution has a latency of about 47.8 ms. Suppose we want to find the optimal sparsity configuration with latency under 42 ms.

torchpack dist-run -np 8 python tools/search.py configs/sparsevit/mask_rcnn_sparsevit.py mask_rcnn_sparsevit_saa.pth --max 42 --min 37

The search log will be stored in work_dirs/search_max42_min37.txt.
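In broad strokes, a latency-constrained evolutionary search alternates a feasibility filter with mutation of the surviving configurations. The sketch below is a generic simplification under stated assumptions: the latency and score functions are caller-supplied stand-ins, not the repository's measurement tools, and the mutation scheme is illustrative.

```python
import random

def evolutionary_search(init_pop, latency_fn, score_fn,
                        max_latency, min_latency,
                        generations=20, pop_size=16, seed=0):
    """Generic latency-constrained evolutionary search sketch.

    Each individual is a dict of layer -> sparsity ratio. Configurations
    outside [min_latency, max_latency] are discarded; the best survivors
    are mutated to form the next generation.
    """
    rng = random.Random(seed)
    ratios = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

    def mutate(cfg):
        child = dict(cfg)
        layer = rng.choice(list(child))      # perturb one random layer
        child[layer] = rng.choice(ratios)
        return child

    pop = list(init_pop)
    best = None
    for _ in range(generations):
        feasible = [c for c in pop
                    if min_latency <= latency_fn(c) <= max_latency]
        feasible.sort(key=score_fn, reverse=True)
        if feasible and (best is None or score_fn(feasible[0]) > score_fn(best)):
            best = feasible[0]
        parents = feasible[: max(1, pop_size // 4)] or pop
        pop = [mutate(rng.choice(parents)) for _ in range(pop_size)]
    return best
```

Because the SAA model already works under any sampled sparsity, each candidate can be scored directly (e.g. by validation mAP) without retraining, which is what makes the search affordable.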

Finetune

Finetune the SAA model with the optimal sparsity configuration.

For example, the best sparsity configuration under 42ms is

backbone.stages.0 : 0.3
backbone.stages.1 : 0
backbone.stages.2_1 : 0.1
backbone.stages.2_2 : 0.2
backbone.stages.2_3 : 0.2
backbone.stages.3 : 0

bash tools/dist_train.sh configs/sparsevit/sparsevit_cfg1_42ms.py 8

Latency Measure

We measure our model's latency using a batch of 4 input images.

python tools/measure_latency.py [config] --img_size [img_size]

For example, python tools/measure_latency.py configs/sparsevit/sparsevit_cfg1_42ms.py --img_size 672
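Reliable latency measurement needs warmup iterations before timing (and, on GPU, device synchronization before reading the clock). A generic timing harness along those lines, not the repository's measure_latency.py, might look like:

```python
import time

def measure_latency(run_batch, warmup=10, iters=50):
    """Average wall-clock latency of run_batch() in milliseconds.

    run_batch should execute one forward pass on a fixed batch (e.g. 4
    images). With a GPU framework you would also synchronize the device
    before each clock read so queued kernels are included in the timing.
    """
    for _ in range(warmup):          # warm up caches / autotuned kernels
        run_batch()
    start = time.perf_counter()
    for _ in range(iters):
        run_batch()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000.0  # ms per batch
```

Averaging over many iterations after warmup is what makes numbers like the 47.8 ms dense baseline below comparable across configurations.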

Results

We report latency measured on an NVIDIA RTX A6000 GPU.

The pre-trained SAA (Sparsity-Aware Adaptation) model is here.

| sparsity configuration | resolution | latency (ms) | bbox mAP | mask mAP | model |
| ---------------------- | ---------- | ------------ | -------- | -------- | ----- |
| --                     | 672x672    | 47.8         | 42.6     | 38.8     |       |
| config                 | 672x672    | 41.3         | 42.4     | 38.5     | link  |
| config                 | 672x672    | 34.2         | 41.6     | 37.7     | link  |
| config                 | 672x672    | 32.9         | 41.3     | 37.4     | link  |

Owner

  • Name: MIT HAN Lab
  • Login: mit-han-lab
  • Kind: organization
  • Location: MIT

Accelerating Deep Learning Computing

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection Contributors"
title: "OpenMMLab Detection Toolbox and Benchmark"
date-released: 2018-08-22
url: "https://github.com/open-mmlab/mmdetection"
license: Apache-2.0

GitHub Events

Total
  • Watch event: 12
  • Fork event: 1
Last Year
  • Watch event: 12
  • Fork event: 1

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 2,042
  • Total Committers: 394
  • Avg Commits per committer: 5.183
  • Development Distribution Score (DDS): 0.848
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Kai Chen c****v@g****m 311
Wenwei Zhang 4****e 184
Cao Yuhang y****6@g****m 156
Haian Huang(深度眸) 1****9@q****m 135
Jerry Jiarui XU x****6@g****m 109
RangiLyu l****i@g****m 70
Shilong Zhang 6****g 56
pangjm p****u@g****m 49
BigDong y****g@t****n 48
Guangchen Lin 3****0@q****m 40
Cedric Luo l****6@o****m 37
ThangVu t****k@g****m 32
Jiaqi Wang 1****0@l****k 30
Wang Xinjiang w****g@s****m 29
Czm369 4****9 22
Qiaofei Li q****i@g****m 22
Yosuke Shinya 4****y 21
jbwang1997 j****7@g****m 21
Jon Crall e****c@g****m 21
xyaochen i****n@g****m 18
RunningLeon m****g@s****m 15
tianyuandu t****u@g****m 13
Yue Zhou 5****9@q****m 12
David de la Iglesia Castro d****o@g****m 11
Ryan Li x****e@c****k 10
Maxim Bonnaerens m****m@b****e 10
Kamran Melikov m****k@g****m 9
yuzhj 3****j 8
simon wu w****y@s****m 8
lizz i****e 7
and 364 more...

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 4
  • Total pull requests: 0
  • Average time to close issues: 28 days
  • Average time to close pull requests: N/A
  • Total issue authors: 4
  • Total pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: about 2 months
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Yunge6666 (1)
  • Shrinidhibhat87 (1)
  • Benbie (1)
  • kaikai23 (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels