sparsevit

[CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

https://github.com/mit-han-lab/sparsevit

Science Score: 72.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    18 of 394 committers (4.6%) from academic institutions
  • Institutional organization owner
    Organization mit-han-lab has institutional domain (hanlab.mit.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary

Keywords from Contributors

self-supervised-learning multimodal vision-transformer swin-transformer resnet pretrained-models beit clip contrastive-learning convnext
Last synced: 6 months ago

Repository

[CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

Basic Info
  • Host: GitHub
  • Owner: mit-han-lab
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 48.5 MB
Statistics
  • Stars: 68
  • Watchers: 3
  • Forks: 6
  • Open Issues: 2
  • Releases: 0
Created over 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

website | paper


Abstract

High-resolution images enable neural networks to learn richer visual representations. However, this improved performance comes at the cost of growing computational complexity, hindering their usage in latency-sensitive applications. As not all pixels are equal, skipping computations for less-important regions offers a simple and effective measure to reduce the computation. This, however, is hard to translate into actual speedup for CNNs since it breaks the regularity of the dense convolution workload. In this paper, we introduce SparseViT, which revisits activation sparsity for recent window-based vision transformers (ViTs). As window attentions are naturally batched over blocks, actual speedup with window activation pruning becomes possible: i.e., ∼50% latency reduction with 60% sparsity. Different layers should be assigned different pruning ratios due to their diverse sensitivities and computational costs. We introduce sparsity-aware adaptation and apply evolutionary search to efficiently find the optimal layerwise sparsity configuration within the vast search space. SparseViT achieves speedups of 1.5×, 1.4×, and 1.3× compared to its dense counterpart in monocular 3D object detection, 2D instance segmentation, and 2D semantic segmentation, respectively, with negligible to no loss of accuracy.
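The core pruning idea can be illustrated with a minimal sketch (plain Python, not the repository's implementation): score each attention window by an importance value such as its mean activation magnitude, then keep only the top fraction of windows, so the pruned windows are skipped as whole blocks. The function name and scoring scheme here are illustrative assumptions.

```python
def prune_windows(window_scores, sparsity):
    """Keep the top (1 - sparsity) fraction of windows by importance score.

    window_scores: per-window importance values (e.g. mean activation
    magnitude, as an illustrative proxy); sparsity: fraction to drop.
    Returns the indices of the kept windows, in original order.
    """
    n_keep = max(1, round(len(window_scores) * (1.0 - sparsity)))
    # Rank windows by score, highest first, then keep the top n_keep.
    ranked = sorted(range(len(window_scores)),
                    key=lambda i: window_scores[i], reverse=True)
    return sorted(ranked[:n_keep])

# Example: 60% sparsity on 10 windows keeps the 4 highest-scoring ones.
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4, 0.5, 0.05]
print(prune_windows(scores, 0.6))  # -> [0, 2, 4, 6]
```

Because windows are dropped wholesale rather than per-pixel, the remaining attention workload stays dense and batched, which is what makes the sparsity translate into real latency savings.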

Prerequisite

Our code is based on mmdetection 2.28.2.

1. mmcv >= 1.3.17 and <= 1.8.0
2. torchpack
3. OpenMPI = 4.0.4 and mpi4py = 3.0.3 (needed for torchpack)

Training Pipeline

Swin Pre-train

bash tools/dist_train.sh configs/swin/mask_rcnn_swin-t-p4-w7_fpn_fp16_ms-crop-3x_coco.py 8

or download a pre-trained checkpoint from https://github.com/open-mmlab/mmdetection/tree/main/configs/swin.

Sparsity-Aware Adaptation

In Sparsity-Aware Adaptation, we randomly sample a layerwise sparsity configuration at each training iteration.

bash tools/dist_train.sh configs/sparsevit/mask_rcnn_sparsevit_saa.py 8
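The per-iteration sampling step can be sketched as follows. This is a simplified illustration: the layer names and candidate ratios below are assumptions for the example, not the repository's actual configuration.

```python
import random

# Candidate sparsity ratios per layer (illustrative values).
CANDIDATE_RATIOS = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
LAYERS = ["backbone.stages.0", "backbone.stages.1",
          "backbone.stages.2", "backbone.stages.3"]

def sample_sparsity_config(rng=random):
    """Draw one random layerwise sparsity configuration, as done once
    per training iteration during sparsity-aware adaptation."""
    return {layer: rng.choice(CANDIDATE_RATIOS) for layer in LAYERS}

cfg = sample_sparsity_config()
print(cfg)  # e.g. {'backbone.stages.0': 0.3, ...}
```

Training under randomly sampled configurations makes the backbone robust to any sparsity setting the later search may pick, so no retraining is needed per candidate.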

Latency-Constrained Evolutionary Search

We run evolutionary search on the Sparsity-Aware Adaptation model to find the optimal sparsity configuration whose latency falls between min_latency and max_latency.

torchpack dist-run -np 8 python tools/search.py configs/sparsevit/mask_rcnn_sparsevit.py [checkpoint_path] --max [max_latency] --min [min_latency]

For example, the dense model with 672x672 input resolution has a latency of about 47.8 ms. Suppose we want to find the optimal sparsity configuration with latency under 42 ms.

torchpack dist-run -np 8 python tools/search.py configs/sparsevit/mask_rcnn_sparsevit.py mask_rcnn_sparsevit_saa.pth --max 42 --min 37

The search log will be stored in work_dirs/search_max42_min37.txt.
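In broad strokes, a latency-constrained evolutionary search alternates a feasibility filter with mutation of the surviving configurations. The sketch below is a generic simplification under stated assumptions: the latency and score functions are caller-supplied stand-ins, not the repository's measurement tools, and the mutation scheme is illustrative.

```python
import random

def evolutionary_search(init_pop, latency_fn, score_fn,
                        max_latency, min_latency,
                        generations=20, pop_size=16, seed=0):
    """Generic latency-constrained evolutionary search sketch.

    Each individual is a dict of layer -> sparsity ratio. Configurations
    outside [min_latency, max_latency] are discarded; the best survivors
    are mutated to form the next generation.
    """
    rng = random.Random(seed)
    ratios = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

    def mutate(cfg):
        child = dict(cfg)
        layer = rng.choice(list(child))      # perturb one random layer
        child[layer] = rng.choice(ratios)
        return child

    pop = list(init_pop)
    best = None
    for _ in range(generations):
        feasible = [c for c in pop
                    if min_latency <= latency_fn(c) <= max_latency]
        feasible.sort(key=score_fn, reverse=True)
        if feasible and (best is None or score_fn(feasible[0]) > score_fn(best)):
            best = feasible[0]
        parents = feasible[: max(1, pop_size // 4)] or pop
        pop = [mutate(rng.choice(parents)) for _ in range(pop_size)]
    return best
```

Because the SAA model already works under any sampled sparsity, each candidate can be scored directly (e.g. by validation mAP) without retraining, which is what makes the search affordable.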

Finetune

Finetune the SAA model with the optimal sparsity configuration.

For example, the best sparsity configuration under 42ms is

backbone.stages.0 : 0.3
backbone.stages.1 : 0
backbone.stages.2_1 : 0.1
backbone.stages.2_2 : 0.2
backbone.stages.2_3 : 0.2
backbone.stages.3 : 0

bash tools/dist_train.sh configs/sparsevit/sparsevit_cfg1_42ms.py 8

Latency Measure

We measure our model's latency using a batch of 4 input images.

python tools/measure_latency.py [config] --img_size [img_size]

For example, python tools/measure_latency.py configs/sparsevit/sparsevit_cfg1_42ms.py --img_size 672
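Reliable latency measurement needs warmup iterations before timing (and, on GPU, device synchronization before reading the clock). A generic timing harness along those lines, not the repository's measure_latency.py, might look like:

```python
import time

def measure_latency(run_batch, warmup=10, iters=50):
    """Average wall-clock latency of run_batch() in milliseconds.

    run_batch should execute one forward pass on a fixed batch (e.g. 4
    images). With a GPU framework you would also synchronize the device
    before each clock read so queued kernels are included in the timing.
    """
    for _ in range(warmup):          # warm up caches / autotuned kernels
        run_batch()
    start = time.perf_counter()
    for _ in range(iters):
        run_batch()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000.0  # ms per batch
```

Averaging over many iterations after warmup is what makes numbers like the 47.8 ms dense baseline below comparable across configurations.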

Results

We report latency measured on an NVIDIA RTX A6000 GPU.

The pre-trained SAA (Sparsity-Aware Adaptation) model is here.

| sparsity configuration | resolution | latency (ms) | bbox mAP | mask mAP | model |
| ---------------------- | ---------- | ------------ | -------- | -------- | ----- |
| --                     | 672x672    | 47.8         | 42.6     | 38.8     |       |
| config                 | 672x672    | 41.3         | 42.4     | 38.5     | link  |
| config                 | 672x672    | 34.2         | 41.6     | 37.7     | link  |
| config                 | 672x672    | 32.9         | 41.3     | 37.4     | link  |

Owner

  • Name: MIT HAN Lab
  • Login: mit-han-lab
  • Kind: organization
  • Location: MIT

Accelerating Deep Learning Computing

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection Contributors"
title: "OpenMMLab Detection Toolbox and Benchmark"
date-released: 2018-08-22
url: "https://github.com/open-mmlab/mmdetection"
license: Apache-2.0

GitHub Events

Total
  • Watch event: 12
  • Fork event: 1
Last Year
  • Watch event: 12
  • Fork event: 1

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 2,042
  • Total Committers: 394
  • Avg Commits per committer: 5.183
  • Development Distribution Score (DDS): 0.848
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Kai Chen c****v@g****m 311
Wenwei Zhang 4****e 184
Cao Yuhang y****6@g****m 156
Haian Huang(深度眸) 1****9@q****m 135
Jerry Jiarui XU x****6@g****m 109
RangiLyu l****i@g****m 70
Shilong Zhang 6****g 56
pangjm p****u@g****m 49
BigDong y****g@t****n 48
Guangchen Lin 3****0@q****m 40
Cedric Luo l****6@o****m 37
ThangVu t****k@g****m 32
Jiaqi Wang 1****0@l****k 30
Wang Xinjiang w****g@s****m 29
Czm369 4****9 22
Qiaofei Li q****i@g****m 22
Yosuke Shinya 4****y 21
jbwang1997 j****7@g****m 21
Jon Crall e****c@g****m 21
xyaochen i****n@g****m 18
RunningLeon m****g@s****m 15
tianyuandu t****u@g****m 13
Yue Zhou 5****9@q****m 12
David de la Iglesia Castro d****o@g****m 11
Ryan Li x****e@c****k 10
Maxim Bonnaerens m****m@b****e 10
Kamran Melikov m****k@g****m 9
yuzhj 3****j 8
simon wu w****y@s****m 8
lizz i****e 7
and 364 more...

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 4
  • Total pull requests: 0
  • Average time to close issues: 28 days
  • Average time to close pull requests: N/A
  • Total issue authors: 4
  • Total pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: about 2 months
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Yunge6666 (1)
  • Shrinidhibhat87 (1)
  • Benbie (1)
  • kaikai23 (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels