vitdet
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"
Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ✓ Academic publication links: links to arxiv.org
- ○ Committers with academic emails: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (7.1%) to scientific vocabulary
Keywords
Keywords
Repository
Basic Info
Statistics
- Stars: 561
- Watchers: 4
- Forks: 45
- Open Issues: 17
- Releases: 0
Topics
Metadata Files
README.md
Unofficial PyTorch Implementation of Exploring Plain Vision Transformer Backbones for Object Detection
Results | Updates | Usage | Todo | Acknowledge
This branch contains an unofficial PyTorch implementation of Exploring Plain Vision Transformer Backbones for Object Detection. Thanks to the authors for their wonderful work!
Results from this repo on COCO
The models are trained on 4 A100 machines (8 GPUs each) with 2 images per GPU, for a total batch size of 64 during training.
| Model | Pretrain | Machine | Framework | Box mAP | Mask mAP | Config | Log | Weight |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| ViT-Base | IN1K+MAE | TPU | Mask RCNN | 51.1 | 45.5 | config | log | OneDrive |
| ViT-Base | IN1K+MAE | GPU | Mask RCNN | 51.1 | 45.4 | config | log | OneDrive |
| ViTAE-Base | IN1K+MAE | GPU | Mask RCNN | 51.6 | 45.8 | config | log | OneDrive |
| ViTAE-Small | IN1K+Sup | GPU | Mask RCNN | 45.6 | 40.1 | config | log | OneDrive |
Updates
[2022-04-18] Explored using small ImageNet-1K supervised models (20M parameters) for ViTDet, reaching 45.6 mAP. For comparison, multi-stage backbones achieve 46.0 mAP with Swin-T and 47.8 mAP with ViTAEv2-S using Mask RCNN on COCO.
[2022-04-17] Release the pretrained weights and logs for ViT-B and ViTAE-B on MS COCO. The models are trained entirely with PyTorch on GPUs.
[2022-04-16] Release the initial unofficial implementation of ViTDet with the ViT-Base model! It obtains 51.1 mAP and 45.5 mAP on detection and segmentation, respectively. The weights and logs will be uploaded soon.
Applications of the ViTAE Transformer include: image classification | object detection | semantic segmentation | animal pose estimation | remote sensing | matting
Usage
We use PyTorch 1.9.0 (or the NGC Docker 21.06 image) and mmcv 1.3.9 for the experiments.
```bash
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout v1.3.9
MMCV_WITH_OPS=1 pip install -e .
cd ..
git clone https://github.com/ViTAE-Transformer/ViTDet.git
cd ViTDet
pip install -v -e .
```
After installing the two repos, install timm and einops:
```bash
pip install timm==0.4.9 einops
```
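Before launching training, it can help to confirm that the pinned versions above are actually the ones in the environment. The helper below is a hypothetical convenience, not part of this repo:

```python
import importlib

def check_version(pkg, expected_prefix):
    # Import the package and compare its reported version against the
    # prefix pinned in the instructions above (e.g. "1.9" for torch,
    # "1.3.9" for mmcv, "0.4.9" for timm).
    mod = importlib.import_module(pkg)
    version = getattr(mod, "__version__", "unknown")
    ok = version.startswith(expected_prefix)
    print(f"{pkg} {version} {'OK' if ok else 'MISMATCH'}")
    return ok
```

For this setup one would call, e.g., `check_version("torch", "1.9")`, `check_version("mmcv", "1.3.9")`, and `check_version("timm", "0.4.9")`.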
Download the pretrained models from MAE or ViTAE, then run the experiments with:
```bash
# for a single machine
bash tools/dist_train.sh
# for multiple machines
python -m torch.distributed.launch --nnodes
```
Todo
This repo currently contains the following modifications:
- using LN for the convolutions in the RPN and heads
- using large-scale jitter for augmentation
- using RPE from MViT
- using longer training schedules and a 1024×1024 test size
- using global attention layers
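Two of these modifications are easy to sketch in isolation. The snippet below is a minimal NumPy/stdlib sketch, not the repo's actual code: a channel-wise LayerNorm of the kind used in place of BatchNorm in the conv heads, and the scale sampling behind large-scale jitter. The function names and the (0.1, 2.0) scale range follow the common Mask R-CNN LSJ recipe and are assumptions here:

```python
import random
import numpy as np

def layer_norm_2d(x, eps=1e-6):
    # Normalize an (N, C, H, W) array over the channel dimension at each
    # spatial location, as done when LN replaces BN in the RPN/head convs.
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def large_scale_jitter_size(h, w, img_size=1024, scale_range=(0.1, 2.0)):
    # Sample a random resize target for large-scale jitter: scale the image
    # by a factor in scale_range relative to img_size, preserving aspect
    # ratio. The full pipeline would then crop/pad the result to img_size.
    scale = random.uniform(*scale_range)
    ratio = scale * img_size / max(h, w)
    return max(1, int(h * ratio)), max(1, int(w * ratio))
```

In the full augmentation the resized image would then be randomly cropped or zero-padded to the fixed 1024×1024 training size.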
There are other things to do:
- [ ] Implement the conv blocks for global information communication
- [ ] Tune the models for Cascade RCNN
- [ ] Train ViT models for the LVIS dataset
- [ ] Train ViTAE model with the ViTDet framework
Acknowledge
We acknowledge the excellent implementations of mmdetection, MAE, MViT, and BeiT.
Citing ViTDet
```
@article{Li2022ExploringPV,
  title={Exploring Plain Vision Transformer Backbones for Object Detection},
  author={Yanghao Li and Hanzi Mao and Ross B. Girshick and Kaiming He},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.16527}
}
```
For ViTAE and ViTAEv2, please refer to:
```
@article{xu2021vitae,
  title={Vitae: Vision transformer advanced by exploring intrinsic inductive bias},
  author={Xu, Yufei and Zhang, Qiming and Zhang, Jing and Tao, Dacheng},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

@article{zhang2022vitaev2,
  title={ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond},
  author={Zhang, Qiming and Xu, Yufei and Zhang, Jing and Tao, Dacheng},
  journal={arXiv preprint arXiv:2202.10108},
  year={2022}
}
```
Owner
- Name: ViTAE-Transformer
- Login: ViTAE-Transformer
- Kind: organization
- Repositories: 10
- Profile: https://github.com/ViTAE-Transformer
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection Contributors"
title: "OpenMMLab Detection Toolbox and Benchmark"
date-released: 2018-08-22
url: "https://github.com/open-mmlab/mmdetection"
license: Apache-2.0
```
GitHub Events
Total
- Issues event: 1
- Watch event: 47
- Issue comment event: 1
- Fork event: 1
Last Year
- Issues event: 1
- Watch event: 47
- Issue comment event: 1
- Fork event: 1
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 51
- Total pull requests: 0
- Average time to close issues: 2 days
- Average time to close pull requests: N/A
- Total issue authors: 22
- Total pull request authors: 0
- Average comments per issue: 1.94
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- foolhard (4)
- yitianlian (2)
- fadaishaitaiyang (2)
- austinmw (1)
- larenzhang (1)
- junchen14 (1)
- mxy0610 (1)
- HerrYu123 (1)
- Yuxin-CV (1)
- wudizuixiaosa (1)
- vansin (1)
- youngwanLEE (1)
- liming-ai (1)
- zhangletian2 (1)
- YIFanH (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- cython *
- numpy *
- docutils ==0.16.0
- recommonmark *
- sphinx ==4.0.2
- sphinx-copybutton *
- sphinx_markdown_tables *
- sphinx_rtd_theme ==0.5.2
- mmcv-full >=1.3.8
- cityscapesscripts *
- imagecorruptions *
- scipy *
- sklearn *
- mmcv *
- torch *
- torchvision *
- matplotlib *
- numpy *
- pycocotools *
- pycocotools-windows *
- six *
- terminaltables *
- asynctest * test
- codecov * test
- flake8 * test
- interrogate * test
- isort ==4.3.21 test
- kwarray * test
- mmtrack * test
- onnx ==1.7.0 test
- onnxruntime >=1.8.0 test
- pytest * test
- ubelt * test
- xdoctest >=0.10.0 test
- yapf * test