Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: bind-TIAN
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 7.98 MB
Statistics
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Unofficial PyTorch Implementation of Exploring Plain Vision Transformer Backbones for Object Detection
Results | Updates | Usage | Todo | Acknowledge
This branch contains the unofficial PyTorch implementation of Exploring Plain Vision Transformer Backbones for Object Detection. Thanks to the authors for their wonderful work!
Results from this repo on COCO
The models are trained on 4 A100 machines with 2 images per GPU, for a total batch size of 64 during training.
| Model | Pretrain | Machine | Framework | Box mAP | Mask mAP | Config | Log | Weight |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| ViT-Base | IN1K+MAE | TPU | Mask RCNN | 51.1 | 45.5 | config | log | OneDrive |
| ViT-Base | IN1K+MAE | GPU | Mask RCNN | 51.1 | 45.4 | config | log | OneDrive |
| ViTAE-Base | IN1K+MAE | GPU | Mask RCNN | 51.6 | 45.8 | config | log | OneDrive |
| ViTAE-Small | IN1K+Sup | GPU | Mask RCNN | 45.6 | 40.1 | config | log | OneDrive |
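The batch size quoted above can be checked with a little arithmetic. The sketch below takes the machine count (4), images per GPU (2), and total batch size (64) from the text; the number of GPUs per machine is not stated in the README and is derived here, so treat it as an inference. The function names are illustrative only.

```python
# Effective batch size in data-parallel training:
#   machines * gpus_per_machine * images_per_gpu
# The values 4, 2, and 64 come from the README; the GPU count per machine is derived.

def gpus_per_machine(total_batch: int, machines: int, images_per_gpu: int) -> int:
    """Solve for the GPUs per machine implied by a total batch size."""
    gpus, remainder = divmod(total_batch, machines * images_per_gpu)
    if remainder:
        raise ValueError("total batch size is not divisible by machines * images_per_gpu")
    return gpus


def effective_batch_size(machines: int, gpus_each: int, images_per_gpu: int) -> int:
    """Total number of images processed per optimizer step."""
    return machines * gpus_each * images_per_gpu


gpus = gpus_per_machine(total_batch=64, machines=4, images_per_gpu=2)
print(gpus)                               # 8
print(effective_batch_size(4, gpus, 2))   # 64
```

If you train on fewer GPUs, the same formula gives the smaller batch size you would actually be using, which usually calls for adjusting the learning rate in the config.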
Updates
[2022-04-18] Explore using small supervised models pretrained on ImageNet-1K (20M parameters) for ViTDet (45.6 mAP). With a multi-stage structure, the results are 46.0 mAP for Swin-T and 47.8 mAP for ViTAEv2-S with Mask RCNN on COCO.
[2022-04-17] Release the pretrained weights and logs for ViT-B and ViTAE-B on MS COCO. The models are trained entirely with PyTorch on GPU.
[2022-04-16] Release the initial unofficial implementation of ViTDet with ViT-Base model! It obtains 51.1 mAP and 45.5 mAP on detection and segmentation, respectively. The weights and logs will be uploaded soon.
Applications of ViTAE Transformer include: image classification | object detection | semantic segmentation | animal pose segmentation | remote sensing | matting
Usage
We use PyTorch 1.9.0 or NGC docker 21.06, and mmcv 1.3.9 for the experiments.
```bash
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout v1.3.9
MMCV_WITH_OPS=1 pip install -e .
cd ..
git clone https://github.com/ViTAE-Transformer/ViTDet.git
cd ViTDet
pip install -v -e .
```
After installing the two repos, install timm and einops:
```bash
pip install timm==0.4.9 einops
```
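Before training, it can help to confirm that the environment matches the versions quoted above. The following is a minimal sketch using only the standard library; the `EXPECTED` table and the function names are illustrative, and the loose prefix comparison is an assumption chosen so that builds such as `1.9.0+cu111` still count as a match. It also assumes that mmcv built with `MMCV_WITH_OPS=1` is registered under the distribution name `mmcv-full`.

```python
from importlib import metadata

# (label, candidate distribution names, version quoted in the Usage section)
EXPECTED = [
    ("torch", ("torch",), "1.9.0"),
    ("mmcv", ("mmcv-full", "mmcv"), "1.3.9"),
    ("timm", ("timm",), "0.4.9"),
]


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected=EXPECTED, lookup=installed_version):
    """Compare installed versions against the expected ones.

    Returns {label: (found_version, status)} with status in
    {"ok", "mismatch", "missing"}.
    """
    report = {}
    for label, dist_names, wanted in expected:
        # Use the first candidate distribution name that is actually installed.
        found = next((v for v in map(lookup, dist_names) if v is not None), None)
        if found is None:
            status = "missing"
        elif found.startswith(wanted):  # loose match, tolerates local build tags
            status = "ok"
        else:
            status = "mismatch"
        report[label] = (found, status)
    return report


if __name__ == "__main__":
    for label, (found, status) in check_versions().items():
        print(f"{label}: found {found} ({status})")
```

A `mismatch` is not necessarily fatal, but these are the versions the experiments were run with, so they are the safest starting point.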
Download the pretrained models from MAE or ViTAE, and then run the experiments with:
```bash
# for single machine
bash tools/dist_train.sh
# for multiple machines
python -m torch.distributed.launch --nnodes
```
Todo
This repo currently contains modifications including:
- using LN for the convolutions in RPN and heads
- using large-scale jitter for augmentation
- using RPE from MViT
- using longer training epochs and 1024 test size
- using global attention layers
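Two of the modifications listed above can be illustrated without any deep learning framework. The sketch below is not taken from this repo's code: it assumes the common large-scale jitter recipe (sample a scale in [0.1, 2.0], resize with the aspect ratio preserved, then randomly crop to at most 1024 x 1024 and pad the rest), and it assumes the ViTDet paper's scheme of splitting the backbone into 4 equal subsets of blocks with global attention in the last block of each subset. The function names and default values are hypothetical.

```python
import random


def lsj_geometry(height, width, target=1024, scale_range=(0.1, 2.0), rng=random):
    """Sample the resize and crop geometry of large-scale jitter for one image.

    Returns (new_h, new_w, (top, left, crop_h, crop_w)). The crop box lies inside
    the resized image; the caller is expected to pad the crop to target x target.
    """
    scale = rng.uniform(*scale_range)
    # Fit the image into a (target * scale) square while keeping the aspect ratio.
    ratio = target * scale / max(height, width)
    new_h = max(1, round(height * ratio))
    new_w = max(1, round(width * ratio))
    # Crop only when the resized image exceeds the target size.
    crop_h, crop_w = min(new_h, target), min(new_w, target)
    top = rng.randint(0, new_h - crop_h)
    left = rng.randint(0, new_w - crop_w)
    return new_h, new_w, (top, left, crop_h, crop_w)


def global_attention_blocks(depth, num_subsets=4):
    """Indices of blocks using global attention: the last block of each subset."""
    if depth % num_subsets:
        raise ValueError("depth must be divisible by num_subsets")
    step = depth // num_subsets
    return [i * step + step - 1 for i in range(num_subsets)]


print(global_attention_blocks(12))  # [2, 5, 8, 11]
print(lsj_geometry(480, 640, rng=random.Random(0)))
```

Under these assumptions, a 12-block ViT-Base uses windowed attention in all blocks except 2, 5, 8, and 11. Check the config files in the repo for the settings actually used.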
There are other things to do:
- [ ] Implement the conv blocks for global information communication
- [ ] Tune the models for Cascade RCNN
- [ ] Train ViT models for the LVIS dataset
- [ ] Train ViTAE model with the ViTDet framework
Acknowledge
We acknowledge the excellent implementations from mmdetection, MAE, MViT, and BEiT.
Citing ViTDet
```
@article{Li2022ExploringPV,
  title={Exploring Plain Vision Transformer Backbones for Object Detection},
  author={Yanghao Li and Hanzi Mao and Ross B. Girshick and Kaiming He},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.16527}
}
```
For ViTAE and ViTAEv2, please refer to:
```
@article{xu2021vitae,
  title={Vitae: Vision transformer advanced by exploring intrinsic inductive bias},
  author={Xu, Yufei and Zhang, Qiming and Zhang, Jing and Tao, Dacheng},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

@article{zhang2022vitaev2,
  title={ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond},
  author={Zhang, Qiming and Xu, Yufei and Zhang, Jing and Tao, Dacheng},
  journal={arXiv preprint arXiv:2202.10108},
  year={2022}
}
```
Owner
- Login: bind-TIAN
- Kind: user
- Repositories: 13
- Profile: https://github.com/bind-TIAN
GitHub Events
Total
- Issues event: 1
Last Year
- Issues event: 1
Dependencies
- cython *
- numpy *
- docutils ==0.16.0
- recommonmark *
- sphinx ==4.0.2
- sphinx-copybutton *
- sphinx_markdown_tables *
- sphinx_rtd_theme ==0.5.2
- mmcv-full >=1.3.8
- cityscapesscripts *
- imagecorruptions *
- scipy *
- sklearn *
- mmcv *
- torch *
- torchvision *
- matplotlib *
- numpy *
- pycocotools *
- pycocotools-windows *
- six *
- terminaltables *
- asynctest * test
- codecov * test
- flake8 * test
- interrogate * test
- isort ==4.3.21 test
- kwarray * test
- mmtrack * test
- onnx ==1.7.0 test
- onnxruntime >=1.8.0 test
- pytest * test
- ubelt * test
- xdoctest >=0.10.0 test
- yapf * test