https://github.com/alphonsg/swin-transformer-object-detection

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.0%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of SwinTransformer/Swin-Transformer-Object-Detection
Created almost 5 years ago · Last pushed about 4 years ago
Metadata Files
Readme Contributing License Code of conduct

README.md

Swin Transformer for Object Detection

This repo contains the supported code and configuration files to reproduce object detection results of Swin Transformer. It is based on mmdetection.

Updates

05/11/2021 Models for MoBY are released

04/12/2021 Initial commits

Results and Models

Mask R-CNN

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-T | ImageNet-1K | 1x | 43.7 | 39.8 | 48M | 267G | config | github/baidu | github/baidu |
| Swin-T | ImageNet-1K | 3x | 46.0 | 41.6 | 48M | 267G | config | github/baidu | github/baidu |
| Swin-S | ImageNet-1K | 3x | 48.5 | 43.3 | 69M | 359G | config | github/baidu | github/baidu |

Cascade Mask R-CNN

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-T | ImageNet-1K | 1x | 48.1 | 41.7 | 86M | 745G | config | github/baidu | github/baidu |
| Swin-T | ImageNet-1K | 3x | 50.4 | 43.7 | 86M | 745G | config | github/baidu | github/baidu |
| Swin-S | ImageNet-1K | 3x | 51.9 | 45.0 | 107M | 838G | config | github/baidu | github/baidu |
| Swin-B | ImageNet-1K | 3x | 51.9 | 45.0 | 145M | 982G | config | github/baidu | github/baidu |

RepPoints V2

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-T | ImageNet-1K | 3x | 50.0 | - | 45M | 283G |

Mask RepPoints V2

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-T | ImageNet-1K | 3x | 50.3 | 43.6 | 47M | 292G |

Notes:

Results of MoBY with Swin Transformer

Mask R-CNN

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-T | ImageNet-1K | 1x | 43.6 | 39.6 | 48M | 267G | config | github/baidu | github/baidu |
| Swin-T | ImageNet-1K | 3x | 46.0 | 41.7 | 48M | 267G | config | github/baidu | github/baidu |

Cascade Mask R-CNN

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-T | ImageNet-1K | 1x | 48.1 | 41.5 | 86M | 745G | config | github/baidu | github/baidu |
| Swin-T | ImageNet-1K | 3x | 50.2 | 43.5 | 86M | 745G | config | github/baidu | github/baidu |

Notes:

  • The drop path rate needs to be tuned to get the best results.
  • MoBY pre-trained models can be downloaded from MoBY with Swin Transformer.
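
A minimal sketch of why the drop path rate is worth tuning, assuming the backbone spreads a single maximum rate linearly across all of its blocks (a common stochastic depth schedule). The function name `per_block_drop_path` and the stage depths `[2, 2, 6, 2]` are illustrative assumptions, not values taken from this repo's configs.

```python
def per_block_drop_path(drop_path_rate, depths):
    """Spread a maximum drop path rate linearly over all blocks, grouped by stage."""
    total = sum(depths)
    if total == 1:
        rates = [0.0]
    else:
        # Linear ramp from 0.0 (first block) to drop_path_rate (last block).
        rates = [drop_path_rate * i / (total - 1) for i in range(total)]
    # Slice the flat list back into one sub-list per stage.
    stages, start = [], 0
    for depth in depths:
        stages.append(rates[start:start + depth])
        start += depth
    return stages


# Hypothetical sweep over candidate maximum rates (depths are an assumption).
for rate in (0.1, 0.2, 0.3):
    stages = per_block_drop_path(rate, [2, 2, 6, 2])
    print(rate, [round(r, 3) for r in stages[-1]])
```

Under this assumption the deepest blocks receive the largest drop probability, so a larger or longer-trained model may prefer a different maximum rate than a smaller one.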

Usage

Installation

Please refer to get_started.md for installation and dataset preparation.

Inference

```
# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm
```

Training

To train a detector with pre-trained models, run:
```
# single-gpu training
python tools/train.py <CONFIG_FILE> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]
```
For example, to train a Cascade Mask R-CNN model with a `Swin-T` backbone and 8 GPUs, run:
```
tools/dist_train.sh configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 8 --cfg-options model.pretrained=<PRETRAIN_MODEL>
```

Note: use_checkpoint is used to save GPU memory. Please refer to this page for more details.
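
The dotted keys passed to `--cfg-options` address nested fields of the config. The sketch below illustrates that idea in plain Python; it is not the config library's actual implementation, and `apply_override`, the trimmed `base_cfg`, and the checkpoint path are hypothetical names used only for illustration.

```python
import copy


def apply_override(cfg, dotted_key, value):
    """Return a copy of cfg with the nested entry named by dotted_key set to value."""
    new_cfg = copy.deepcopy(cfg)
    node = new_cfg
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Create intermediate dicts on demand so new keys can be added.
        node = node.setdefault(name, {})
    node[leaf] = value
    return new_cfg


# Hypothetical, heavily trimmed config; real config files hold many more fields.
base_cfg = {"model": {"pretrained": None, "backbone": {"use_checkpoint": False}}}

cfg = apply_override(base_cfg, "model.pretrained", "swin_tiny.pth")  # placeholder path
cfg = apply_override(cfg, "model.backbone.use_checkpoint", True)
print(cfg["model"])
```

Each override touches only the field it names, so the rest of the config file stays as written.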

Apex (optional):

We use apex for mixed precision training by default. To install apex, run:
```
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
If you would like to disable apex, change the runner type to `EpochBasedRunner` and comment out the following code block in the configuration files:
```
# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
```
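
As a rough sketch of what a config fragment might look like after disabling apex as described above: the `max_epochs=36` value is an assumption (36 epochs is the usual length of a 3x schedule), so keep whatever value your config already uses.

```python
# Sketch of an edited config fragment with apex disabled (values are assumptions).
# Runner type switched to the plain epoch-based runner named in the text above.
runner = dict(type="EpochBasedRunner", max_epochs=36)  # 36 assumed for a 3x schedule

# The apex-specific block is commented out rather than deleted, so it is easy to restore.
# # do not use mmdet version fp16
# fp16 = None
# optimizer_config = dict(
#     type="DistOptimizerHook",
#     update_interval=1,
#     grad_clip=None,
#     coalesce=True,
#     bucket_size_mb=-1,
#     use_fp16=True,
# )
```

With the block commented out, the optimizer settings inherited from the base config apply unchanged.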

Citing Swin Transformer

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Other Links

Image Classification: See Swin Transformer for Image Classification.

Semantic Segmentation: See Swin Transformer for Semantic Segmentation.

Self-Supervised Learning: See MoBY with Swin Transformer.

Video Recognition: See Video Swin Transformer.

Owner

  • Login: AlphonsG
  • Kind: user

Dependencies

.github/workflows/build.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v1.0.10 composite
.github/workflows/build_pat.yml actions
  • actions/checkout v2 composite
.github/workflows/deploy.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/serve/Dockerfile docker
  • ${BASE_IMAGE} latest build
requirements/build.txt pypi
  • cython *
  • numpy *
requirements/docs.txt pypi
  • recommonmark *
  • sphinx *
  • sphinx_markdown_tables *
  • sphinx_rtd_theme *
requirements/optional.txt pypi
  • albumentations >=0.3.2
  • cityscapesscripts *
  • imagecorruptions *
  • mmlvis *
  • scipy *
  • sklearn *
requirements/readthedocs.txt pypi
  • mmcv *
  • torch *
  • torchvision *
requirements/runtime.txt pypi
  • matplotlib *
  • mmpycocotools *
  • numpy *
  • six *
  • terminaltables *
  • timm *
requirements/tests.txt pypi
  • asynctest * test
  • codecov * test
  • flake8 * test
  • interrogate * test
  • isort ==4.3.21 test
  • kwarray * test
  • onnx ==1.7.0 test
  • onnxruntime ==1.5.1 test
  • pytest * test
  • ubelt * test
  • xdoctest >=0.10.0 test
  • yapf * test
requirements.txt pypi
setup.py pypi