https://github.com/christophreich1996/swin-transformer-v2

PyTorch reimplementation of the paper "Swin Transformer V2: Scaling Up Capacity and Resolution" [CVPR 2022].

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 13.3%, to scientific vocabulary)

Keywords

attention computer-vision deep-learning pytorch swin-transformer transformer vision-transformer

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: ChristophReich1996
License: mit
Language: Python
Default Branch: main
Homepage: https://arxiv.org/abs/2111.09883
Size: 53.7 KB

Statistics

Stars: 166
Watchers: 5
Forks: 14
Open Issues: 2
Releases: 0

Created over 4 years ago · Last pushed over 3 years ago

Metadata Files

Readme License

Swin Transformer V2: Scaling Up Capacity and Resolution

This implementation has been merged into the PyTorch Image Models library (timm) with the kind help of Ross Wightman. timm also offers weights pre-trained on ImageNet-1k (see the corresponding timm release).

Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu et al. (Microsoft Research Asia).

This repository includes a pure PyTorch implementation of the Swin Transformer V2 and provides pre-trained weights (CIFAR10 & Places365).

The official Swin Transformer V1 implementation is publicly available. At the time of writing (13.04.2022), an official implementation of the Swin Transformer V2 was not publicly available.

Update: The official Swin Transformer V2 implementation has since been released.

Installation

You can simply install the Swin Transformer V2 implementation as a Python package by using pip.

```shell
pip install git+https://github.com/ChristophReich1996/Swin-Transformer-V2
```

Alternatively, you can clone the repository and use the implementation in swin_transformer_v2 directly in your project.

Usage

This implementation provides the configurations reported in the paper (SwinV2-T, SwinV2-S, etc.). You can build a model by calling the corresponding function. Note that the Swin Transformer V2 implementation (the SwinTransformerV2 class) returns the feature maps of every stage of the network (List[torch.Tensor]). To use it for image classification, wrap the model and use the final feature map (the repository provides a wrapper example).

```python
from swin_transformer_v2 import SwinTransformerV2

from swin_transformer_v2 import swin_transformer_v2_t, swin_transformer_v2_s, swin_transformer_v2_b, \
    swin_transformer_v2_l, swin_transformer_v2_h, swin_transformer_v2_g

# SwinV2-T
swin_transformer: SwinTransformerV2 = swin_transformer_v2_t(in_channels=3,
                                                            window_size=8,
                                                            input_resolution=(256, 256),
                                                            sequential_self_attention=False,
                                                            use_checkpoint=False)
```
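Because the model returns one feature map per stage rather than class logits, a classifier needs a small head on top of the final map. The following is a minimal sketch, assuming PyTorch, that each returned feature map is shaped [B, C, H, W], and that the last stage has 768 channels (an assumed value for SwinV2-T). `ClassificationWrapper` and `DummyBackbone` are hypothetical names; the dummy only stands in for SwinTransformerV2 so the snippet runs without this package installed.

```python
from typing import List

import torch
import torch.nn as nn


class ClassificationWrapper(nn.Module):
    """Hypothetical wrapper: classifies images from the backbone's final feature map."""

    def __init__(self, backbone: nn.Module, final_channels: int, num_classes: int) -> None:
        super().__init__()
        self.backbone = backbone
        # Linear classification head on top of the pooled final feature map
        self.head = nn.Linear(final_channels, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Assumption: the backbone returns one feature map per stage, each [B, C, H, W]
        features: List[torch.Tensor] = self.backbone(images)
        final_map = features[-1]
        # Global average pooling over the spatial dimensions -> [B, C]
        pooled = final_map.mean(dim=(2, 3))
        return self.head(pooled)


class DummyBackbone(nn.Module):
    """Hypothetical stand-in that mimics a List[torch.Tensor] of stage outputs."""

    def forward(self, images: torch.Tensor) -> List[torch.Tensor]:
        batch_size = images.shape[0]
        # Shapes loosely resemble a 256 x 256 input; they are illustrative only
        return [torch.randn(batch_size, 96, 64, 64), torch.randn(batch_size, 768, 8, 8)]


model = ClassificationWrapper(DummyBackbone(), final_channels=768, num_classes=10)
logits = model(torch.randn(2, 3, 256, 256))
print(logits.shape)  # torch.Size([2, 10])
```

With the real model, you would pass the SwinTransformerV2 instance built above as `backbone` and set `final_channels` to the channel count of its last stage. Global average pooling is one common choice; other pooling schemes would work the same way.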

If you want to change the resolution and/or the window size for fine-tuning or inference please use the update_resolution method.

```python
# Change resolution and window size of the model
swin_transformer.update_resolution(new_window_size=16, new_input_resolution=(512, 512))
```
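When changing the input resolution and window size together, it helps to check that every stage's token grid can still be tiled by the window. The helper below is a hypothetical sanity check, not part of this package. It assumes the standard Swin layout of a patch embedding with patch size 4 followed by 2x downsampling before each of the remaining three stages.

```python
from typing import List, Tuple


def stage_resolutions(input_resolution: Tuple[int, int], patch_size: int = 4,
                      num_stages: int = 4) -> List[Tuple[int, int]]:
    """Token-grid size per stage, assuming a patch embedding with stride `patch_size`
    and a 2x downsampling before every stage after the first."""
    height, width = input_resolution[0] // patch_size, input_resolution[1] // patch_size
    resolutions = [(height, width)]
    for _ in range(num_stages - 1):
        height, width = height // 2, width // 2
        resolutions.append((height, width))
    return resolutions


def windows_per_stage(input_resolution: Tuple[int, int], window_size: int,
                      patch_size: int = 4, num_stages: int = 4) -> List[int]:
    """Number of non-overlapping windows per stage; raises if a stage's grid
    cannot be tiled evenly by the window size (hypothetical sanity check)."""
    counts = []
    for height, width in stage_resolutions(input_resolution, patch_size, num_stages):
        if height % window_size or width % window_size:
            raise ValueError(f"{height}x{width} grid is not divisible by window size {window_size}")
        counts.append((height // window_size) * (width // window_size))
    return counts


print(windows_per_stage((256, 256), window_size=8))   # [64, 16, 4, 1]
print(windows_per_stage((512, 512), window_size=16))  # [64, 16, 4, 1]
```

Under these assumptions, doubling both the input resolution and the window size keeps the number of windows per stage unchanged while each window covers twice as many tokens per side.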

To use a custom configuration, instantiate the SwinTransformerV2 class directly. Its constructor takes the following parameters:

| Parameter | Description | Type |
| ------------- | ------------- | ------------- |
| in_channels | Number of input channels | int |
| depth | Depth of the stage (number of layers) | int |
| downscale | If true, the input is downsampled (see Fig. 3 of the V1 paper) | bool |
| input_resolution | Input resolution | Tuple[int, int] |
| number_of_heads | Number of attention heads to be utilized | int |
| window_size | Window size to be utilized | int |
| shift_size | Shifting size to be used | int |
| ff_feature_ratio | Ratio of the hidden dimension in the FFN to the input channels | int |
| dropout | Dropout in input mapping | float |
| dropout_attention | Dropout rate of attention map | float |
| dropout_path | Dropout in main path | float |
| use_checkpoint | If true, checkpointing is utilized | bool |
| sequential_self_attention | If true, sequential self-attention is performed | bool |
| use_deformable_block | If true, a deformable block is used | bool |
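The `shift_size` parameter refers to the shifted-window scheme of Swin Transformer V1, where windows are displaced by half the window size between consecutive blocks, typically implemented as a cyclic shift of the feature map. Below is a pure-Python sketch of that idea on a small grid of token indices. `cyclic_shift` and `partition_windows` are hypothetical helpers, and whether this repository shifts in exactly this way is an assumption. The attention mask that keeps wrapped-around tokens from attending to each other is not shown.

```python
from typing import List

Grid = List[List[int]]


def cyclic_shift(grid: Grid, shift_size: int) -> Grid:
    """Roll a 2D grid up and to the left by shift_size, with wrap-around
    (the same effect as torch.roll(x, shifts=(-s, -s), dims=(0, 1)))."""
    rows = grid[shift_size:] + grid[:shift_size]
    return [row[shift_size:] + row[:shift_size] for row in rows]


def partition_windows(grid: Grid, window_size: int) -> List[List[int]]:
    """Split a grid into non-overlapping window_size x window_size windows,
    each flattened row by row (grid sides are assumed divisible by window_size)."""
    windows = []
    for top in range(0, len(grid), window_size):
        for left in range(0, len(grid[0]), window_size):
            windows.append([grid[r][c]
                            for r in range(top, top + window_size)
                            for c in range(left, left + window_size)])
    return windows


# A 4 x 4 grid of token indices, window size 2, shift size 1 (= window_size // 2)
tokens = [[4 * r + c for c in range(4)] for r in range(4)]
print(partition_windows(tokens, 2))                   # [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
print(partition_windows(cyclic_shift(tokens, 1), 2))  # [[5, 6, 9, 10], [7, 4, 11, 8], [13, 14, 1, 2], [15, 12, 3, 0]]
```

After the shift, windows combine tokens that sat in different windows before, for example token 6 now shares a window with 5, 9 and 10. This is how information flows across window boundaries.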

A full example of how to use this implementation is included in the repository.

This implementation also includes a deformable version of the Swin Transformer V2 inspired by the paper Vision Transformer with Deformable Attention. Deformable attention can be utilized by setting use_deformable_block=True.

This repository also provides an image classification training script for CIFAR10 and Places365.

Results

| Model | Dataset | Accuracy | Weights |
| ------------- | ------------- | ------------- | ------------- |
| Swin Transformer V2 T | CIFAR10 | 0.8974 | backbone weights |
| Swin Transformer V2 T deformable | CIFAR10 | 0.8962 | backbone weights |
| Swin Transformer V2 B | Places365 (256 × 256) | 0.4456 (after 13 epochs) | backbone weights |

For details on how to load the checkpoints, see the corresponding issue in the GitHub repository.

Disclaimer

This is a very experimental implementation based on the Swin Transformer V2 paper and the official implementation of the Swin Transformer V1. In particular, the sequential self-attention implementation is currently not memory efficient. If you have an idea for a more efficient sequential implementation, please open a pull request. Since no official implementation of the Swin Transformer V2 had been published when this implementation was written, it is not possible to say to what extent it differs from the original one. If you encounter any problems with this implementation, please open an issue.

Reference

```bibtex
@article{Liu2021,
  title={{Swin Transformer V2: Scaling Up Capacity and Resolution}},
  author={Liu, Ze and Hu, Han and Lin, Yutong and Yao, Zhuliang and Xie, Zhenda and Wei, Yixuan and Ning, Jia and Cao, Yue and Zhang, Zheng and Dong, Li and others},
  journal={arXiv preprint arXiv:2111.09883},
  year={2021}
}
```

Owner

Name: Christoph Reich
Login: ChristophReich1996
Kind: user
Location: Germany
Company: Technical University of Munich

Website: christophreich1996.github.io
Twitter: ChristophR1996
Repositories: 41
Profile: https://github.com/ChristophReich1996

ELLIS Ph.D. Student @ Technical University of Munich, Technische Universität Darmstadt & University of Oxford | Prev. NEC Labs

GitHub Events

Total

Issues event: 1
Watch event: 27
Fork event: 4

Last Year

Issues event: 1
Watch event: 27
Fork event: 4

Dependencies

image_classification/requirements.txt pypi

pytorch >=1.7.0
timm >=0.4.12
torchvision >=0.8.2

requirements.txt pypi

pytorch >=1.7.0
timm >=0.4.12

setup.py pypi

torch >=1.7.0
