gpvit

[ICLR 2023 Spotlight] GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

https://github.com/chenhongyiyang/gpvit

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file (found)
  • DOI references
  • Academic publication links (links to arxiv.org)
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low similarity: 11.4%)

Keywords

computer-vision image-classification instance-segmentation object-detection semantic-segmentation vision-transformer visual-recognition
Last synced: 6 months ago

Repository

[ICLR 2023 Spotlight] GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

Basic Info
  • Host: GitHub
  • Owner: ChenhongyiYang
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 6.16 MB
Statistics
  • Stars: 100
  • Watchers: 4
  • Forks: 3
  • Open Issues: 2
  • Releases: 1
Topics
computer-vision image-classification instance-segmentation object-detection semantic-segmentation vision-transformer visual-recognition
Created about 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme Contributing License Citation

README.md

GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

This repository contains the official PyTorch implementation of GPViT, a high-resolution non-hierarchical vision transformer architecture designed for high-performing visual recognition, introduced in our paper:

GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation, Chenhongyi Yang, *Jiarui Xu, *Shalini De Mello, Elliot J. Crowley, Xiaolong Wang, ICLR 2023

Usage

Environment Setup

Our codebase is built upon the MM-series toolkits: classification is based on MMClassification, object detection on MMDetection, and semantic segmentation on MMSegmentation. Users can follow the official documentation of those toolkits to set up their environments. We also provide a sample setup script below:

```shell
conda create -n gpvit python=3.7 -y
source activate gpvit
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install -U openmim
mim install mmcv-full==1.4.8
pip install timm
pip install lmdb  # for ImageNet experiments
pip install -v -e .
cd downstream/mmdetection  # set up object detection and instance segmentation
pip install -v -e .
cd ../mmsegmentation  # set up semantic segmentation
pip install -v -e .
```
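
After installation, a quick import check can confirm that the pinned packages resolve correctly; a minimal sketch (nothing here is specific to GPViT):

```python
# Sanity-check the core dependencies installed by the setup script above;
# the printed versions should match the pinned ones (torch 1.7.1, mmcv-full 1.4.8).
import torch
import torchvision
import mmcv
import timm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("mmcv-full:", mmcv.__version__)
print("timm:", timm.__version__)
```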

Data Preparation

Please follow MMClassification, MMDetection and MMSegmentation to set up the ImageNet, COCO and ADE20K datasets. For the ImageNet experiments, we convert the dataset to LMDB format to accelerate training and testing. For example, you can convert your own dataset by running:

```shell
python tools/dataset_tools/create_lmdb_dataset.py \
    --train-img-dir data/imagenet/train \
    --train-out data/imagenet/imagenet_lmdb/train \
    --val-img-dir data/imagenet/val \
    --val-out data/imagenet/imagenet_lmdb/val
```

After setting up, the dataset file structure should be as follows:

```
GPViT
|-- data
|   |-- imagenet
|   |   |-- imagenet_lmdb
|   |   |   |-- train
|   |   |   |   |-- data.mdb
|   |   |   |   |__ lock.mdb
|   |   |   |-- val
|   |   |   |   |-- data.mdb
|   |   |   |   |__ lock.mdb
|   |   |-- meta
|   |   |   |__ ...
|-- downstream
|   |-- mmsegmentation
|   |   |-- data
|   |   |   |-- ade
|   |   |   |   |-- ADEChallengeData2016
|   |   |   |   |   |-- annotations
|   |   |   |   |   |   |__ ...
|   |   |   |   |   |-- images
|   |   |   |   |   |   |__ ...
|   |   |   |   |   |-- objectInfo150.txt
|   |   |   |   |   |__ sceneCategories.txt
|   |   |__ ...
|   |-- mmdetection
|   |   |-- data
|   |   |   |-- coco
|   |   |   |   |-- train2017
|   |   |   |   |   |-- ...
|   |   |   |   |-- val2017
|   |   |   |   |   |-- ...
|   |   |   |   |-- annotations
|   |   |   |   |   |-- instances_train2017.json
|   |   |   |   |   |-- instances_val2017.json
|   |   |   |   |   |__ ...
|   |   |__ ...
|__ ...
```
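
To check a conversion before launching training, the generated LMDB can be opened read-only with the `lmdb` package installed above; a minimal sketch (the path assumes the output layout of the conversion command, and the exact key layout inside the database depends on the conversion script):

```python
# Open the converted ImageNet LMDB read-only and report how many entries it
# holds; purely a sanity check, not part of the training pipeline.
import lmdb

env = lmdb.open("data/imagenet/imagenet_lmdb/train",
                readonly=True, lock=False, readahead=False)
with env.begin() as txn:
    print("stored entries:", txn.stat()["entries"])
env.close()
```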

ImageNet Classification

Training GPViT

```shell
# Example: Training GPViT-L1 model
zsh tool/dist_train.sh configs/gpvit/gpvit_l1.py 16
```

Testing GPViT

```shell
# Example: Testing GPViT-L1 model
zsh tool/dist_test.sh configs/gpvit/gpvit_l1.py work_dirs/gpvit_l1/epoch_300.pth 16 --metrics accuracy
```
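
Outside the distributed test script, MMClassification also exposes a high-level Python API for single-image inference. A minimal sketch, assuming the config and checkpoint paths from the commands above (`demo.jpg` is a placeholder image):

```python
# Single-image classification via MMClassification's high-level API;
# paths follow the training/testing commands above, demo.jpg is a placeholder.
from mmcls.apis import inference_model, init_model

model = init_model("configs/gpvit/gpvit_l1.py",
                   "work_dirs/gpvit_l1/epoch_300.pth",
                   device="cuda:0")
result = inference_model(model, "demo.jpg")
print(result)  # dict with pred_label, pred_score and pred_class
```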

COCO Object Detection and Instance Segmentation

Run `cd downstream/mmdetection` first.

Training GPViT based Mask R-CNN

```shell
# Example: Training GPViT-L1 models with 1x and 3x+MS schedules
zsh tools/dist_train.sh configs/gpvit/mask_rcnn/gpvit_l1_maskrcnn_1x.py 16
zsh tools/dist_train.sh configs/gpvit/mask_rcnn/gpvit_l1_maskrcnn_3x.py 16
```

Training GPViT based RetinaNet

```shell
# Example: Training GPViT-L1 models with 1x and 3x+MS schedules
zsh tools/dist_train.sh configs/gpvit/retinanet/gpvit_l1_retinanet_1x.py 16
zsh tools/dist_train.sh configs/gpvit/retinanet/gpvit_l1_retinanet_3x.py 16
```

Testing GPViT based Mask R-CNN

```shell
# Example: Testing GPViT-L1 Mask R-CNN 1x model
zsh tools/dist_test.sh configs/gpvit/mask_rcnn/gpvit_l1_maskrcnn_1x.py work_dirs/gpvit_l1_maskrcnn_1x/epoch_12.pth 16 --eval bbox segm
```

Testing GPViT based RetinaNet

```shell
# Example: Testing GPViT-L1 RetinaNet 1x model
zsh tools/dist_test.sh configs/gpvit/retinanet/gpvit_l1_retinanet_1x.py work_dirs/gpvit_l1_retinanet_1x/epoch_12.pth 16 --eval bbox
```
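
For quick qualitative checks, MMDetection's high-level API can run a trained detector on a single image. A minimal sketch, assuming it is run from downstream/mmdetection with the Mask R-CNN checkpoint trained above (`demo.jpg` is a placeholder image):

```python
# Single-image detection via MMDetection's high-level API; paths follow the
# Mask R-CNN commands above, demo.jpg is a placeholder image.
from mmdet.apis import inference_detector, init_detector

model = init_detector("configs/gpvit/mask_rcnn/gpvit_l1_maskrcnn_1x.py",
                      "work_dirs/gpvit_l1_maskrcnn_1x/epoch_12.pth",
                      device="cuda:0")
result = inference_detector(model, "demo.jpg")
# For Mask R-CNN the result is a (bbox_results, segm_results) pair with one
# entry per COCO class; for RetinaNet it is the bbox list alone.
```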

ADE20K Semantic Segmentation

Run `cd downstream/mmsegmentation` first.

Training GPViT based semantic segmentation models

```shell
# Example: Training GPViT-L1 based SegFormer and UperNet models
zsh tools/dist_train.sh configs/gpvit/gpvit_l1_segformer.py 16
zsh tools/dist_train.sh configs/gpvit/gpvit_l1_upernet.py 16
```

Testing GPViT based semantic segmentation models

```shell
# Example: Testing GPViT-L1 based SegFormer and UperNet models
zsh tools/dist_test.sh configs/gpvit/gpvit_l1_segformer.py work_dirs/gpvit_l1_segformer/iter_160000.pth 16 --eval mIoU
zsh tools/dist_test.sh configs/gpvit/gpvit_l1_upernet.py work_dirs/gpvit_l1_upernet/iter_160000.pth 16 --eval mIoU
```
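
Likewise, MMSegmentation provides a high-level API for single-image inference. A minimal sketch, assuming it is run from downstream/mmsegmentation with the SegFormer checkpoint trained above (`demo.jpg` is a placeholder image):

```python
# Single-image semantic segmentation via MMSegmentation's high-level API;
# paths follow the commands above, demo.jpg is a placeholder image.
from mmseg.apis import inference_segmentor, init_segmentor

model = init_segmentor("configs/gpvit/gpvit_l1_segformer.py",
                       "work_dirs/gpvit_l1_segformer/iter_160000.pth",
                       device="cuda:0")
result = inference_segmentor(model, "demo.jpg")  # list with one H x W label map
```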

Benchmark Results

ImageNet-1k Classification

| Model | #Params (M) | Top-1 Acc | Top-5 Acc | Config | Model |
|:--------:|:-----------:|:---------:|:---------:|:------:|:-----:|
| GPViT-L1 | 9.3 | 80.5 | 95.4 | config | model |
| GPViT-L2 | 23.8 | 83.4 | 96.6 | config | model |
| GPViT-L3 | 36.2 | 84.1 | 96.9 | config | model |
| GPViT-L4 | 75.4 | 84.3 | 96.9 | config | model |

COCO Mask R-CNN 1x Schedule

| Model | #Params (M) | AP Box | AP Mask | Config | Model |
|:--------:|:-----------:|:------:|:-------:|:------:|:-----:|
| GPViT-L1 | 33 | 48.1 | 42.7 | config | model |
| GPViT-L2 | 50 | 49.9 | 43.9 | config | model |
| GPViT-L3 | 64 | 50.4 | 44.4 | config | model |
| GPViT-L4 | 109 | 51.0 | 45.0 | config | model |

COCO Mask R-CNN 3x+MS Schedule

| Model | #Params (M) | AP Box | AP Mask | Config | Model |
|:--------:|:-----------:|:------:|:-------:|:------:|:-----:|
| GPViT-L1 | 33 | 50.2 | 44.3 | config | model |
| GPViT-L2 | 50 | 51.4 | 45.1 | config | model |
| GPViT-L3 | 64 | 51.6 | 45.2 | config | model |
| GPViT-L4 | 109 | 52.1 | 45.7 | config | model |

COCO RetinaNet 1x Schedule

| Model | #Params (M) | AP Box | Config | Model |
|:--------:|:-----------:|:------:|:------:|:-----:|
| GPViT-L1 | 21 | 45.8 | config | model |
| GPViT-L2 | 37 | 48.0 | config | model |
| GPViT-L3 | 52 | 48.3 | config | model |
| GPViT-L4 | 96 | 48.7 | config | model |

COCO RetinaNet 3x+MS Schedule

| Model | #Params (M) | AP Box | Config | Model |
|:--------:|:-----------:|:------:|:------:|:-----:|
| GPViT-L1 | 21 | 48.1 | config | model |
| GPViT-L2 | 37 | 49.0 | config | model |
| GPViT-L3 | 52 | 49.4 | config | model |
| GPViT-L4 | 96 | 49.8 | config | model |

ADE20K UperNet

| Model | #Params (M) | mIoU | Config | Model |
|:--------:|:-----------:|:----:|:------:|:-----:|
| GPViT-L1 | 37 | 49.1 | config | model |
| GPViT-L2 | 53 | 50.2 | config | model |
| GPViT-L3 | 66 | 51.7 | config | model |
| GPViT-L4 | 107 | 52.5 | config | model |

ADE20K SegFormer

| Model | #Params (M) | mIoU | Config | Model |
|:--------:|:-----------:|:----:|:------:|:-----:|
| GPViT-L1 | 9 | 46.9 | config | model |
| GPViT-L2 | 24 | 49.2 | config | model |
| GPViT-L3 | 36 | 50.8 | config | model |
| GPViT-L4 | 76 | 51.3 | config | model |

Citation

```
@inproceedings{yang2023gpvit,
  title     = {{GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation}},
  author    = {Chenhongyi Yang and Jiarui Xu and Shalini De Mello and Elliot J. Crowley and Xiaolong Wang},
  booktitle = {ICLR},
  year      = {2023},
}
```

Owner

  • Name: Chenhongyi Yang
  • Login: ChenhongyiYang
  • Kind: user
  • Location: Zurich, Switzerland
  • Company: Meta

Research Scientist at Meta Reality Labs

GitHub Events

Total
  • Watch event: 5
Last Year
  • Watch event: 5

Packages

  • Total packages: 2
  • Total downloads: unknown
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 2
proxy.golang.org: github.com/chenhongyiyang/gpvit
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
  • Dependent packages count: 6.5%
  • Average: 6.7%
  • Dependent repos count: 7.0%
Last synced: 6 months ago
proxy.golang.org: github.com/ChenhongyiYang/GPViT
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
  • Dependent packages count: 6.5%
  • Average: 6.7%
  • Dependent repos count: 7.0%
Last synced: 6 months ago

Dependencies

downstream/mmdetection/requirements/albu.txt pypi
  • albumentations >=0.3.2
downstream/mmdetection/requirements/build.txt pypi
  • cython *
  • numpy *
downstream/mmdetection/requirements/docs.txt pypi
  • docutils ==0.16.0
  • myst-parser *
  • sphinx ==4.0.2
  • sphinx-copybutton *
  • sphinx_markdown_tables *
  • sphinx_rtd_theme ==0.5.2
downstream/mmdetection/requirements/mminstall.txt pypi
  • mmcv-full >=1.3.17
downstream/mmdetection/requirements/optional.txt pypi
  • cityscapesscripts *
  • imagecorruptions *
  • scipy *
  • sklearn *
  • timm *
downstream/mmdetection/requirements/readthedocs.txt pypi
  • mmcv *
  • torch *
  • torchvision *
downstream/mmdetection/requirements/runtime.txt pypi
  • matplotlib *
  • numpy *
  • pycocotools *
  • six *
  • terminaltables *
downstream/mmdetection/requirements/tests.txt pypi
  • asynctest * test
  • codecov * test
  • flake8 * test
  • interrogate * test
  • isort ==4.3.21 test
  • kwarray * test
  • onnx ==1.7.0 test
  • onnxruntime >=1.8.0 test
  • protobuf <=3.20.1 test
  • pytest * test
  • ubelt * test
  • xdoctest >=0.10.0 test
  • yapf * test
downstream/mmsegmentation/requirements/docs.txt pypi
  • docutils ==0.16.0
  • myst-parser *
  • sphinx ==4.0.2
  • sphinx_copybutton *
  • sphinx_markdown_tables *
downstream/mmsegmentation/requirements/mminstall.txt pypi
  • mmcls >=0.20.1
  • mmcv-full >=1.4.4,<=1.5.0
downstream/mmsegmentation/requirements/optional.txt pypi
  • cityscapesscripts *
downstream/mmsegmentation/requirements/readthedocs.txt pypi
  • mmcv *
  • prettytable *
  • torch *
  • torchvision *
downstream/mmsegmentation/requirements/runtime.txt pypi
  • matplotlib *
  • mmcls >=0.20.1
  • numpy *
  • packaging *
  • prettytable *
downstream/mmsegmentation/requirements/tests.txt pypi
  • codecov * test
  • flake8 * test
  • interrogate * test
  • pytest * test
  • xdoctest >=0.10.0 test
  • yapf * test
requirements/docs.txt pypi
  • docutils ==0.17.1
  • myst-parser *
  • pytorch_sphinx_theme *
  • sphinx ==4.5.0
  • sphinx-copybutton *
  • sphinx_markdown_tables *
requirements/mminstall.txt pypi
  • einops >=0.6.0
  • mmcv-full >=1.4.2,<1.9.0
requirements/optional.txt pypi
  • albumentations >=0.3.2
  • colorama *
  • requests *
  • rich *
  • scipy *
requirements/readthedocs.txt pypi
  • mmcv >=1.4.2
  • torch *
  • torchvision *
requirements/runtime.txt pypi
  • einops >=0.6.0
  • matplotlib >=3.1.0
  • numpy *
  • packaging *
requirements/tests.txt pypi
  • codecov * test
  • flake8 * test
  • interrogate * test
  • isort ==4.3.21 test
  • mmdet * test
  • pytest * test
  • xdoctest >=0.10.0 test
  • yapf * test
downstream/mmdetection/requirements.txt pypi
downstream/mmdetection/setup.py pypi
downstream/mmsegmentation/requirements.txt pypi
downstream/mmsegmentation/setup.py pypi
requirements.txt pypi
setup.py pypi