https://github.com/aim-uofa/segvit
Official Pytorch Implementation of SegViT: Semantic Segmentation with Plain Vision Transformers
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.1%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 0
- Releases: 0
Fork of zbwxp/SegVit
Created over 2 years ago · Last pushed over 2 years ago
https://github.com/aim-uofa/SegVit/blob/master/
# Official Pytorch Implementation of SegViT

### SegViT: Semantic Segmentation with Plain Vision Transformers

Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan. NeurIPS 2022. [[paper]](https://arxiv.org/abs/2210.05844)

### SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

Bowen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen and Yifan Liu. IJCV 2023. [[paper]](https://arxiv.org/abs/2306.06289)

[we are refactoring code for release ...]

This repository contains the official PyTorch implementation of the training and evaluation code, together with the pretrained models, for SegViT and its extended version, SegViT v2.

## Highlights

* **Simple Decoder:** The Attention-to-Mask (ATM) decoder provides a simple segmentation head for plain Vision Transformers and is easy to extend to other downstream tasks.
* **Light Structure:** We propose the *Shrunk* structure, which can save up to **40%** of the computational cost of a model with a ViT backbone.
* **Stronger Performance:** We obtain state-of-the-art mIoU of **55.2%** on ADE20K, **50.3%** on COCOStuff10K, and **65.3%** on PASCAL-Context, with the lowest computational cost among counterparts using a ViT backbone.
* **Scalability:** SegViT v2 employs more powerful backbones (BEiT-V2) and obtains state-of-the-art mIoU of **58.2%** (MS) on ADE20K, **53.5%** (MS) on COCOStuff10K, and **67.14%** (MS) on PASCAL-Context, showcasing strong scalability.
* **Continual Learning:** We propose to adapt SegViT v2 for continual semantic segmentation, demonstrating nearly zero forgetting of previously learned knowledge.

As shown in the following figure, the similarity between the class query and the image features is transferred to the segmentation mask.

![]()
![]()
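The idea of turning query-to-feature similarity into a mask can be illustrated with a toy example. The following is a minimal sketch, not the repository's ATM decoder: it assumes a single scaled dot-product similarity followed by a sigmoid, and the function name `attention_to_mask` and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_to_mask(class_queries, patch_features):
    """Return one soft mask per class query.

    masks[c][p] is the sigmoid of the scaled dot-product similarity
    between class query c and patch feature p (an assumption made for
    this sketch; the real decoder is more elaborate).
    """
    scale = math.sqrt(len(class_queries[0]))
    return [
        [sigmoid(dot(q, k) / scale) for k in patch_features]
        for q in class_queries
    ]


# Two hypothetical class queries and three patch features (dimension 2).
queries = [[1.0, 0.0], [0.0, 1.0]]
patches = [[4.0, 0.0], [0.0, 4.0], [3.0, 1.0]]

masks = attention_to_mask(queries, patches)

# Assign each patch to the class whose mask responds most strongly.
labels = [
    max(range(len(masks)), key=lambda c: masks[c][p])
    for p in range(len(patches))
]
print(labels)
```

In this sketch each row of `masks` plays the role of a per-class segmentation mask over the patches, so the similarity map is reused directly as the mask rather than being discarded after attention.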
Owner
- Name: Advanced Intelligent Machines (AIM)
- Login: aim-uofa
- Kind: organization
- Location: China
- Repositories: 23
- Profile: https://github.com/aim-uofa
A research team at Zhejiang University, focusing on Computer Vision and broad AI research ...
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: about 2 hours
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- Jeckinchen (1)
## Getting started
1. Install the [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) library and some required packages.
```bash
pip install mmcv-full==1.4.4 mmsegmentation==0.24.0
pip install scipy timm
```
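Because the install step pins exact versions, it can help to confirm what is actually installed before training. The snippet below is a small sketch using only the standard library; the helper name `find_mismatches` is hypothetical, and the pinned versions are copied from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the install command in this README.
PINNED = {"mmcv-full": "1.4.4", "mmsegmentation": "0.24.0"}


def find_mismatches(pinned, lookup=version):
    """Map each package whose installed version differs to what was found.

    A value of None means the package is not installed. `lookup` is
    injectable so the function can be exercised without the packages.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for name, found in problems.items():
        print(f"{name}: expected {PINNED[name]}, found {found}")
    if not problems:
        print("All pinned packages match.")
```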
## Training
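```bash
# dist_train.sh is a shell script, so launch it with bash rather than python.
# Following mmsegmentation's convention, the second argument is the number of GPUs.
bash tools/dist_train.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {num_gpus}
```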
## Evaluation
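```bash
# dist_test.sh is a shell script, so launch it with bash rather than python.
# Following mmsegmentation's convention, the arguments are: config, checkpoint, number of GPUs.
bash tools/dist_test.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {path_to_ckpt} {num_gpus} --eval mIoU
```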
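The evaluation reports mIoU (mean intersection over union). As a reminder of what that number measures, here is a minimal sketch over flat label lists. It is not mmsegmentation's implementation, which accumulates areas over the whole dataset; the function name `mean_iou` and the ignore label 255 are assumptions made for this example.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class labels.
    Pixels whose target equals ignore_index are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A wrong prediction enlarges the union of both classes.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six pixels, three classes, last pixel ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(f"mIoU = {100 * mean_iou(pred, target, num_classes=3):.1f}%")
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18.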
## Datasets
Please follow the data preparation instructions of [mmsegmentation](https://github.com/open-mmlab/mmsegmentation).
## Results
| Backbone | Dataset | mIoU | mIoU (MS) | GFLOPs | Checkpoint |
| --- | --- | --- | --- | --- | --- |
| ViT-Base | ADE20K | 51.3 | 53.0 | 120.9 | [model](https://pan.baidu.com/s/1KND14jl3-SLoY22PauPAEQ?pwd=qww5) |
| ViT-Large (Shrunk) | ADE20K | 53.9 | 55.1 | 373.5 | [model](https://pan.baidu.com/s/1ImKuiO-wkwYtPc6l8ezfcg?pwd=x27q) |
| ViT-Large | ADE20K | 54.6 | 55.2 | 637.9 | [model](https://pan.baidu.com/s/1l2DbNgTzIUQzysRrlGYjxA?pwd=keac) |
| ViT-Large (Shrunk) | COCOStuff10K | 49.1 | 49.4 | 224.8 | [model](https://pan.baidu.com/s/1vjuZDQcMCgVrfA36sVUwgg?pwd=5tdn) |
| ViT-Large | COCOStuff10K | 49.9 | 50.3 | 383.9 | [model](https://pan.baidu.com/s/1kQYeEXFNvXHRo29QRFWOSg?pwd=vygu) |
| ViT-Large (Shrunk) | PASCAL-Context (59 classes) | 62.3 | 63.7 | 186.9 | [model](https://pan.baidu.com/s/1obw7K0lQDdmLBVydgCB3Kg?pwd=9cmq) |
| ViT-Large | PASCAL-Context (59 classes) | 64.1 | 65.3 | 321.6 | [model](https://pan.baidu.com/s/13pwEgM-AcJCGHx3jM_cMvw?pwd=aspy) |
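The compute saving of the Shrunk variant can be checked directly from the Results table. The sketch below only redoes that arithmetic; the GFLOPs and single-scale mIoU values are copied from the ViT-Large rows, and the names `RESULTS` and `relative_saving` are introduced here for illustration.

```python
# (shrunk GFLOPs, full GFLOPs, shrunk mIoU, full mIoU) per dataset,
# copied from the ViT-Large rows of the Results table.
RESULTS = {
    "ADE20K": (373.5, 637.9, 53.9, 54.6),
    "COCOStuff10K": (224.8, 383.9, 49.1, 49.9),
    "PASCAL-Context": (186.9, 321.6, 62.3, 64.1),
}


def relative_saving(shrunk, full):
    """Fraction of compute saved by the shrunk model relative to the full one."""
    return 1.0 - shrunk / full


for dataset, (g_shrunk, g_full, miou_shrunk, miou_full) in RESULTS.items():
    saving = relative_saving(g_shrunk, g_full)
    drop = miou_full - miou_shrunk
    print(f"{dataset}: {saving:.1%} fewer GFLOPs, {drop:.1f} mIoU lower")
```

On these three rows the saving comes out at roughly 41 to 42% of the GFLOPs, in line with the 40% figure quoted in the highlights, at a cost of between 0.7 and 1.8 mIoU points.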
## License
For academic use, this project is licensed under the 2-clause BSD License - see the LICENSE file for details. For commercial use, please contact the authors.
## Citation
```
@article{zhang2022segvit,
title={SegViT: Semantic Segmentation with Plain Vision Transformers},
author={Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan},
journal={NeurIPS},
year={2022}
}
@article{zhang2023segvitv2,
title={SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers},
author={Zhang, Bowen and Liu, Liyang and Phan, Minh Hieu and Tian, Zhi and Shen, Chunhua and Liu, Yifan},
journal={IJCV},
year={2023}
}
```