lrformer

[TPAMI] Low-Resolution Self-Attention For Semantic Segmentation

https://github.com/yuhuan-wu/lrformer

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

[TPAMI] Low-Resolution Self-Attention For Semantic Segmentation

Basic Info
  • Host: GitHub
  • Owner: yuhuan-wu
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 2.75 MB
Statistics
  • Stars: 6
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

Low-Resolution Self-Attention for Semantic Segmentation

This is the official repository for Low-Resolution Self-Attention for Semantic Segmentation. We introduce a self-attention mechanism (LRSA) that computes attention in a fixed low-resolution space regardless of the input image size. This design drastically reduces computational cost while still capturing global context. Based on LRSA, we build the Low-Resolution Transformer (LRFormer) with an encoder-decoder architecture, and we provide several variants (T/S/B/L/XL). In addition, we release enhanced versions (LRFormer+) that combine the LRFormer encoder with a query-based decoder (e.g., Mask2Former) for even stronger performance.

PDF Download


Requirements

  • Python ≥ 3.8
  • PyTorch ≥ 2.0.0
  • torchvision ≥ 0.7.0
  • mmcv ≥ 2.0.0rc4
  • mmdet ≥ 3.0
  • Other dependencies are listed in requirements.txt

Tested with PyTorch 2.2.1, torchvision 0.17.1, mmcv 2.1.0, mmdet 3.3, and CUDA 11.8.


Installation

You can follow the installation instructions of mmsegmentation. For simplicity, run the following commands:

```bash
pip install -r requirements.txt
python setup.py build develop
```


Introduction

Semantic segmentation requires both fine-grained detail and global context. While traditional high-resolution self-attention effectively captures local details, its computational complexity is a major bottleneck. Our proposed Low-Resolution Self-Attention (LRSA) computes self-attention in a fixed, low-resolution space—regardless of input size—thereby significantly reducing FLOPs. To compensate for any loss in local detail, we further integrate 3×3 depth-wise convolutions. Based on LRSA, we design LRFormer, a vision transformer backbone tailored for semantic segmentation. Extensive experiments on ADE20K, COCO-Stuff, and Cityscapes demonstrate that LRFormer consistently outperforms state-of-the-art models with lower computational cost.
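To make the mechanism concrete, here is a minimal PyTorch sketch of the LRSA idea described above: queries come from the full-resolution feature map, while keys and values are pooled to a fixed low-resolution grid, so attention cost is independent of the input size, and a 3×3 depth-wise convolution restores local detail. This is an illustrative sketch, not the released implementation; the class name, `pool_size`, and all hyperparameters here are our own assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowResSelfAttention(nn.Module):
    """Sketch of low-resolution self-attention (LRSA-style):
    keys/values are pooled to a fixed PxP grid regardless of input size,
    so the attention map is (H*W) x (P*P) instead of (H*W) x (H*W)."""

    def __init__(self, dim, num_heads=4, pool_size=16):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.pool_size = pool_size  # fixed low-resolution grid (assumption)
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        # 3x3 depth-wise conv to compensate for lost local detail
        self.dwconv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)

    def forward(self, x):  # x: (B, C, H, W)
        B, C, H, W = x.shape
        # pool keys/values to a fixed PxP grid, independent of H and W
        pooled = F.adaptive_avg_pool2d(x, self.pool_size)       # (B, C, P, P)
        q = self.q(x.flatten(2).transpose(1, 2))                # (B, HW, C)
        k, v = self.kv(pooled.flatten(2).transpose(1, 2)).chunk(2, dim=-1)

        def heads(t):  # (B, N, C) -> (B, h, N, C//h)
            return t.view(B, -1, self.num_heads, C // self.num_heads).transpose(1, 2)

        scale = (C // self.num_heads) ** -0.5
        attn = (heads(q) @ heads(k).transpose(-2, -1)) * scale  # (B, h, HW, P*P)
        out = attn.softmax(dim=-1) @ heads(v)                   # (B, h, HW, C//h)
        out = out.transpose(1, 2).reshape(B, H * W, C)
        out = self.proj(out).transpose(1, 2).view(B, C, H, W)
        return out + self.dwconv(out)                           # restore local detail
```

Because the pooled grid has a constant size, doubling the input resolution grows the attention cost only linearly in the number of query tokens, rather than quadratically.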

For even higher performance, we also offer an enhanced version (LRFormer+) that couples the LRFormer encoder with a query-based decoder (e.g., Mask2Former), yielding superior segmentation results.


Pretrained Models & Results

We provide pretrained models on three popular semantic segmentation datasets. The following tables summarize the performance (FLOPs, parameter count, and mIoU) of our LRFormer and LRFormer+ variants.

Our core model file (LRFormer backbone) is located here. Optionally, you can replace the backbone of your own model with LRFormer for better performance.
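Swapping a backbone in an mmsegmentation-style config typically only requires changing the `backbone` dict. The fragment below is a hypothetical sketch: the registered name `'LRFormer'`, the checkpoint path, and the channel dimensions are placeholders — consult the released config files for the exact values.

```python
# Hypothetical mmsegmentation-style config fragment (sketch only):
# the backbone type name, checkpoint path, and channel dims below
# are assumptions; see the repository's configs for the real values.
model = dict(
    backbone=dict(
        type='LRFormer',  # registered backbone name (placeholder)
        init_cfg=dict(type='Pretrained', checkpoint='path/to/lrformer-s.pth'),
    ),
    decode_head=dict(
        in_channels=[64, 128, 320, 512],  # placeholder stage channels
        num_classes=150,                  # e.g., ADE20K
    ),
)
```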

ADE20K (Validation)

| Variant      | FLOPs | Params | mIoU  |
|--------------|:-----:|:------:|:-----:|
| LRFormer-T   | 17G   | 13M    | 46.7% |
| LRFormer-S   | 40G   | 32M    | 50.0% |
| LRFormer-B   | 75G   | 69M    | 51.0% |
| LRFormer-L   | 183G  | 113M   | 52.6% |
| LRFormer-L†  | 183G  | 113M   | 54.2% |

COCO-Stuff (Validation)

| Variant      | FLOPs | Params | mIoU  |
|--------------|:-----:|:------:|:-----:|
| LRFormer-T   | 17G   | 13M    | 43.9% |
| LRFormer-S   | 40G   | 32M    | 46.4% |
| LRFormer-B   | 75G   | 69M    | 47.2% |
| LRFormer-L   | 122G  | 113M   | 47.9% |

Cityscapes (Validation)

FLOPs are calculated for an input size of 1024×2048.

| Variant      | FLOPs | Params | mIoU  |
|--------------|:-----:|:------:|:-----:|
| LRFormer-T   | 122G  | 13M    | 80.7% |
| LRFormer-S   | 295G  | 32M    | 81.9% |
| LRFormer-B   | 555G  | 67M    | 83.0% |
| LRFormer-L   | 908G  | 111M   | 83.2% |


Enhanced Versions with Query-based Decoders (LRFormer+)

On ADE20K, we further improve segmentation by coupling our LRFormer encoder with a query-based decoder. The following table reports the performance of the LRFormer+ series:

| Variant        | FLOPs | Params | mIoU  |
|----------------|:-----:|:------:|:-----:|
| LRFormer-T+    | 53G   | 31M    | 49.4% |
| LRFormer-S+    | 70G   | 48M    | 51.3% |
| LRFormer-B+    | 94G   | 80M    | 53.7% |
| LRFormer-L+†   | 192G  | 119M   | 55.8% |
| LRFormer-XL+†  | 365G  | 205M   | 58.1% |

“†” indicates results with ImageNet-22K pretraining and larger input size (640×640).

All models, including ImageNet-pretrained weights, are now available on Hugging Face and Baidu Pan; they will also be made available on Google Drive.

  • Hugging Face: Download
  • Baidu Pan: Download


Training

To train LRFormer (e.g., the LRFormer-S variant) in a distributed manner on 8 GPUs, run:

```bash
bash tools/dist_train.sh configs_local/lrformer/lrformer-s-plus-160k_ade20k-512x512.py 8

bash tools/dist_train.sh $CONFIG_PATH $NUM_GPUS
```

Testing

```bash
bash tools/dist_test.sh configs_local/lrformer/lrformer_l_ade20k_160k.py model_release/lrformer-l-160k_ade20k_52.6.pth 1

bash tools/dist_test.sh $CONFIG_PATH $MODEL_PATH $NUM_GPUS
```

Citation

If you use the code, models, or data provided here in a publication, please consider citing our work:

```bibtex
@article{wu2025lrformer,
  title={Low-resolution self-attention for semantic segmentation},
  author={Wu, Yu-Huan and Zhang, Shi-Chen and Liu, Yun and Zhang, Le and Zhan, Xin and Zhou, Daquan and Feng, Jiashi and Cheng, Ming-Ming and Zhen, Liangli},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2025}
}
```

Other Notes

If you run into any problems, please do not hesitate to reach out. Issues and discussions are welcome in the repository; you can also contact us by email: wuyuhuan@mail.nankai.edu.cn

Owner

  • Name: Yu-Huan Wu
  • Login: yuhuan-wu
  • Kind: user
  • Location: Tianjin, China

Ph.D. student at Nankai University

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMSegmentation Contributors"
title: "OpenMMLab Semantic Segmentation Toolbox and Benchmark"
date-released: 2020-07-10
url: "https://github.com/open-mmlab/mmsegmentation"
license: Apache-2.0

GitHub Events

Total
  • Issues event: 1
  • Watch event: 14
  • Issue comment event: 3
  • Push event: 6
  • Fork event: 2
  • Create event: 1
Last Year
  • Issues event: 1
  • Watch event: 14
  • Issue comment event: 3
  • Push event: 6
  • Fork event: 2
  • Create event: 1