clusterformer

https://github.com/clusterformer/clusterformer

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
○
DOI references
✓
Academic publication links
Links to: arxiv.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (12.6%) to scientific vocabulary

Last synced: 10 months ago · JSON representation ·

Repository

Basic Info

Host: GitHub
Owner: ClusterFormer
License: apache-2.0
Language: Python
Default Branch: main
Size: 4.15 MB

Statistics

Stars: 23
Watchers: 2
Forks: 3
Open Issues: 3
Releases: 0

Created over 2 years ago · Last pushed over 2 years ago

Metadata Files

Readme Contributing License Citation

ClusterFormer: Clustering As A Universal Visual Learner

Our arxiv version is currently available. Please check it out! 🔥🔥🔥

This repository contains the official PyTorch implementation for ClusterFormer: Clustering As A Universal Visual Learner. Our work is built upon mmclassification and other repos under OpenMMLab framework. We thank the great work of them.

Abstract

This paper presents ClusterFormer, a universal vision model that is based on the CLUSTERing paradigm with TransFORMER. It comprises two novel designs: 1. recurrent cross-attention clustering, which reformulates the cross-attention mechanism in Transformer and enables recursive updates of cluster centers to facilitate strong representation learning; and 2. feature dispatching, which uses the updated cluster centers to redistribute image features through similarity-based metrics, resulting in a transparent pipeline. This elegant design streamlines an explainable and transferable workflow, capable of tackling heterogeneous vision tasks ($i.e.$, image classification, object detection, and image segmentation) with varying levels of clustering granularity ($i.e.$, image-, box-, and pixel-level). Empirical results demonstrate that ClusterFormer outperforms various well-known specialized architectures, achieving 83.41% top-1 acc. over ImageNet-1K for image classification, 54.2% and 47.0% mAP over MS COCO for object detection and instance segmentation, 52.4% mIoU over ADE20K for semantic segmentation, and 55.8% PQ over COCO Panoptic for panoptic segmentation. For its efficacy, we hope our work can catalyze a paradigm shift in universal models in computer vision.

Figure 1: (a) Overall pipeline of ClusterFormer. (b) Each Recurrent Cross-Attention Clustering layer carries out T iterations of cross-attention clustering (E-step) and center updating (M-step) (see Eq. 3). (c) The feature dispatching redistributes the feature embeddings on the top of updated cluster centers (see Eq. 6).

Installation

Below are quick steps for installation:

shell conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision==0.11.0 -c pytorch -y conda activate open-mmlab pip3 install openmim mim install mmcv-full cd clusterformer pip3 install -e .

Please refer to install.md for more detailed installation and dataset preparation.

Training

We followed the common usage of the mmclassification and check mmclassification for more training information.

In particular, we use the slurm system to train our model. Slurm is a good job scheduling system for computing clusters.

On a cluster managed by Slurm, you can use slurm_train.sh to spawn training jobs. It supports both single-node and multi-node training.

The basic usage is as follows.

shell OMP_NUM_THREADS=1 [GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}

When using Slurm, the port option need to be set in one of the following ways:

Set the port through --options. This is more recommended since it does not change the original configs.

shell OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --options 'dist_params.port=29500' OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --options 'dist_params.port=29501'

Modify the config files to set different communication ports.

In config1.py, set

python dist_params = dict(backend='nccl', port=29500)

In config2.py, set

python dist_params = dict(backend='nccl', port=29501)

Then you can launch two jobs with config1.py and config2.py.

shell OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}

Note that: - The configs are made for 8-GPU training. To train on another number of GPUs, change the GPUS. - If you want to measure the inference time, please change the number of gpu to 1 for inference. - We set OMP_NUM_THREADS=1 by default, which achieves the best speed on our machines, please change it as needed.

Acknowledgement

We thank Runjia Zeng for the contribution to the flash attention center update.

Citation

If you find our work helpful in your research, please cite it as:

@inproceedings{liang2023clusterformer, title={ClusterFormer: Clustering As A Universal Visual Learner}, author={Liang, James C and Cui, Yiming and Wang, Qifan and Geng, Tong and Wang, Wenguan and Liu, Dongfang}, booktitle={Neural Information Processing Systems (NeurIPS)}, year={2023} }

Owner

Login: ClusterFormer
Kind: user

Repositories: 1
Profile: https://github.com/ClusterFormer

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "OpenMMLab's Image Classification Toolbox and Benchmark"
authors:
  - name: "MMClassification Contributors"
version: 0.15.0
date-released: 2020-07-09
repository-code: "https://github.com/open-mmlab/mmclassification"
license: Apache-2.0

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science