clusterformer
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: ClusterFormer
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 4.15 MB
Statistics
- Stars: 23
- Watchers: 2
- Forks: 3
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
ClusterFormer: Clustering As A Universal Visual Learner
Our arXiv version is now available. Please check it out! 🔥🔥🔥
This repository contains the official PyTorch implementation of ClusterFormer: Clustering As A Universal Visual Learner. Our work is built upon mmclassification and other repositories in the OpenMMLab framework. We thank their authors for the great work.
Abstract
This paper presents ClusterFormer, a universal vision model that is based on the CLUSTERing paradigm with TransFORMER. It comprises two novel designs: 1. recurrent cross-attention clustering, which reformulates the cross-attention mechanism in Transformer and enables recursive updates of cluster centers to facilitate strong representation learning; and 2. feature dispatching, which uses the updated cluster centers to redistribute image features through similarity-based metrics, resulting in a transparent pipeline. This elegant design streamlines an explainable and transferable workflow, capable of tackling heterogeneous vision tasks ($i.e.$, image classification, object detection, and image segmentation) with varying levels of clustering granularity ($i.e.$, image-, box-, and pixel-level). Empirical results demonstrate that ClusterFormer outperforms various well-known specialized architectures, achieving 83.41% top-1 acc. over ImageNet-1K for image classification, 54.2% and 47.0% mAP over MS COCO for object detection and instance segmentation, 52.4% mIoU over ADE20K for semantic segmentation, and 55.8% PQ over COCO Panoptic for panoptic segmentation. For its efficacy, we hope our work can catalyze a paradigm shift in universal models in computer vision.
Figure 1: (a) Overall pipeline of ClusterFormer. (b) Each Recurrent Cross-Attention Clustering layer carries out T iterations of cross-attention clustering (E-step) and center updating (M-step) (see Eq. 3). (c) The feature dispatching redistributes the feature embeddings on the top of updated cluster centers (see Eq. 6).
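To make the two designs described in the abstract more concrete, here is a minimal NumPy sketch of the idea. It is not the repository's implementation and does not reproduce Eq. 3 or Eq. 6 of the paper: the function names, the dot-product similarity with `1/sqrt(D)` scaling, the choice of softmax axis, and the residual addition in `dispatch` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def recurrent_clustering(features, centers, num_iters=3):
    """Alternate soft assignment (E-step) and center update (M-step).

    features: (N, D) array of feature embeddings.
    centers:  (K, D) array of initial cluster centers.
    Returns the updated centers and the last (K, N) assignment matrix.
    """
    scale = features.shape[1] ** -0.5  # assumed scaling, as in standard attention
    for _ in range(num_iters):
        # E-step: each feature spreads a total weight of 1 over the K centers.
        assign = softmax(centers @ features.T * scale, axis=0)  # (K, N)
        # M-step: each center becomes the assignment-weighted mean of the features.
        centers = (assign @ features) / assign.sum(axis=1, keepdims=True)
    return centers, assign


def dispatch(features, centers):
    """Send center information back to each feature, weighted by similarity."""
    scale = features.shape[1] ** -0.5
    sim = softmax(features @ centers.T * scale, axis=1)  # (N, K)
    return features + sim @ centers  # residual update (an assumption)


# Toy usage: 16 features of dimension 8 grouped into 4 clusters.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))
centers, assign = recurrent_clustering(feats, feats[:4].copy(), num_iters=3)
updated = dispatch(feats, centers)
```

In this sketch, the number of centers `K` plays the role of the clustering granularity: a handful of centers for image-level tasks, many more for box- or pixel-level ones.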
Installation
Below are quick steps for installation:
shell
conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision==0.11.0 -c pytorch -y
conda activate open-mmlab
pip3 install openmim
mim install mmcv-full
cd clusterformer
pip3 install -e .
Please refer to install.md for more detailed installation and dataset preparation.
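After running the steps above, a quick way to confirm that the environment contains the expected packages is to query the installed distribution metadata. The sketch below uses only the Python standard library; `installed_versions` is a hypothetical helper, and the distribution names listed (`torch`, `torchvision`, `mmcv-full`, `openmim`) are assumed to match what the commands above install.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names assumed from the installation commands above.
report = installed_versions(["torch", "torchvision", "mmcv-full", "openmim"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Because the check reads metadata instead of importing the packages, it runs quickly and does not need a GPU.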
Training
We follow the common usage of mmclassification; see the mmclassification documentation for more training information.
In particular, we train our model with Slurm, a job scheduling system for computing clusters.
On a cluster managed by Slurm, you can use slurm_train.sh to spawn training jobs. It supports both single-node and multi-node training.
The basic usage is as follows.
shell
OMP_NUM_THREADS=1 [GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
When using Slurm, the port option needs to be set in one of the following ways:
- Set the port through --options. This is the recommended way, since it does not change the original configs.
shell
OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --options 'dist_params.port=29500'
OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --options 'dist_params.port=29501'
- Modify the config files to set different communication ports.
In config1.py, set
python
dist_params = dict(backend='nccl', port=29500)
In config2.py, set
python
dist_params = dict(backend='nccl', port=29501)
Then you can launch two jobs with config1.py and config2.py.
shell
OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}
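When several jobs share a machine, the first option above (passing the port through --options) can be scripted so that every job gets its own GPU range and port. The following is a rough sketch: `build_slurm_commands` is a hypothetical helper, not part of the repository, and the partition, job name, and work directory in the usage lines are placeholder values.

```python
import shlex


def build_slurm_commands(partition, job_name, configs, work_dir,
                         gpus=4, base_port=29500):
    """Build one slurm_train.sh command line per config.

    Job i uses GPUs [i * gpus, (i + 1) * gpus) and port base_port + i,
    so jobs on the same node do not collide.
    """
    commands = []
    for i, config in enumerate(configs):
        first = i * gpus
        devices = ",".join(str(d) for d in range(first, first + gpus))
        port_option = f"dist_params.port={base_port + i}"
        commands.append(
            f"OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES={devices} GPUS={gpus} "
            f"./tools/slurm_train.sh {shlex.quote(partition)} "
            f"{shlex.quote(job_name)} {shlex.quote(config)} "
            f"{shlex.quote(work_dir)} --options {shlex.quote(port_option)}"
        )
    return commands


# Placeholder arguments; substitute your own partition, job name, and paths.
commands = build_slurm_commands(
    "my_partition", "my_job", ["config1.py", "config2.py"], "work_dirs/demo"
)
for command in commands:
    print(command)
```

The helper only builds the command strings; launching them is left to the caller, for example by pasting them into a shell.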
Note that:
- The configs are written for 8-GPU training. To train on a different number of GPUs, change the GPUS variable.
- If you want to measure inference time, set the number of GPUs to 1 for inference.
- We set OMP_NUM_THREADS=1 by default, which achieves the best speed on our machines. Please change it as needed.
Acknowledgement
We thank Runjia Zeng for contributing to the flash attention center update.
Citation
If you find our work helpful in your research, please cite it as:
@inproceedings{liang2023clusterformer,
title={ClusterFormer: Clustering As A Universal Visual Learner},
author={Liang, James C and Cui, Yiming and Wang, Qifan and Geng, Tong and Wang, Wenguan and Liu, Dongfang},
booktitle={Neural Information Processing Systems (NeurIPS)},
year={2023}
}
Owner
- Login: ClusterFormer
- Kind: user
- Repositories: 1
- Profile: https://github.com/ClusterFormer
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "OpenMMLab's Image Classification Toolbox and Benchmark"
authors:
  - name: "MMClassification Contributors"
version: 0.15.0
date-released: 2020-07-09
repository-code: "https://github.com/open-mmlab/mmclassification"
license: Apache-2.0
GitHub Events
Total
- Issues event: 3
- Watch event: 15
- Issue comment event: 1
Last Year
- Issues event: 3
- Watch event: 15
- Issue comment event: 1