010-sed-a-simple-encoder-decoder-for-open-vocabulary-semantic-segmentation

https://github.com/szu-advtech-2024/010-sed-a-simple-encoder-decoder-for-open-vocabulary-semantic-segmentation


Repository

Basic Info

Host: GitHub
Owner: SZU-AdvTech-2024
Default Branch: main
Size: 0 Bytes

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created 12 months ago · Last pushed 12 months ago

Metadata Files

Citation

https://github.com/SZU-AdvTech-2024/010-SED-A-Simple-Encoder-Decoder-for-Open-Vocabulary-Semantic-Segmentation/blob/main/

# SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

This is the official PyTorch implementation of SED.

## :fire: News
- SED has been accepted to CVPR 2024.

## Introduction


- We propose an encoder-decoder for open-vocabulary semantic segmentation, comprising hierarchical encoder-based cost map generation and a gradual fusion decoder.
- We introduce a category early rejection scheme that rejects non-existing categories at an early layer, which markedly
  increases inference speed without significant degradation in segmentation performance. For instance, it provides a 4.7× speedup on PC-459.
- Our proposed method, SED, achieves superior performance on multiple open-vocabulary segmentation datasets.
  Specifically, SED provides a good trade-off between segmentation performance and speed.
  With ConvNeXt-L, SED obtains mIoU scores of 35.2% on A-150 and 22.6% on PC-459.

For further details and visualization results, please check out our [paper](https://arxiv.org/abs/2311.15537).
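
The category early rejection idea from the list above can be illustrated with a small sketch. It is not the repository's code: it assumes that an early decoder layer yields a per-pixel score for every candidate category, that the top-k categories are taken at each pixel, and that the union over all pixels is kept for the later layers. The function names `select_categories` and `reject_categories` and the nested-list score layout are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List, Set, Tuple

# scores[c][y][x]: response of category c at pixel (y, x).
# Assumed layout for illustration only; the real model works on tensors.
Scores = List[List[List[float]]]


def select_categories(scores: Scores, topk: int) -> List[int]:
    """Return the sorted indices of categories ranked in the top-k at any pixel."""
    num_categories = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    k = min(topk, num_categories)
    kept: Set[int] = set()
    for y in range(height):
        for x in range(width):
            # Rank all categories by their response at this pixel.
            ranked = sorted(
                range(num_categories),
                key=lambda c: scores[c][y][x],
                reverse=True,
            )
            kept.update(ranked[:k])
    return sorted(kept)


def reject_categories(scores: Scores, topk: int) -> Tuple[List[int], Scores]:
    """Drop the score maps of categories that never reach the top-k."""
    kept = select_categories(scores, topk)
    return kept, [scores[c] for c in kept]


if __name__ == "__main__":
    # Four candidate categories on a 2x2 image (made-up numbers).
    demo = [
        [[0.90, 0.80], [0.10, 0.20]],
        [[0.05, 0.10], [0.70, 0.60]],
        [[0.03, 0.06], [0.15, 0.15]],
        [[0.02, 0.04], [0.05, 0.05]],
    ]
    kept, reduced = reject_categories(demo, topk=1)
    print(kept, len(reduced))
```

Later layers would then process only the kept score maps, which is where the speedup would come from when the vocabulary is large and most categories are absent from the image. The `TEST.TOPK` option in the fast evaluation command presumably controls a value like `topk` here, but that mapping is an assumption.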

## Installation
Please follow the [installation instructions](INSTALL.md).

## Data Preparation
Please follow the [dataset preparation guide](datasets/README.md).

## Training
We provide shell scripts for training and evaluation. `run.sh` trains the model with the default configuration and evaluates it after training.

To train or evaluate the model in different environments, modify the given shell script and config files accordingly.

### Training script
```bash
sh run.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]

# For ConvNeXt-B variant
sh run.sh configs/convnextB_768.yaml 4 output/
# For ConvNeXt-L variant
sh run.sh configs/convnextL_768.yaml 4 output/
```

## Evaluation
`eval.sh` evaluates the model following our evaluation protocol. If no weights are specified, it uses the weights in the output directory.
To run the model on individual datasets, please refer to the commands in `eval.sh`.

### Evaluation script
```bash
sh eval.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]

sh eval.sh configs/convnextB_768.yaml 4 output/ MODEL.WEIGHTS path/to/weights.pth

# Fast version.
sh eval.sh configs/convnextB_768.yaml 4 output/ MODEL.WEIGHTS path/to/weights.pth  TEST.FAST_INFERENCE True  TEST.TOPK 8
```
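
In the commands above, the trailing `[OPTS]` are written as alternating `KEY VALUE` pairs with dotted keys, such as `TEST.TOPK 8`. The sketch below shows one way such pairs could be merged into a nested configuration. It is an illustration only: it does not use or reproduce the configuration library the repository relies on, and the helper names `parse_value` and `apply_opts` and the starting dictionary are hypothetical.

```python
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Best-effort conversion: bool, then int, then float, else the raw string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_opts(config: Dict[str, Any], opts: List[str]) -> Dict[str, Any]:
    """Merge alternating KEY VALUE strings into a nested dict, in place."""
    if len(opts) % 2 != 0:
        raise ValueError("OPTS must be alternating KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        node = config
        *parents, leaf = key.split(".")
        # Walk (or create) the intermediate sections, e.g. "TEST" in "TEST.TOPK".
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return config


if __name__ == "__main__":
    # Hypothetical starting values; the real defaults live in the config files.
    cfg = {"TEST": {"FAST_INFERENCE": False}}
    opts = [
        "MODEL.WEIGHTS", "path/to/weights.pth",
        "TEST.FAST_INFERENCE", "True",
        "TEST.TOPK", "8",
    ]
    print(apply_opts(cfg, opts))
```

An odd number of trailing arguments raises an error in this sketch, which catches a forgotten value early.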

## Results


We provide pretrained weights for the models reported in the paper. All models were evaluated on 4 NVIDIA A6000 GPUs, and the results can be reproduced with the evaluation script above.
Inference time is reported on a single NVIDIA A6000 GPU.




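| Name | CLIP | A-847 | PC-459 | A-150 | PC-59 | PAS-20 | Download |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| SED (B) | ConvNeXt-B | 11.2 | 18.6 | 31.8 | 57.7 | 94.4 | ckpt |
| SED-fast (B) | ConvNeXt-B | 11.4 | 18.6 | 31.6 | 57.3 | 94.4 | ckpt |
| SED (L) | ConvNeXt-L | 13.7 | 22.1 | 35.3 | 60.9 | 96.1 | ckpt |
| SED-fast (L) | ConvNeXt-L | 13.9 | 22.6 | 35.2 | 60.6 | 96.1 | ckpt |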
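
The introduction quotes these results as mIoU scores. As a reminder of what that metric measures, here is a minimal sketch of mean intersection-over-union on flattened label lists. It is not the repository's evaluation code. The function name `mean_iou`, the ignore label of 255, and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions made for illustration.

```python
from typing import List


def mean_iou(
    pred: List[int],
    target: List[int],
    num_classes: int,
    ignore_index: int = 255,
) -> float:
    """Mean IoU in percent over classes present in pred or target."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # Unlabeled pixel: excluded from the metric.
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A miss counts toward the union of both classes involved.
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        return float("nan")
    return 100.0 * sum(ious) / len(ious)


if __name__ == "__main__":
    # Six pixels, three classes, last pixel unlabeled (made-up labels).
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    print(round(mean_iou(pred, target, num_classes=3), 2))
```

In the example, the per-class IoUs are 1/2, 2/3 and 1, so the mean is about 72.22%. Dataset-level scores are usually obtained by accumulating the intersection and union counts over all images before dividing.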





## Citation

```BibTeX
@inproceedings{xie2024sed,
      title={SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation}, 
      author={Bin Xie and Jiale Cao and Jin Xie and Fahad Shahbaz Khan and Yanwei Pang},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      year={2024},
}
```

## Acknowledgement
We would like to acknowledge the contributions of public projects, such as [CAT-Seg](https://github.com/KU-CVLAB/CAT-Seg), whose code has been utilized in this repository.


Owner

Name: SZU-AdvTech-2024
Login: SZU-AdvTech-2024
Kind: organization

Repositories: 1
Profile: https://github.com/SZU-AdvTech-2024

Citation (citation.txt)

@inproceedings{REPO010,
    author = "Xie, Bin and Cao, Jiale and Xie, Jin and Khan, Fahad Shahbaz and Pang, Yanwei",
    booktitle = "Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition",
    title = "{SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation}",
    year = "2024"
}
