https://github.com/aim-uofa/sine

[NeurIPS'24] A Simple Image Segmentation Framework via In-Context Examples

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org, scholar.google
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.4%) to scientific vocabulary

Keywords

dinov2 generalist-model in-context-segmentation task-ambiguity

Last synced: 11 months ago

Repository

[NeurIPS'24] A Simple Image Segmentation Framework via In-Context Examples

Basic Info

Host: GitHub
Owner: aim-uofa
License: other
Language: Python
Default Branch: main
Homepage: https://arxiv.org/abs/2410.04842
Size: 956 KB

Statistics

Stars: 51
Watchers: 4
Forks: 2
Open Issues: 5
Releases: 0

Topics

dinov2 generalist-model in-context-segmentation task-ambiguity

Created almost 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme License

A Simple Image Segmentation Framework via In-Context Examples

[Yang Liu](https://scholar.google.com/citations?user=9JcQ2hwAAAAJ&hl=en)¹, [Chenchen Jing](https://jingchenchen.github.io/)¹, Hengtao Li¹, [Muzhi Zhu](https://scholar.google.com/citations?user=064gBH4AAAAJ&hl=en)¹, [Hao Chen](https://stan-haochen.github.io/)¹, [Xinlong Wang](https://www.xloong.wang/)², [Chunhua Shen](https://cshen.github.io/)¹
¹[Zhejiang University](https://www.zju.edu.cn/english/), ²[Beijing Academy of Artificial Intelligence](https://www.baai.ac.cn/english.html)
NeurIPS 2024

🚀 Overview

📖 Description

Overview

This paper proposes a simple yet effective image segmentation framework that leverages in-context examples.
The approach allows users to provide a few annotated examples within an image, which the model then uses to segment the rest of the image.
The framework is designed to be intuitive and user-friendly, enabling non-expert users to perform accurate image segmentation.
In detail: Recently, there have been explorations of generalist segmentation models that can effectively tackle a variety of image segmentation tasks within a unified in-context learning framework. However, these methods still struggle with task ambiguity in in-context segmentation, as not all in-context examples can accurately convey the task information. In order to address this issue, we present SINE, a simple image Segmentation framework utilizing in-context examples. Our approach leverages a Transformer encoder-decoder structure, where the encoder provides high-quality image representations, and the decoder is designed to yield multiple task-specific output masks to effectively eliminate task ambiguity.
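To make the task-ambiguity problem concrete, here is a minimal sketch of a generic prototype-matching baseline. It is not SINE's architecture or its Matching Transformer. It assumes feature maps from some frozen encoder are already available as NumPy arrays; the function name `prototype_match`, the array shapes, and the threshold are hypothetical choices for illustration.

```python
import numpy as np


def prototype_match(ref_feats, ref_mask, tgt_feats, threshold=0.5):
    """Segment a target image by cosine similarity to a reference prototype.

    ref_feats, tgt_feats: (H, W, C) feature maps (assumed to come from a
    frozen encoder; any source of per-pixel features works for this sketch).
    ref_mask: (H, W) boolean mask marking the in-context example.
    """
    ref_feats = np.asarray(ref_feats, dtype=float)
    tgt_feats = np.asarray(tgt_feats, dtype=float)
    ref_mask = np.asarray(ref_mask, dtype=bool)
    if not ref_mask.any():
        raise ValueError("ref_mask must mark at least one pixel")

    # Masked average pooling: one prototype vector for the annotated region.
    prototype = ref_feats[ref_mask].mean(axis=0)

    # Cosine similarity between the prototype and every target location.
    eps = 1e-8
    proto_unit = prototype / (np.linalg.norm(prototype) + eps)
    tgt_norm = np.linalg.norm(tgt_feats, axis=-1, keepdims=True)
    similarity = (tgt_feats / (tgt_norm + eps)) @ proto_unit

    # A single threshold gives a single mask, with no notion of granularity.
    return similarity > threshold


# Toy example with 2-channel features on a 2x2 grid.
ref_feats = np.array([[[1, 0], [0, 1]], [[1, 0], [0, 1]]])
ref_mask = np.array([[True, False], [True, False]])
tgt_feats = np.array([[[0, 1], [1, 0]], [[0, 1], [0.9, 0.1]]])
target_mask = prototype_match(ref_feats, ref_mask, tgt_feats)
```

Because this baseline emits one mask from one prototype, it cannot express whether the prompt meant the same object, every instance, or the whole semantic class. That is the ambiguity the multi-mask decoder described above is designed to remove.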

Paper

👻 Getting Started

Training
Pre-trained weights: DINOv2-L model trained on ADE20K, COCO, and Objects365
Evaluation - Few-shot Semantic Segmentation
Evaluation - Few-shot Instance Segmentation
Evaluation - Video Object Segmentation

🎫 License

For academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.

🖊️ Citation

If you find this project useful in your research, please consider citing it:

@article{liu2024simple,
  title={A Simple Image Segmentation Framework via In-Context Examples},
  author={Liu, Yang and Jing, Chenchen and Li, Hengtao and Zhu, Muzhi and Chen, Hao and Wang, Xinlong and Shen, Chunhua},
  journal={Proc. Int. Conference on Neural Information Processing Systems (NeurIPS)},
  year={2024}
}

Acknowledgement

DINOv2, Mask2Former, SegGPT, Matcher, TFA and detectron2.

FAQ

Key Contributions of the Paper:

The paper is the first to investigate and address task ambiguity in in-context segmentation.
It introduces a Matching Transformer that unlocks the potential of frozen pre-trained image models for diverse segmentation tasks with low training costs.

What is the main challenge in in-context segmentation that SINE aims to address?

The primary challenge SINE addresses is task ambiguity in in-context segmentation. This ambiguity arises when the in-context examples do not accurately or clearly convey the intended segmentation task. For instance, if the reference image only shows a single object and its annotation, the lack of additional task-related information can lead to incorrect segmentation outputs.

How does SINE address task ambiguity?

SINE tackles task ambiguity by predicting multiple output masks, each customized for tasks of varying complexity, ranging from identifying identical objects to instances and overall semantic concepts. This approach allows SINE to disentangle the specific task from the in-context example and interpret the semantic meaning of the prompts to produce results at different levels of task granularity.
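One rough way to picture the three granularities is sketched below. This is not the SINE decoder: it assumes per-instance masks and a per-instance similarity score to the reference object are already given, and the names `masks_by_granularity`, `id_scores`, and `id_threshold` are hypothetical.

```python
import numpy as np


def masks_by_granularity(instance_masks, id_scores, id_threshold=0.5):
    """Derive ID-, instance-, and semantic-level outputs from instance masks.

    instance_masks: (N, H, W) boolean masks, one per detected instance.
    id_scores: (N,) similarity of each instance to the reference object
               (an assumed input; how it is computed is out of scope here).
    """
    instance_masks = np.asarray(instance_masks, dtype=bool)
    id_scores = np.asarray(id_scores, dtype=float)

    # Semantic level: every pixel that belongs to any instance of the concept.
    semantic = instance_masks.any(axis=0)

    # ID level: only instances judged to be the same object as the reference.
    same_object = id_scores >= id_threshold
    id_mask = instance_masks[same_object].any(axis=0)

    # Instance level: keep the per-instance masks separate.
    return {"id": id_mask, "instance": instance_masks, "semantic": semantic}


# Two instances side by side in a 2x4 image; only the first matches the reference.
masks = np.zeros((2, 2, 4), dtype=bool)
masks[0, :, :2] = True
masks[1, :, 2:] = True
out = masks_by_granularity(masks, id_scores=[0.9, 0.1])
```

A caller that knows the intended task can then read `out["id"]`, `out["instance"]`, or `out["semantic"]`, rather than relying on a single mask to guess the task from the prompt.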

How does SINE compare to SegGPT, another in-context segmentation model?

Both SINE and SegGPT are in-context segmentation models, but SINE offers several advantages:
Addressing task ambiguity: SINE can handle task ambiguity by generating multiple task-specific output masks, while SegGPT is limited to semantic segmentation and cannot resolve such ambiguities.
Handling instance segmentation: SINE can perform instance segmentation, a capability lacking in SegGPT.
Direct mask prediction: SINE directly predicts segmentation masks, avoiding the complex post-processing steps required by SegGPT to convert its RGB pixel output to masks.
Handling high-resolution images: Unlike SegGPT, which stitches the reference and target images, SINE processes them separately, eliminating limitations in processing high-resolution images.
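The difference between direct mask prediction and decoding masks from an RGB output can be illustrated with a toy sketch. Neither function reproduces SINE's or SegGPT's actual code. The color-distance rule, the tolerance `tol`, and both function names are assumptions chosen only to show why an RGB output needs an extra decoding step.

```python
import numpy as np


def mask_from_logits(logits):
    """Direct prediction: a per-pixel logit map becomes a mask in one step."""
    # sigmoid(x) > 0.5 is equivalent to x > 0, so no extra decoding is needed.
    return np.asarray(logits, dtype=float) > 0.0


def mask_from_rgb(rgb, target_color, tol=30.0):
    """RGB decoding: pixels must be matched to a color before a mask exists.

    rgb: (H, W, 3) predicted image; target_color: the color assumed to encode
    the object of interest; tol: maximum Euclidean distance in RGB space.
    """
    rgb = np.asarray(rgb, dtype=float)
    distance = np.linalg.norm(rgb - np.asarray(target_color, dtype=float), axis=-1)
    return distance <= tol


logits = np.array([[2.0, -1.0], [-3.0, 0.5]])
direct_mask = mask_from_logits(logits)

rgb = np.array([[[250, 5, 5], [0, 0, 0]], [[10, 10, 10], [255, 0, 0]]])
decoded_mask = mask_from_rgb(rgb, target_color=(255, 0, 0))
```

The RGB route depends on a color assignment and a tolerance, and both have to be chosen and tuned. The logit route has no such free parameters beyond the decision threshold.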

What are the limitations of SINE?

Limited scope of ambiguity resolution: SINE primarily focuses on addressing ambiguities between ID, instance, and semantic segmentation tasks. More complex ambiguities, such as those related to object parts, spatial positions, categories, and colors, are not explicitly addressed. Future work could incorporate multimodal in-context examples (e.g., image and text) to tackle these more intricate ambiguities.
Performance gap with SegGPT: SINE exhibits a performance gap compared to SegGPT, particularly in handling complex video sequences. This gap is attributed to SINE's use of fewer trainable parameters and a simpler In-context Interaction module, limiting its ability to capture complex inter-frame relationships. Designing a more sophisticated In-context Interaction module is a potential avenue for improvement.

Owner

Name: Advanced Intelligent Machines (AIM)
Login: aim-uofa
Kind: organization
Location: China

Repositories: 23
Profile: https://github.com/aim-uofa

A research team at Zhejiang University, focusing on Computer Vision and broad AI research ...

GitHub Events

Total

Issues event: 8
Watch event: 51
Issue comment event: 4
Public event: 1
Push event: 10
Pull request event: 1
Fork event: 4

Last Year

Issues event: 8
Watch event: 51
Issue comment event: 4
Public event: 1
Push event: 10
Pull request event: 1
Fork event: 4

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 6
Total pull requests: 1
Average time to close issues: 3 days
Average time to close pull requests: N/A
Total issue authors: 6
Total pull request authors: 1
Average comments per issue: 0.5
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 6
Pull requests: 1
Average time to close issues: 3 days
Average time to close pull requests: N/A
Issue authors: 6
Pull request authors: 1
Average comments per issue: 0.5
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

mandal4 (1)
xiaoyaod (1)
gongqiwen03 (1)
cuffak (1)
geek-APTX4869 (1)
hiyyg (1)

Pull Request Authors

ding3820 (1)

Top Labels

Issue Labels

Pull Request Labels

Dependencies

dinov2/eval/setup.py pypi

inference_fsod/dinov2/eval/setup.py pypi

inference_vos/davis2017-evaluation/setup.py pypi

Pillow >=4.1.1
networkx >=2.0
numpy >=1.12.1
opencv-python >=4.0.0.21
pandas >=0.21.1
pathlib2 *
scikit-image >=0.13.1
scikit-learn >=0.18
scipy >=1.0.0
tqdm >=4.28.1

requirements.txt pypi

deepspeed ==0.11.0
numpy ==1.26.1
omegaconf ==2.3.0
opencv-python ==4.8.0.76
timm ==0.9.17
torch ==2.0.1
torchvision ==0.15.2
tqdm ==4.66.1
xformers ==0.0.21
