https://github.com/aim-uofa/sine
[NeurIPS'24] A Simple Image Segmentation Framework via In-Context Examples
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org, scholar.google
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: aim-uofa
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2410.04842
- Size: 956 KB
Statistics
- Stars: 51
- Watchers: 4
- Forks: 2
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
A Simple Image Segmentation Framework via In-Context Examples
[Yang Liu](https://scholar.google.com/citations?user=9JcQ2hwAAAAJ&hl=en)¹, [Chenchen Jing](https://jingchenchen.github.io/)¹, Hengtao Li¹, [Muzhi Zhu](https://scholar.google.com/citations?user=064gBH4AAAAJ&hl=en)¹, [Hao Chen](https://stan-haochen.github.io/)¹, [Xinlong Wang](https://www.xloong.wang/)², [Chunhua Shen](https://cshen.github.io/)¹
¹[Zhejiang University](https://www.zju.edu.cn/english/), ²[Beijing Academy of Artificial Intelligence](https://www.baai.ac.cn/english.html)
NeurIPS 2024
🚀 Overview
📖 Description
Overview
- This paper proposes a simple yet effective image segmentation framework that leverages in-context examples.
- Users provide a few annotated examples (reference images together with their masks), and the model uses them to segment the corresponding targets in new images.
- The framework is designed to be intuitive and user-friendly, so that non-expert users can specify a segmentation task by example.
In detail: Recent work has explored generalist segmentation models that can tackle a variety of image segmentation tasks within a unified in-context learning framework. However, these methods still struggle with task ambiguity in in-context segmentation, because not every in-context example accurately conveys the intended task. To address this issue, we present SINE, a simple image Segmentation framework utilizing in-context examples. Our approach uses a Transformer encoder-decoder structure: the encoder provides high-quality image representations, and the decoder yields multiple task-specific output masks to eliminate task ambiguity.
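To make the in-context setup concrete, here is a minimal sketch of the generic reference-to-target matching step that in-context segmentation builds on: pool the reference image's features under its mask into a prototype, then score every target location by cosine similarity. This is a swapped-in textbook baseline (masked average pooling plus cosine matching), not SINE's Transformer decoder; the toy arrays stand in for frozen encoder features, and the function names and the 0.5 threshold are hypothetical.

```python
import numpy as np

def masked_average_pool(features: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average the (H, W, C) feature vectors where the binary (H, W) mask is set."""
    weights = mask.astype(features.dtype)[..., None]  # (H, W, 1)
    return (features * weights).sum(axis=(0, 1)) / max(weights.sum(), 1e-6)

def similarity_map(features: np.ndarray, prototype: np.ndarray) -> np.ndarray:
    """Cosine similarity between every (H, W, C) target feature and a (C,) prototype."""
    feat_norm = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-6)
    proto_norm = prototype / (np.linalg.norm(prototype) + 1e-6)
    return feat_norm @ proto_norm  # (H, W)

# Toy 4x4 feature maps with 2 channels: channel 0 = "object", channel 1 = "background".
ref_feats = np.zeros((4, 4, 2))
ref_feats[..., 1] = 1.0
ref_feats[:2, :2] = [1.0, 0.0]  # reference object in the top-left corner
ref_mask = np.zeros((4, 4), dtype=bool)
ref_mask[:2, :2] = True          # the in-context annotation

tgt_feats = np.zeros((4, 4, 2))
tgt_feats[..., 1] = 1.0
tgt_feats[2:, 2:] = [1.0, 0.0]  # same kind of object, bottom-right corner

prototype = masked_average_pool(ref_feats, ref_mask)
pred_mask = similarity_map(tgt_feats, prototype) > 0.5
print(pred_mask.astype(int))
```

A single prototype like this can only express one interpretation of the prompt, which is exactly the ambiguity SINE targets by predicting several task-specific masks instead of one.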
👻 Getting Started
Pretrained weights: DINOv2-L model trained on ADE20K, COCO, and Objects365.
🎫 License
For academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.
🖊️ Citation
If you find this project useful in your research, please consider citing:
BibTeX
@article{liu2024simple,
title={A Simple Image Segmentation Framework via In-Context Examples},
author={Liu, Yang and Jing, Chenchen and Li, Hengtao and Zhu, Muzhi and Chen, Hao and Wang, Xinlong and Shen, Chunhua},
journal={Proc. Int. Conference on Neural Information Processing Systems (NeurIPS)},
year={2024}
}
Acknowledgement
DINOv2, Mask2Former, SegGPT, Matcher, TFA and detectron2.
FAQ
Key Contributions of the Paper:
- The authors report that this is the first work to investigate and address task ambiguity in in-context segmentation.
- It introduces a Matching Transformer that unlocks the potential of frozen pre-trained image models for diverse segmentation tasks with low training costs.
What is the main challenge in in-context segmentation that SINE aims to address?
- The primary challenge SINE addresses is task ambiguity in in-context segmentation. This ambiguity arises when the in-context examples do not accurately or clearly convey the intended segmentation task. For instance, if the reference image only shows a single object and its annotation, the lack of additional task-related information can lead to incorrect segmentation outputs.
How does SINE address task ambiguity?
- SINE tackles task ambiguity by predicting multiple output masks, each corresponding to a different task granularity: the identical object shown in the prompt (ID segmentation), every instance of the same kind of object (instance segmentation), and the overall semantic concept (semantic segmentation). This lets SINE disentangle the specific task from the in-context example and interpret the semantic meaning of the prompt, producing results at each level of task granularity.
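The three granularities can be illustrated with a small sketch that derives them from ground-truth instance masks. This is not how SINE computes its outputs (its decoder predicts the masks directly); it only shows what each output means for one prompt. The function `granularity_outputs`, its arguments, and the toy cat/dog scene are hypothetical.

```python
import numpy as np

def granularity_outputs(instance_masks, categories, query_index):
    """Hypothetical illustration of ID, instance, and semantic outputs for one prompt.

    instance_masks: (N, H, W) boolean masks of the objects in the target image
    categories:     length-N list of category labels, one per mask
    query_index:    index of the instance that is the same object as the prompt
    """
    query_category = categories[query_index]
    # ID level: only the identical object.
    id_mask = instance_masks[query_index]
    # Instance level: every object of the prompt's category, kept as separate masks.
    same_category = [i for i, c in enumerate(categories) if c == query_category]
    instance_level = instance_masks[same_category]  # (K, H, W)
    # Semantic level: the union of those instances as one region.
    semantic_mask = instance_level.any(axis=0)
    return id_mask, instance_level, semantic_mask

# Toy 4x4 target image with two cats and one dog; the prompt shows cat #0.
masks = np.zeros((3, 4, 4), dtype=bool)
masks[0, 0, 0] = True       # cat #0
masks[1, 3, 3] = True       # cat #1
masks[2, 1:3, 1:3] = True   # dog
id_mask, instance_level, semantic_mask = granularity_outputs(masks, ["cat", "cat", "dog"], 0)
```

The same reference prompt is therefore consistent with three different correct answers, which is why a single-output model cannot always resolve it.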
How does SINE compare to SegGPT, another in-context segmentation model?
Both SINE and SegGPT are in-context segmentation models, but SINE offers several advantages:
- Addressing task ambiguity: SINE can handle task ambiguity by generating multiple task-specific output masks, while SegGPT is limited to semantic segmentation and cannot resolve such ambiguities.
- Handling instance segmentation: SINE can perform instance segmentation, a capability lacking in SegGPT.
- Direct mask prediction: SINE directly predicts segmentation masks, avoiding the complex post-processing steps required by SegGPT to convert its RGB pixel output to masks.
- Handling high-resolution images: Unlike SegGPT, which stitches the reference and target images into a single input, SINE processes them separately, avoiding the resolution limits that stitching imposes.
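The difference between decoding an RGB prediction and predicting masks directly can be sketched as follows. This is a simplified, hypothetical post-processing step (Euclidean color distance to an assumed palette color with an assumed tolerance of 30), not SegGPT's actual code; the direct path assumes per-pixel mask logits thresholded at a sigmoid probability of 0.5.

```python
import numpy as np

def rgb_to_mask(rgb_pred: np.ndarray, color, tolerance: float = 30.0) -> np.ndarray:
    """Hypothetical decoding: mark pixels whose predicted RGB lies within `tolerance` of `color`."""
    diff = rgb_pred.astype(np.float32) - np.asarray(color, dtype=np.float32)
    return np.linalg.norm(diff, axis=-1) < tolerance

# RGB-output path: a 2x3 predicted image must be matched against an assumed palette color.
pred = np.zeros((2, 3, 3), dtype=np.uint8)
pred[0, 1] = [250, 5, 3]    # close to pure red, counts as foreground
pred[1, 2] = [200, 60, 40]  # reddish but outside the tolerance
mask = rgb_to_mask(pred, (255, 0, 0))

# Direct-mask path: per-pixel logits only need a sigmoid and a threshold.
logits = np.array([[2.0, -1.0]])
direct_mask = 1.0 / (1.0 + np.exp(-logits)) > 0.5
```

The RGB path depends on a color palette and a tolerance that must be chosen by hand, and colors that drift between classes become ambiguous; a direct mask head sidesteps both choices.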
What are the limitations of SINE?
- Limited scope of ambiguity resolution: SINE primarily focuses on addressing ambiguities between ID, instance, and semantic segmentation tasks. More complex ambiguities, such as those related to object parts, spatial positions, categories, and colors, are not explicitly addressed. Future work could incorporate multimodal in-context examples (e.g., image and text) to tackle these more intricate ambiguities.
- Performance gap with SegGPT: SINE still trails SegGPT in some settings, particularly complex video sequences. The authors attribute this gap to SINE's smaller number of trainable parameters and its simpler In-context Interaction module, which limit its ability to capture complex inter-frame relationships. Designing a more sophisticated In-context Interaction module is a potential avenue for improvement.
Owner
- Name: Advanced Intelligent Machines (AIM)
- Login: aim-uofa
- Kind: organization
- Location: China
- Repositories: 23
- Profile: https://github.com/aim-uofa
A research team at Zhejiang University, focusing on Computer Vision and broad AI research ...
GitHub Events
Total
- Issues event: 8
- Watch event: 51
- Issue comment event: 4
- Public event: 1
- Push event: 10
- Pull request event: 1
- Fork event: 4
Last Year
- Issues event: 8
- Watch event: 51
- Issue comment event: 4
- Public event: 1
- Push event: 10
- Pull request event: 1
- Fork event: 4
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 6
- Total pull requests: 1
- Average time to close issues: 3 days
- Average time to close pull requests: N/A
- Total issue authors: 6
- Total pull request authors: 1
- Average comments per issue: 0.5
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 1
- Average time to close issues: 3 days
- Average time to close pull requests: N/A
- Issue authors: 6
- Pull request authors: 1
- Average comments per issue: 0.5
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mandal4 (1)
- xiaoyaod (1)
- gongqiwen03 (1)
- cuffak (1)
- geek-APTX4869 (1)
- hiyyg (1)
Pull Request Authors
- ding3820 (1)
Dependencies
- Pillow >=4.1.1
- networkx >=2.0
- numpy >=1.12.1
- opencv-python >=4.0.0.21
- pandas >=0.21.1
- pathlib2 *
- scikit-image >=0.13.1
- scikit-learn >=0.18
- scipy >=1.0.0
- tqdm >=4.28.1
- deepspeed ==0.11.0
- numpy ==1.26.1
- omegaconf ==2.3.0
- opencv-python ==4.8.0.76
- timm ==0.9.17
- torch ==2.0.1
- torchvision ==0.15.2
- tqdm ==4.66.1
- xformers ==0.0.21
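The dependency list above appears to merge two requirement sets, so some packages (numpy, opencv-python, tqdm) carry both a lower bound and an exact pin. A minimal sketch for checking that every `==` pin still satisfies the `>=` bounds listed for the same package is shown below. It assumes purely numeric version strings, which holds for this list; the helper names are hypothetical, and a real tool should use a PEP 440-aware parser instead of integer tuples.

```python
from collections import defaultdict

def version_tuple(v: str) -> tuple:
    """Turn a purely numeric version like '1.26.1' into (1, 26, 1) for comparison."""
    return tuple(int(part) for part in v.split("."))

def check_pins(lines):
    """Return package names whose '==' pin is below a '>=' bound for the same package."""
    lower, pinned = defaultdict(list), {}
    for line in lines:
        name, spec = line.lstrip("- ").split(maxsplit=1)
        key = name.lower()
        if spec.startswith(">="):
            lower[key].append(version_tuple(spec[2:]))
        elif spec.startswith("=="):
            pinned[key] = version_tuple(spec[2:])
        # Other specifiers (e.g. '*') impose no constraint in this sketch.
    return [n for n, v in pinned.items() if any(v < bound for bound in lower[n])]

DEPENDENCIES = [
    "- Pillow >=4.1.1", "- networkx >=2.0", "- numpy >=1.12.1",
    "- opencv-python >=4.0.0.21", "- pandas >=0.21.1", "- pathlib2 *",
    "- scikit-image >=0.13.1", "- scikit-learn >=0.18", "- scipy >=1.0.0",
    "- tqdm >=4.28.1", "- deepspeed ==0.11.0", "- numpy ==1.26.1",
    "- omegaconf ==2.3.0", "- opencv-python ==4.8.0.76", "- timm ==0.9.17",
    "- torch ==2.0.1", "- torchvision ==0.15.2", "- tqdm ==4.66.1",
    "- xformers ==0.0.21",
]
print(check_pins(DEPENDENCIES))
```

For this list the pins all sit above the corresponding lower bounds, so the two requirement sets do not conflict.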