https://github.com/buaadreamer/awesome-composed-image-retrieval

Collection of Composed Image Retrieval (CIR) papers.

Repository

Basic Info
  • Host: GitHub
  • Owner: BUAADreamer
  • Default Branch: main
  • Homepage:
  • Size: 271 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of haokunwen/Awesome-Composed-Image-Retrieval
Created almost 2 years ago · Last pushed over 1 year ago

https://github.com/BUAADreamer/Awesome-Composed-Image-Retrieval/blob/main/

# Awesome-CIR
A collection of Composed Image Retrieval (CIR) papers, organized as follows:  
[1. Attribute-based CIR](#1-attribute-based-cir)  
[2. Supervised CIR](#2-supervised-cir)  
[3. Few-Shot CIR](#3-few-shot-cir)  
[4. Zero-Shot CIR](#4-zero-shot-cir)  
[5. Semi-supervised CIR](#5-semi-supervised-cir)  
[6. Conversational CIR](#6-conversational-cir)  
[7. Composed Video Retrieval (COVR)](#7-covr)  
[8. Sketch-based CIR](#8-sketch-based-cir)  
[9. Others](#9-others)  
[10. Dataset statistics](#10-dataset-statistics)

## 1. Attribute-based CIR
### 2021
- [1] **[ICCV'21] |** Learning Attribute-driven Disentangled Representations for Interactive Fashion Retrieval. [[Paper]](https://ieeexplore.ieee.org/document/9711479)
- [2] **[ICCV'21] |** Face Image Retrieval with Attribute Manipulation. [[Paper]](https://ieeexplore.ieee.org/document/9710728)
### 2020
- [1] **[SIGIR'20] |** Generative Attribute Manipulation Scheme for Flexible Fashion Search. [[Paper]](https://dl.acm.org/doi/10.1145/3397271.3401150)
### 2018
- [1] **[CVPR'18] |** Learning Attribute Representations with Localization for Flexible Fashion Search. [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8578902)
- [2] **[WACV'18] |** Efficient Multi-Attribute Similarity Learning Towards Attribute-based Fashion Search. [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8354290)
### 2017
- [1] **[CVPR'17] |** Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search. [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8100135)

## 2. Supervised CIR
### Pre-prints
- [1] **[Arxiv'24] |** VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval. [[Paper]](https://arxiv.org/abs/2406.04292)
- [2] **[Arxiv'23] |** Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval. [[Paper]](https://arxiv.org/abs/2306.02092)  
- [3] **[Arxiv'23] |** Ranking-aware Uncertainty for Text-guided Image Retrieval. [[Paper]](https://arxiv.org/abs/2308.08131)  
- [4] **[Arxiv'23] |** Learning with Multi-modal Gradient Attention for Explainable Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2308.16649)  
- [5] **[Arxiv'23] |** VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering. [[Paper]](https://arxiv.org/abs/2312.12273)

### 2025
- [1] **[AAAI'25] |** Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval. [[Paper]](https://arxiv.org/pdf/2412.11087)
- [2] **[AAAI'25] |** ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval.  

### 2024
- [1] **[WACV'24] |** Bi-directional Training for Composed Image Retrieval via Text Prompt Learning. [[Paper]](https://ieeexplore.ieee.org/abstract/document/10484044)  
- [2] **[TOMM'24] |** Cross-Modal Attention Preservation with Self-Contrastive Learning for Composed Query-Based Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3639469)
- [3] **[TOMM'24] |** SPIRIT: Style-guided Patch Interaction for Fashion Image Retrieval with Text Feedback. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3640345)
- [4] **[TPAMI'24] |** Self-Training Boosted Multi-Factor Matching Network for Composed Image Retrieval. [[Paper]](https://ieeexplore.ieee.org/document/10373096)
- [5] **[AAAI'24] |** Dynamic Weighted Combiner for Mixed-Modal Image Retrieval. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/28004/28023)
- [6] **[AAAI'24] |** Data Roaming and Quality Assessment for Composed Image Retrieval. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/28081/28168)
- [7] **[AAAI'24] |** FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/download/27885/27795)
- [8] **[AAAI'24] |** Decomposing Semantic Shifts for Composed Image Retrieval. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/28479/28933)
- [9] **[SIGIR'24] |** Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3626772.3657727)
- [10] **[SIGIR'24] |** CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval. [[Paper]](https://export.arxiv.org/abs/2405.19149)
- [11] **[CVPR'24] |** SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining. [[Paper]](https://arxiv.org/pdf/2404.01156)
- [12] **[ICLR'24] |** Sentence-level Prompts Benefit Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2310.05473)
- [13] **[ICLR'24] |** Composed Image Retrieval with Text Feedback via Multi-Grained Uncertainty Regularization. [[Paper]](https://arxiv.org/pdf/2211.07394v5)
- [14] **[TMLR'24] |** Candidate Set Re-ranking for Composed Image Retrieval with Dual Multimodal Encoder. [[Paper]](https://arxiv.org/abs/2305.16304)
- [15] **[TCSVT'24] |** Set of Diverse Queries with Uncertainty Regularization for Composed Image Retrieval. [[Paper]](https://ieeexplore.ieee.org/abstract/document/10530361)
- [16] **[ICMR'24] |** CLIP-ProbCR: CLIP-based Probability embedding Combination Retrieval. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3652583.3657611)
- [17] **[TMM'24] |** Align and Retrieve: Composition and Decomposition Learning in Image Retrieval with Text Feedback. [[Paper]](https://ieeexplore.ieee.org/abstract/document/10568424)
- [18] **[KBS'24] |** Collaborative Group: Composed Image Retrieval via Consensus Learning From Noisy Annotations. [[Paper]](https://www.sciencedirect.com/science/article/pii/S095070512400769X)
- [19] **[TIP'24] |** Multimodal Composition Example Mining for Composed Query Image Retrieval. [[Paper]](https://ieeexplore.ieee.org/abstract/document/10418785)
- [20] **[TOIS'24] |** LLM-enhanced Composed Image Retrieval: An Intent Uncertainty-aware Linguistic-Visual Dual Channel Matching Model. [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3699715)
- [21] **[ACM MM'24] |** Semantic Distillation from Neighborhood for Composed Image Retrieval. [[Paper]](https://openreview.net/pdf?id=MAgFiw3yHG)
- [22] **[ACM MM'24] |** Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives. [[Paper]](https://dl.acm.org/doi/10.1145/3664647.3680808)
- [23] **[NeurIPS'24] |** Easy Regional Contrastive Learning of Expressive Fashion Representations. [[Paper]](https://openreview.net/forum?id=bCL9U2X9Jg)
- [24] **[ACL'24] |** UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation. [[Paper]](https://arxiv.org/pdf/2408.11305)

### 2023
- [1] **[TOMM'23] |** AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3584703)
- [2] **[TMM'23] |** Multi-Modal Transformer With Global-Local Alignment for Composed Query Image Retrieval. [[Paper]](https://ieeexplore.ieee.org/document/10012544)
- [3] **[WACV'23] |** Fashion Image Retrieval with Text Feedback by Additive Attention Compositional Learning. [[Paper]](https://ieeexplore.ieee.org/document/10030891)
- [4] **[ICMR'23] |** Dual-Path Semantic Construction Network for Composed Query-Based Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3591106.3592245)
- [5] **[TCSVT'23] |** Multi-Grained Attention Network With Mutual Exclusion for Composed Query-Based Image Retrieval. [[Paper]](https://ieeexplore.ieee.org/abstract/document/10225420)
- [6] **[TOMM'23] |** Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features. [[Paper]](https://dl.acm.org/doi/10.1145/3617597)
- [7] **[ICME'23] |** Visual-Linguistic Alignment and Composition for Image Retrieval with Text Feedback. [[Paper]](https://ieeexplore.ieee.org/document/10219821)
- [8] **[TIP'23] |** Composed Image Retrieval via Cross Relation Network With Hierarchical Aggregation Transformer. [[Paper]](https://ieeexplore.ieee.org/document/10205526)
- [9] **[MM'23] |** Target-Guided Composed Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3581783.3611817)
- [10] **[CVPR'23] |** FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks. [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Han_FAME-ViL_Multi-Tasking_Vision-Language_Model_for_Heterogeneous_Fashion_Tasks_CVPR_2023_paper.pdf)
- [11] **[ICCVW'23] |** ProVLA: Compositional Image Search with Progressive Vision-Language Alignment and Multimodal Fusion. [[Paper]](https://ieeexplore.ieee.org/document/10350916)
- [12] **[NeurIPSW'23] |** NEUCORE: Neural Concept Reasoning for Composed Image Retrieval. [[Paper]](https://arxiv.org/pdf/2310.01358)
- [13] **[NeurIPSW'23] |** Benchmarking Robustness of Text-Image Composed Retrieval. [[Paper]](https://suntongtongtong.github.io/benchmarking_robustness.pdf)
- [14] **[MMW'23] |** Fashion-GPT: Integrating LLMs with Fashion Retrieval System. [[Paper]](https://dl.acm.org/doi/10.1145/3607827.3616844)

### 2022
- [1] **[ICLR'22] |** ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity. [[Paper]](https://openreview.net/forum?id=CVfLvQq9gLo)[[Arxiv]](https://arxiv.org/abs/2203.08101)
- [2] **[TOMM'22] |** Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3478642)
- [3] **[TIP'22] |** Geometry Sensitive Cross-Modal Reasoning for Composed Query Based Image Retrieval. [[Paper]](https://ieeexplore.ieee.org/document/9667308)
- [4] **[TIP'22] |** Composed Image Retrieval via Explicit Erasure and Replenishment With Semantic Alignment. [[Paper]](https://ieeexplore.ieee.org/document/9887834)
- [5] **[WACV'22] |** SAC: Semantic Attention Composition for Text-Conditioned Image Retrieval. [[Paper]](https://openaccess.thecvf.com/content/WACV2022/html/Jandial_SAC_Semantic_Attention_Composition_for_Text-Conditioned_Image_Retrieval_WACV_2022_paper.html)
- [6] **[SIGIR'22] |** Progressive Learning for Image Retrieval with Hybrid-Modality Queries. [[Paper]](https://dl.acm.org/doi/10.1145/3477495.3532047)
- [7] **[CVPR'22] |** Effective Conditioned and Composed Image Retrieval Combining CLIP-based Features. [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Baldrati_Effective_Conditioned_and_Composed_Image_Retrieval_Combining_CLIP-Based_Features_CVPR_2022_paper.pdf)
- [8] **[CVPR'22] |** FashionVLP: Vision Language Transformer for Fashion Retrieval with Feedback. [[Paper]](https://ieeexplore.ieee.org/document/9879706)
- [9] **[TMM'22] |** Enhance Composed Image Retrieval via Multi-Level Collaborative Localization and Semantic Activeness Perception. [[Paper]](https://ieeexplore.ieee.org/document/10120671)
- [10] **[TMM'22] |** Adversarial and Isotropic Gradient Augmentation for Image Retrieval With Text Feedback. [[Paper]](https://ieeexplore.ieee.org/abstract/document/9953564)
- [11] **[EMNLP'22] |** FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning. [[Paper]](https://aclanthology.org/2022.emnlp-main.716/)
- [12] **[ECCV'22] |** FashionViL: Fashion-Focused Vision-and-Language Representation Learning. [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136950629.pdf)

### 2021
- [1] **[ICCV'21] |** Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models. [[Paper]](https://ieeexplore.ieee.org/document/9710082)[[Arxiv]](https://arxiv.org/abs/2108.04024)
- [2] **[CVPR'21] |** CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback. [[Paper]](https://ieeexplore.ieee.org/document/9577437)
- [3] **[WACV'21] |** Compositional Learning of Image-Text Query for Image Retrieval. [[Paper]](https://ieeexplore.ieee.org/document/9423122)
- [4] **[MM'21] |** Heterogeneous Feature Fusion and Cross-modal Alignment for Composed Image Retrieval. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3474085.3475659)
- [5] **[MM'21] |** Cross-modal Joint Prediction and Alignment for Composed Query Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3474085.3475483)
- [6] **[MM'21] |** Image Retrieval with Text Feedback by Deep Hierarchical Attention Mutual Information Maximization. [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3474085.3475619)
- [7] **[AAAI'21] |** Dual Compositional Learning in Interactive Image Retrieval. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/16271)
- [8] **[SIGIR'21] |** Comprehensive Linguistic-Visual Composition Network for Image Retrieval. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3404835.3462967)
  
### 2020
- [1] **[CVPR'20] |** Image Search With Text Feedback by Visiolinguistic Attention Learning. [[Paper]](https://ieeexplore.ieee.org/document/9157634)
- [2] **[MM'20] |** Joint Attribute Manipulation and Modality Alignment Learning for Composing Text and Image to Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3394171.3413917)
- [3] **[ECCV'20] |** Learning Joint Visual Semantic Matching Embeddings for Language-Guided Retrieval. [[Paper]](https://link.springer.com/content/pdf/10.1007/978-3-030-58542-6_9.pdf)
- [4] **[CVPR'20] |** Composed Query Image Retrieval Using Locally Bounded Features. [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9157125)
### 2019
- [1] **[CVPR'19] |** Composing Text and Image for Image Retrieval - an Empirical Odyssey. [[Paper]](https://ieeexplore.ieee.org/document/8953387)[[Arxiv]](https://arxiv.org/abs/1812.07119)

## 3. Few-Shot CIR
### Pre-prints
- [1] **[Arxiv'24] |** Pseudo Triplet Guided Few-shot Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2407.06001)

### 2023
- [1] **[AAAI'23] |** Few-Shot Composition Learning for Image Retrieval with Prompt Tuning. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/25597/25369)

## 4. Zero-Shot CIR
### Pre-prints
- [1] **[Arxiv'25] |** SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval. [[Paper]](https://arxiv.org/pdf/2501.08347v1)  
- [2] **[Arxiv'24] |** MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval. [[Paper]](https://arxiv.org/pdf/2410.23736)
- [3] **[Arxiv'24] |** iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2405.02951)
- [4] **[Arxiv'24] |** Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2406.09188)
- [5] **[Arxiv'24] |** Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs. [[Paper]](https://arxiv.org/abs/2406.18836)
- [6] **[Arxiv'24] |** Training-free Zero-shot Composed Image Retrieval with Local Concept Re-ranking. [[Paper]](https://arxiv.org/abs/2312.08924)
- [7] **[Arxiv'24] |** HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels. [[Paper]](https://arxiv.org/abs/2407.05795)
- [8] **[Arxiv'24] |** Training-free Zero-shot Composed Image Retrieval via Weighted Modality Fusion and Similarity. [[Paper]](https://arxiv.org/abs/2409.04918)  
- [9] **[Arxiv'24] |** Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy. [[Paper]](https://arxiv.org/pdf/2411.16752)
- [10] **[Arxiv'24] |** Composed Image Retrieval for Training-Free Domain Conversion. [[Paper]](https://arxiv.org/pdf/2412.03297)
- [11] **[Arxiv'24] |** Compositional Image Retrieval via Instruction-Aware Contrastive Learning. [[Paper]](https://arxiv.org/pdf/2412.05756)
- [12] **[Arxiv'24] |** Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval. [[Paper]](https://arxiv.org/pdf/2412.11077)
- [13] **[Arxiv'24] |** MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval. [[Paper]](https://arxiv.org/pdf/2412.14475)
- [14] **[Arxiv'24] |** Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval. [[Paper]](https://arxiv.org/pdf/2410.17393)  
- [15] **[Arxiv'23] |** Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2311.07622)

### 2025   
- [1] **[COLING'25] |** MLLM-I2W: Harnessing Multimodal Large Language Model for Zero-Shot Composed Image Retrieval. [[Paper]](https://aclanthology.org/2025.coling-main.125.pdf)

### 2024
- [1] **[AAAI'24] |** Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/28324)
- [2] **[ICLR'24] |** Vision-by-Language for Training-Free Compositional Image Retrieval. [[Paper]](https://openreview.net/forum?id=EDPxCjXzSb)
- [3] **[CVPR'24] |** LinCIR: Language-only Training of Zero-shot Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2312.01998)
- [4] **[CVPR'24] |** Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2403.16005)
- [5] **[SIGIR'24] |** Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3626772.3657831)
- [6] **[SIGIR'24] |** LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3626772.3657740)
- [7] **[ICML'24] |** Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning. [[Paper]](https://openreview.net/forum?id=Nm6jYZsBum)
- [8] **[ICML'24] |** MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions. [[Paper]](https://arxiv.org/abs/2403.19651)
- [9] **[TMLR'24] |** CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion. [[Paper]](https://openreview.net/forum?id=mKtlzW0bWc)
- [10] **[ACM MM'24] |** Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval. [[Paper]](https://openreview.net/pdf?id=rkCYgXfj9P)
- [11] **[ECCV'24] |** Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval. [[Paper]](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02904.pdf)

### 2023
- [1] **[ICCV'23] |** Zero-shot Composed Image Retrieval with Textual Inversion. [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/html/Baldrati_Zero-Shot_Composed_Image_Retrieval_with_Textual_Inversion_ICCV_2023_paper.html)
- [2] **[CVPR'23] |** Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval. [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Saito_Pic2Word_Mapping_Pictures_to_Words_for_Zero-Shot_Composed_Image_Retrieval_CVPR_2023_paper.pdf)
- [3] **[BMVC'23] |** Zero-shot Composed Text-Image Retrieval. [[Paper]](https://proceedings.bmvc2023.org/381/)

## 5. Semi-supervised CIR
### 2024
- [1] **[CVPR'24] |** Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2404.15516)

## 6. Conversational CIR
### Pre-prints
- [1] **[Arxiv'24] |** Leveraging Large Language Models for Multimodal Search. [[Paper]](https://arxiv.org/html/2404.15790v1)

### 2025  
- [1] **[ICLR'25] |** MAI: A Multi-Turn Aggregation-Iteration Model for Composed Image Retrieval. [[Paper]](https://openreview.net/pdf?id=gXyWbl71n1)  
### 2023
- [1] **[ICCV'23] |** FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory. [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/html/Pal_FashionNTM_Multi-turn_Fashion_Image_Retrieval_via_Cascaded_Memory_ICCV_2023_paper.html)
- [2] **[MM'23] |** Conversational Composed Retrieval with Iterative Sequence Refinement. [[Paper]](https://dl.acm.org/doi/10.1145/3581783.3611885)
- [3] **[MMW'23] |** Fashion-GPT: Integrating LLMs with Fashion Retrieval System. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3607827.3616844)

### 2021
- [1] **[SIGIR'21] |** Conversational Fashion Image Retrieval via Multiturn Natural Language Feedback. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3404835.3462881)

### 2018
- [1] **[NeurIPS'18] |** Dialog-based Interactive Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.5555/3326943.3327006)

## 7. COVR
### Pre-prints
- [1] **[Arxiv'24] |** Localizing Events in Videos with Multimodal Queries. [[Paper]](https://arxiv.org/abs/2406.10079)

### 2024
- [1] **[AAAI'24] |** CoVR: Learning Composed Video Retrieval from Web Video Captions. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/28334)
- [2] **[CVPR'24] |** Composed Video Retrieval via Enriched Context and Discriminative Embeddings. [[Paper]](https://arxiv.org/abs/2403.16997)
- [3] **[TPAMI'24] |** CoVR-2: Automatic Data Construction for Composed Video Retrieval. [[Paper]](https://ieeexplore.ieee.org/abstract/document/10685001)
- [4] **[ECCV'24] |** EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval. [[Paper]](https://arxiv.org/abs/2407.16658)

## 8. Sketch-based CIR
### 2024
- [1] **[AAAI'24] |** Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/27956)
- [2] **[CVPR'24] |** You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval. [[Paper]](https://arxiv.org/abs/2403.07222)

## 9. Others
### Person Retrieval
- [1] **[Arxiv'24] |** Word4Per: Zero-shot Composed Person Retrieval. [[Paper]](https://arxiv.org/abs/2311.16515)

### Remote Sensing Retrieval
- [1] **[IGARSS'24] |** Composed Image Retrieval for Remote Sensing. [[Paper]](https://arxiv.org/abs/2405.15587)
- [2] **[TGRS'24] |** Scene Graph-Aware Hierarchical Fusion Network for Remote Sensing Image Retrieval With Text Feedback. [[Paper]](https://ieeexplore.ieee.org/document/10537211)

### Survey  
- [1] **[Arxiv'24] |** A Survey of Multimodal Composite Editing and Retrieval. [[Paper]](https://arxiv.org/abs/2409.05405)

### New Dataset  
- [1] **[Arxiv'24] |** EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections. [[Paper]](https://arxiv.org/pdf/2410.01536v2)
- [2] **[Arxiv'24] |** ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval. [[Paper]](https://arxiv.org/pdf/2410.18715)

## 10. Dataset statistics
| **Dataset**                                      | **Modalities**    | **Images Scale** | **Triplets Scale** | **Type**           | **Link**                    | **Domain** |
| ------------------------------------------------ | ----------------- | ---------------- | ------------------ | ------------------ | --------------------------- | ---------- |
| FashionIQ                       | Image+Text        | ~77.7K           | ~30.1K             | Human Annotated    | [Link](https://github.com/XiaoxiaoGuo/fashion-iq) | Fashion    |
| Shoes                       | Image+Text        | ~14.7K           | ~10.8K             | Human Annotated    | [Link](https://github.com/XiaoxiaoGuo/fashion-retrieval) | Fashion    |
| Fashion200K               | Image+Text        | ~200K            | --                 | --                 | [Link](https://www.kaggle.com/datasets/mayukh18/fashion200k-dataset) | Fashion    |
| MIT-States                | Image+Text        | ~6K              | --                 | --                 | [Link](https://web.mit.edu/phillipi/Public/states_and_transformations/index.html) | Open-domain    |
| CIRR                      | Image+Text        | ~21.6K           | ~36.6K             | Human Annotated    | [Link](https://cuberick-orion.github.io/CIRR/) | Open-domain    |
| CIRCO                  | Image+Text        | ~12.3K          | ~1.0K                  | Human Annotated    | [Link](https://github.com/miccunifi/CIRCO/tree/main) | Open-domain    |
| CSS                           | Image+Text        | ~1.0K            | ~32K               | Generated          | [Link](https://github.com/google/tirg) | Open-domain    |
| LaSCo                       | Image+Text        | ~121.5K          | ~389.3K            | Generated          | [Link](https://github.com/levymsn/LaSCo?tab=readme-ov-file#lasco-dataset) | Open-domain    |
| SynthTriplets18M              | Image+Text        | --               | ~18M               | Generated          | [Link](https://github.com/navervision/CompoDiff) | Open-domain    |
| WebVid-CoVR              | Video+Text        | ~130.8K          | ~1.6M              | Generated          | [Link](https://imagine.enpc.fr/~ventural/covr/) | Video      |
| ITCPR                        | Image+Text        | ~20K             | ~12.2K             | Human Annotated    | [Link](https://github.com/Delong-liu-bupt/Word4Per) | Person    |
| Airplane, Tennis, and WHIRT | Image+Text        | ~7.7K            | ~8.7K              | Human Annotated    | - |    Remote Sensing  |
| PATTERNCOM               | Image+Text        | ~30K             | ~21K               | Generated          | [Link](https://github.com/billpsomas/rscir) | Remote Sensing   |
| FS-COCO               | Sketch+Image+Text | ~10K             | ~10K               | Human Annotated    | [Link](https://github.com/pinakinathc/fscoco) | Sketch        |
| SketchyCOCO             | Sketch+Image+Text | ~14K             | ~14K               | Automatic matching | [Link](https://github.com/sysu-imsl/SketchyCOCO) | Sketch |
| CSTBIR                    | Sketch+Image+Text | ~108K            | ~2M                | Automatic matching | [Link](https://vl2g.github.io/projects/cstbir/) |  Sketch       |

Owner

  • Login: BUAADreamer
  • Kind: user
  • Location: Beijing
  • Company: Beihang University

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2