https://github.com/buaadreamer/awesome-composed-image-retrieval
Collection of Composed Image Retrieval (CIR) papers.
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 26 DOI reference(s) in README
- ✓ Academic publication links: arxiv.org, sciencedirect.com, springer.com, ieee.org, acm.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.1%) to scientific vocabulary
Repository
Collection of Composed Image Retrieval (CIR) papers.
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of haokunwen/Awesome-Composed-Image-Retrieval
Created almost 2 years ago · Last pushed over 1 year ago
https://github.com/BUAADreamer/Awesome-Composed-Image-Retrieval/blob/main/
# Awesome-CIR

A collection of papers on Composed Image Retrieval (CIR), organized into the following sections:

- [1. Attribute-based CIR](#section1)
- [2. Supervised CIR](#section2)
- [3. Few-shot CIR](#section3)
- [4. Zero-shot CIR](#section4)
- [5. Semi-supervised CIR](#section5)
- [6. Conversational CIR](#section6)
- [7. Composed Video Retrieval (COVR)](#section7)
- [8. Sketch-based CIR](#section8)
- [9. Others](#section9)
- [10. Dataset statistics](#section10)

## 1. Attribute-based CIR

### 2021
- [1] **[ICCV'21] |** Learning Attribute-driven Disentangled Representations for Interactive Fashion Retrieval. [[Paper]](https://ieeexplore.ieee.org/document/9711479)
- [2] **[ICCV'21] |** Face Image Retrieval with Attribute Manipulation. [[Paper]](https://ieeexplore.ieee.org/document/9710728)

### 2020
- [1] **[SIGIR'20] |** Generative Attribute Manipulation Scheme for Flexible Fashion Search. [[Paper]](https://dl.acm.org/doi/10.1145/3397271.3401150)

### 2018
- [1] **[CVPR'18] |** Learning Attribute Representations with Localization for Flexible Fashion Search. [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8578902)
- [2] **[WACV'18] |** Efficient Multi-Attribute Similarity Learning Towards Attribute-based Fashion Search. [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8354290)

### 2017
- [1] **[CVPR'17] |** Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search. [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8100135)

## 2. Supervised CIR

### Pre-prints
- [1] **[Arxiv'24] |** VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval. [[Paper]](https://arxiv.org/abs/2406.04292)
- [2] **[Arxiv'23] |** Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval. [[Paper]](https://arxiv.org/abs/2306.02092)
- [3] **[Arxiv'23] |** Ranking-aware Uncertainty for Text-guided Image Retrieval.
[[Paper]](https://arxiv.org/abs/2308.08131)
- [4] **[Arxiv'23] |** Learning with Multi-modal Gradient Attention for Explainable Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2308.16649)
- [5] **[Arxiv'23] |** VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering. [[Paper]](https://arxiv.org/abs/2312.12273)

### 2025
- [1] **[AAAI'25] |** Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval. [[Paper]](https://arxiv.org/pdf/2412.11087)
- [2] **[AAAI'25] |** ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval.

### 2024
- [1] **[WACV'24] |** Bi-directional Training for Composed Image Retrieval via Text Prompt Learning. [[Paper]](https://ieeexplore.ieee.org/abstract/document/10484044)
- [2] **[TOMM'24] |** Cross-Modal Attention Preservation with Self-Contrastive Learning for Composed Query-Based Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3639469)
- [3] **[TOMM'24] |** SPIRIT: Style-guided Patch Interaction for Fashion Image Retrieval with Text Feedback. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3640345)
- [4] **[TPAMI'24] |** Self-Training Boosted Multi-Factor Matching Network for Composed Image Retrieval. [[Paper]](https://ieeexplore.ieee.org/document/10373096)
- [5] **[AAAI'24] |** Dynamic Weighted Combiner for Mixed-Modal Image Retrieval. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/28004/28023)
- [6] **[AAAI'24] |** Data Roaming and Quality Assessment for Composed Image Retrieval. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/28081/28168)
- [7] **[AAAI'24] |** FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/download/27885/27795)
- [8] **[AAAI'24] |** Decomposing Semantic Shifts for Composed Image Retrieval.
[[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/28479/28933)
- [9] **[SIGIR'24] |** Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3626772.3657727)
- [10] **[SIGIR'24] |** CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval. [[Paper]](https://export.arxiv.org/abs/2405.19149)
- [11] **[CVPR'24] |** SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining. [[Paper]](https://arxiv.org/pdf/2404.01156)
- [12] **[ICLR'24] |** Sentence-level Prompts Benefit Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2310.05473)
- [13] **[ICLR'24] |** Composed Image Retrieval with Text Feedback via Multi-Grained Uncertainty Regularization. [[Paper]](https://arxiv.org/pdf/2211.07394v5)
- [14] **[TMLR'24] |** Candidate Set Re-ranking for Composed Image Retrieval with Dual Multimodal Encoder. [[Paper]](https://arxiv.org/abs/2305.16304)
- [15] **[TCSVT'24] |** Set of Diverse Queries with Uncertainty Regularization for Composed Image Retrieval. [[Paper]](https://ieeexplore.ieee.org/abstract/document/10530361)
- [16] **[ICMR'24] |** CLIP-ProbCR: CLIP-based Probability embedding Combination Retrieval. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3652583.3657611)
- [17] **[TMM'24] |** Align and Retrieve: Composition and Decomposition Learning in Image Retrieval with Text Feedback. [[Paper]](https://ieeexplore.ieee.org/abstract/document/10568424)
- [18] **[KBS'24] |** Collaborative Group: Composed Image Retrieval via Consensus Learning From Noisy Annotations. [[Paper]](https://www.sciencedirect.com/science/article/pii/S095070512400769X?via%3Dihub)
- [19] **[TIP'24] |** Multimodal Composition Example Mining for Composed Query Image Retrieval.
[[Paper]](https://ieeexplore.ieee.org/abstract/document/10418785)
- [20] **[TOIS'24] |** LLM-enhanced Composed Image Retrieval: An Intent Uncertainty-aware Linguistic-Visual Dual Channel Matching Model. [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3699715)
- [21] **[ACM MM'24] |** Semantic Distillation from Neighborhood for Composed Image Retrieval. [[Paper]](https://openreview.net/pdf?id=MAgFiw3yHG)
- [22] **[ACM MM'24] |** Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives. [[Paper]](https://dl.acm.org/doi/10.1145/3664647.3680808)
- [23] **[NeurIPS'24] |** Easy Regional Contrastive Learning of Expressive Fashion Representations. [[Paper]](https://openreview.net/forum?id=bCL9U2X9Jg)
- [24] **[ACL'24] |** UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation. [[Paper]](https://arxiv.org/pdf/2408.11305)

### 2023
- [1] **[TOMM'23] |** AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3584703)
- [2] **[TMM'23] |** Multi-Modal Transformer With Global-Local Alignment for Composed Query Image Retrieval. [[Paper]](https://ieeexplore.ieee.org/document/10012544)
- [3] **[WACV'23] |** Fashion Image Retrieval with Text Feedback by Additive Attention Compositional Learning. [[Paper]](https://ieeexplore.ieee.org/document/10030891)
- [4] **[ICMR'23] |** Dual-Path Semantic Construction Network for Composed Query-Based Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3591106.3592245)
- [5] **[TCSVT'23] |** Multi-Grained Attention Network With Mutual Exclusion for Composed Query-Based Image Retrieval. [[Paper]](https://ieeexplore.ieee.org/abstract/document/10225420)
- [6] **[TOMM'23] |** Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features.
[[Paper]](https://dl.acm.org/doi/10.1145/3617597)
- [7] **[ICME'23] |** Visual-Linguistic Alignment and Composition for Image Retrieval with Text Feedback. [[Paper]](https://ieeexplore.ieee.org/document/10219821)
- [8] **[TIP'23] |** Composed Image Retrieval via Cross Relation Network With Hierarchical Aggregation Transformer. [[Paper]](https://ieeexplore.ieee.org/document/10205526)
- [9] **[MM'23] |** Target-Guided Composed Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3581783.3611817)
- [10] **[CVPR'23] |** FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks. [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Han_FAME-ViL_Multi-Tasking_Vision-Language_Model_for_Heterogeneous_Fashion_Tasks_CVPR_2023_paper.pdf)
- [11] **[ICCVW'23] |** ProVLA: Compositional Image Search with Progressive Vision-Language Alignment and Multimodal Fusion. [[Paper]](https://ieeexplore.ieee.org/document/10350916)
- [12] **[NeurIPSW'23] |** NEUCORE: Neural Concept Reasoning for Composed Image Retrieval. [[Paper]](https://arxiv.org/pdf/2310.01358)
- [13] **[NeurIPSW'23] |** Benchmarking Robustness of Text-Image Composed Retrieval. [[Paper]](https://suntongtongtong.github.io/benchmarking_robustness.pdf)
- [14] **[MMW'23] |** Fashion-GPT: Integrating LLMs with Fashion Retrieval System. [[Paper]](https://dl.acm.org/doi/10.1145/3607827.3616844)

### 2022
- [1] **[ICLR'22] |** ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity. [[Paper]](https://openreview.net/forum?id=CVfLvQq9gLo) [[Arxiv]](https://arxiv.org/abs/2203.08101)
- [2] **[TOMM'22] |** Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3478642)
- [3] **[TIP'22] |** Geometry Sensitive Cross-Modal Reasoning for Composed Query Based Image Retrieval.
[[Paper]](https://ieeexplore.ieee.org/document/9667308)
- [4] **[TIP'22] |** Composed Image Retrieval via Explicit Erasure and Replenishment With Semantic Alignment. [[Paper]](https://ieeexplore.ieee.org/document/9887834)
- [5] **[WACV'22] |** SAC: Semantic Attention Composition for Text-Conditioned Image Retrieval. [[Paper]](https://openaccess.thecvf.com/content/WACV2022/html/Jandial_SAC_Semantic_Attention_Composition_for_Text-Conditioned_Image_Retrieval_WACV_2022_paper.html)
- [6] **[SIGIR'22] |** Progressive Learning for Image Retrieval with Hybrid-Modality Queries. [[Paper]](https://dl.acm.org/doi/10.1145/3477495.3532047)
- [7] **[CVPR'22] |** Effective Conditioned and Composed Image Retrieval Combining CLIP-based Features. [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Baldrati_Effective_Conditioned_and_Composed_Image_Retrieval_Combining_CLIP-Based_Features_CVPR_2022_paper.pdf)
- [8] **[CVPR'22] |** FashionVLP: Vision Language Transformer for Fashion Retrieval with Feedback. [[Paper]](https://ieeexplore.ieee.org/document/9879706)
- [9] **[TMM'22] |** Enhance Composed Image Retrieval via Multi-Level Collaborative Localization and Semantic Activeness Perception. [[Paper]](https://ieeexplore.ieee.org/document/10120671)
- [10] **[TMM'22] |** Adversarial and Isotropic Gradient Augmentation for Image Retrieval With Text Feedback. [[Paper]](https://ieeexplore.ieee.org/abstract/document/9953564/authors#authors)
- [11] **[EMNLP'22] |** FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning. [[Paper]](https://aclanthology.org/2022.emnlp-main.716/)
- [12] **[ECCV'22] |** FashionViL: Fashion-Focused Vision-and-Language Representation Learning. [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136950629.pdf)

### 2021
- [1] **[ICCV'21] |** Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models.
[[Paper]](https://ieeexplore.ieee.org/document/9710082/citations#citations) [[Arxiv]](https://arxiv.org/abs/2108.04024)
- [2] **[CVPR'21] |** CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback. [[Paper]](https://ieeexplore.ieee.org/document/9577437)
- [3] **[WACV'21] |** Compositional Learning of Image-Text Query for Image Retrieval. [[Paper]](https://ieeexplore.ieee.org/document/9423122)
- [4] **[MM'21] |** Heterogeneous Feature Fusion and Cross-modal Alignment for Composed Image Retrieval. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3474085.3475659)
- [5] **[MM'21] |** Cross-modal Joint Prediction and Alignment for Composed Query Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3474085.3475483)
- [6] **[MM'21] |** Image Retrieval with Text Feedback by Deep Hierarchical Attention Mutual Information Maximization. [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3474085.3475619)
- [7] **[AAAI'21] |** Dual Compositional Learning in Interactive Image Retrieval. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/16271)
- [8] **[SIGIR'21] |** Comprehensive Linguistic-Visual Composition Network for Image Retrieval. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3404835.3462967)

### 2020
- [1] **[CVPR'20] |** Image Search With Text Feedback by Visiolinguistic Attention Learning. [[Paper]](https://ieeexplore.ieee.org/document/9157634)
- [2] **[MM'20] |** Joint Attribute Manipulation and Modality Alignment Learning for Composing Text and Image to Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3394171.3413917)
- [3] **[ECCV'20] |** Learning Joint Visual Semantic Matching Embeddings for Language-Guided Retrieval. [[Paper]](https://link.springer.com/content/pdf/10.1007/978-3-030-58542-6_9.pdf?pdf=inline%20link)
- [4] **[CVPR'20] |** Composed Query Image Retrieval Using Locally Bounded Features.
[[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9157125)

### 2019
- [1] **[CVPR'19] |** Composing Text and Image for Image Retrieval - an Empirical Odyssey. [[Paper]](https://ieeexplore.ieee.org/document/8953387) [[Arxiv]](https://arxiv.org/abs/1812.07119)

## 3. Few-Shot CIR

### Pre-prints
- [1] **[Arxiv'24] |** Pseudo Triplet Guided Few-shot Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2407.06001)

### 2023
- [1] **[AAAI'23] |** Few-Shot Composition Learning for Image Retrieval with Prompt Tuning. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/25597/25369)

## 4. Zero-Shot CIR

### Pre-prints
- [1] **[Arxiv'25] |** SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval. [[Paper]](https://arxiv.org/pdf/2501.08347v1)
- [2] **[Arxiv'24] |** MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval. [[Paper]](https://arxiv.org/pdf/2410.23736)
- [3] **[Arxiv'24] |** iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2405.02951)
- [4] **[Arxiv'24] |** Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2406.09188)
- [5] **[Arxiv'24] |** Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs. [[Paper]](https://arxiv.org/abs/2406.18836)
- [6] **[Arxiv'24] |** Training-free Zero-shot Composed Image Retrieval with Local Concept Re-ranking. [[Paper]](https://arxiv.org/abs/2312.08924)
- [7] **[Arxiv'24] |** HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels. [[Paper]](https://arxiv.org/abs/2407.05795)
- [8] **[Arxiv'24] |** Training-free Zero-shot Composed Image Retrieval via Weighted Modality Fusion and Similarity. [[Paper]](https://arxiv.org/abs/2409.04918)
- [9] **[Arxiv'24] |** Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy.
[[Paper]](https://arxiv.org/pdf/2411.16752)
- [10] **[Arxiv'24] |** Composed Image Retrieval for Training-Free Domain Conversion. [[Paper]](https://arxiv.org/pdf/2412.03297)
- [11] **[Arxiv'24] |** Compositional Image Retrieval via Instruction-Aware Contrastive Learning. [[Paper]](https://arxiv.org/pdf/2412.05756)
- [12] **[Arxiv'24] |** Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval. [[Paper]](https://arxiv.org/pdf/2412.11077)
- [13] **[Arxiv'24] |** MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval. [[Paper]](https://arxiv.org/pdf/2412.14475)
- [14] **[Arxiv'24] |** Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval. [[Paper]](https://arxiv.org/pdf/2410.17393)
- [15] **[Arxiv'23] |** Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2311.07622)

### 2025
- [1] **[COLING'25] |** MLLM-I2W: Harnessing Multimodal Large Language Model for Zero-Shot Composed Image Retrieval. [[Paper]](https://aclanthology.org/2025.coling-main.125.pdf)

### 2024
- [1] **[AAAI'24] |** Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/28324)
- [2] **[ICLR'24] |** Vision-by-Language for Training-Free Compositional Image Retrieval. [[Paper]](https://openreview.net/forum?id=EDPxCjXzSb)
- [3] **[CVPR'24] |** LinCIR: Language-only Training of Zero-shot Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2312.01998)
- [4] **[CVPR'24] |** Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2403.16005)
- [5] **[SIGIR'24] |** Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval.
[[Paper]](https://dl.acm.org/doi/10.1145/3626772.3657831)
- [6] **[SIGIR'24] |** LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval. [[Paper]](https://dl.acm.org/doi/10.1145/3626772.3657740)
- [7] **[ICML'24] |** Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning. [[Paper]](https://openreview.net/forum?id=Nm6jYZsBum)
- [8] **[ICML'24] |** MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions. [[Paper]](https://arxiv.org/abs/2403.19651)
- [9] **[TMLR'24] |** CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion. [[Paper]](https://openreview.net/forum?id=mKtlzW0bWc)
- [10] **[ACM MM'24] |** Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval. [[Paper]](https://openreview.net/pdf?id=rkCYgXfj9P)
- [11] **[ECCV'24] |** Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval. [[Paper]](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02904.pdf)

### 2023
- [1] **[ICCV'23] |** Zero-shot Composed Image Retrieval with Textual Inversion. [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/html/Baldrati_Zero-Shot_Composed_Image_Retrieval_with_Textual_Inversion_ICCV_2023_paper.html)
- [2] **[CVPR'23] |** Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval. [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Saito_Pic2Word_Mapping_Pictures_to_Words_for_Zero-Shot_Composed_Image_Retrieval_CVPR_2023_paper.pdf)
- [3] **[BMVC'23] |** Zero-shot Composed Text-Image Retrieval. [[Paper]](https://proceedings.bmvc2023.org/381/)

## 5. Semi-supervised CIR

### 2024
- [1] **[CVPR'24] |** Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval. [[Paper]](https://arxiv.org/abs/2404.15516)

## 6. Conversational CIR

### Pre-prints
- [1] **[Arxiv'24] |** Leveraging Large Language Models for Multimodal Search.
[[Paper]](https://arxiv.org/html/2404.15790v1)

### 2025
- [1] **[ICLR'25] |** MAI: A Multi-Turn Aggregation-Iteration Model for Composed Image Retrieval. [[Paper]](https://openreview.net/pdf?id=gXyWbl71n1)

### 2023
- [1] **[ICCV'23] |** FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory. [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/html/Pal_FashionNTM_Multi-turn_Fashion_Image_Retrieval_via_Cascaded_Memory_ICCV_2023_paper.html)
- [2] **[MM'23] |** Conversational Composed Retrieval with Iterative Sequence Refinement. [[Paper]](https://dl.acm.org/doi/10.1145/3581783.3611885)
- [3] **[MMW'23] |** Fashion-GPT: Integrating LLMs with Fashion Retrieval System. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3607827.3616844)

### 2021
- [1] **[SIGIR'21] |** Conversational Fashion Image Retrieval via Multiturn Natural Language Feedback. [[Paper]](https://dl.acm.org/doi/abs/10.1145/3404835.3462881)

### 2018
- [1] **[NeurIPS'18] |** Dialog-based interactive image retrieval. [[Paper]](https://dl.acm.org/doi/10.5555/3326943.3327006)

## 7. COVR

### Pre-prints
- [1] **[Arxiv'24] |** Localizing Events in Videos with Multimodal Queries. [[Paper]](https://arxiv.org/abs/2406.10079)

### 2024
- [1] **[AAAI'24] |** CoVR: Learning Composed Video Retrieval from Web Video Captions. [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/28334)
- [2] **[CVPR'24] |** Composed Video Retrieval via Enriched Context and Discriminative Embeddings. [[Paper]](https://arxiv.org/abs/2403.16997)
- [3] **[TPAMI'24] |** CoVR-2: Automatic Data Construction for Composed Video Retrieval. [[Paper]](https://ieeexplore.ieee.org/abstract/document/10685001)
- [4] **[ECCV'24] |** EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval. [[Paper]](https://arxiv.org/abs/2407.16658)

## 8. Sketch-based CIR

### 2024
- [1] **[AAAI'24] |** Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions.
[[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/27956)
- [2] **[CVPR'24] |** You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval. [[Paper]](https://arxiv.org/abs/2403.07222)

## 9. Others

### Person Retrieval
- [1] **[Arxiv'24] |** Word4Per: Zero-shot Composed Person Retrieval. [[Paper]](https://arxiv.org/abs/2311.16515)

### Remote Sensing Retrieval
- [1] **[IGARSS'24] |** Composed Image Retrieval for Remote Sensing. [[Paper]](https://arxiv.org/abs/2405.15587)
- [2] **[TGRS'24] |** Scene Graph-Aware Hierarchical Fusion Network for Remote Sensing Image Retrieval With Text Feedback. [[Paper]](https://ieeexplore.ieee.org/document/10537211)

### Survey
- [1] **[Arxiv'24] |** A Survey of Multimodal Composite Editing and Retrieval. [[Paper]](https://arxiv.org/abs/2409.05405)

### New Dataset
- [1] **[Arxiv'24] |** EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections. [[Paper]](https://arxiv.org/pdf/2410.01536v2)
- [2] **[Arxiv'24] |** ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval. [[Paper]](https://arxiv.org/pdf/2410.18715)

## 10. Dataset statistics

| **Dataset** | **Modalities** | **Images Scale** | **Triplets Scale** | **Type** | **Link** | **Domain** |
| --- | --- | --- | --- | --- | --- | --- |
| FashionIQ | Image+Text | ~77.7K | ~30.1K | Human Annotated | [Link](https://github.com/XiaoxiaoGuo/fashion-iq) | Fashion |
| Shoes | Image+Text | ~14.7K | ~10.8K | Human Annotated | [Link](https://github.com/XiaoxiaoGuo/fashion-retrieval) | Fashion |
| Fashion200K | Image+Text | ~200K | -- | -- | [Link](https://www.kaggle.com/datasets/mayukh18/fashion200k-dataset) | Fashion |
| MIT | Image+Text | ~6K | -- | -- | [Link](https://web.mit.edu/phillipi/Public/states_and_transformations/index.html) | Open-domain |
| CIRR | Image+Text | ~21.6K | ~36.6K | Human Annotated | [Link](https://cuberick-orion.github.io/CIRR/) | Open-domain |
| CIRCO | Image+Text | ~12.3K | ~1.0K | Human Annotated | [Link](https://github.com/miccunifi/CIRCO/tree/main) | Open-domain |
| CSS | Image+Text | ~1.0K | ~32K | Generated | [Link](https://github.com/google/tirg) | Open-domain |
| LaSCo | Image+Text | ~121.5K | ~389.3K | Generated | [Link](https://github.com/levymsn/LaSCo?tab=readme-ov-file#lasco-dataset) | Open-domain |
| SynthTriplets18M | Image+Text | -- | ~18M | Generated | [Link](https://github.com/navervision/CompoDiff) | Open-domain |
| WebVid-CoVR | Video+Text | ~130.8K | ~1.6M | Generated | [Link](https://imagine.enpc.fr/~ventural/covr/) | Video |
| ITCPR | Image+Text | ~20K | ~12.2K | Human Annotated | [Link](https://github.com/Delong-liu-bupt/Word4Per) | Person |
| Airplane, Tennis, and WHIRT | Image+Text | ~7.7K | ~8.7K | Human Annotated | - | Remote Sensing |
| PATTERNCOM | Image+Text | ~30K | ~21K | Generated | [Link](https://github.com/billpsomas/rscir) | Remote Sensing |
| FS-COCO | Sketch+Image+Text | ~10K | ~10K | Human Annotated | [Link](https://github.com/pinakinathc/fscoco) | Sketch |
| SketchyCOCO | Sketch+Image+Text | ~14K | ~14K | Automatic matching | [Link](https://github.com/sysu-imsl/SketchyCOCO) | Sketch |
| CSTBIR | Sketch+Image+Text | ~108K | ~2M | Automatic matching | [Link](https://vl2g.github.io/projects/cstbir/) | Sketch |
Owner
- Login: BUAADreamer
- Kind: user
- Location: Beijing
- Company: Beihang University
- Website: https://buaadreamer.top/
- Repositories: 4
- Profile: https://github.com/BUAADreamer
GitHub Events
Total
- Push event: 2
Last Year
- Push event: 2