https://github.com/924973292/idea
【CVPR2025】IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org, scholar.google)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 10.0%, to scientific vocabulary)
Keywords
Repository
Basic Info
Statistics
- Stars: 17
- Watchers: 2
- Forks: 3
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-Modal Object Re-Identification
Yuhao Wang · Yongfeng Lv · Pingping Zhang* · Huchuan Lu
Figure 1: Motivation of IDEA.
Figure 2: Overall Framework of IDEA.
Abstract 📝
IDEA 🚀 is a novel multi-modal object Re-Identification (ReID) framework that leverages inverted text and cooperative deformable aggregation to address the challenges of complex scenarios in multi-modal imaging. By integrating semantic guidance from text annotations and adaptively aggregating discriminative local features, IDEA achieves state-of-the-art performance on multiple benchmarks.
News 📢
- We released the IDEA codebase!
- Great news! Our paper has been accepted to CVPR 2025! 🏆
Table of Contents 📑
Introduction 🌟
Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary information from various modalities. However, existing methods often focus solely on fusing visual features while neglecting the potential benefits of text-based semantic information.
To address this issue, we propose IDEA, a novel feature learning framework comprising:
1. Inverted Multi-modal Feature Extractor (IMFE): integrates multi-modal features using Modal Prefixes and an InverseNet.
2. Cooperative Deformable Aggregation (CDA): adaptively aggregates discriminative local information by generating sampling positions.
Additionally, we construct three text-enhanced multi-modal object ReID benchmarks using a standardized pipeline for structured and concise text annotations with Multi-modal Large Language Models (MLLMs). 📝
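To make the CDA idea concrete, here is a minimal pure-Python sketch of deformable aggregation: a set of base sampling positions is shifted by predicted offsets, features are read from the map by bilinear interpolation, and the samples are averaged into one descriptor. The function names, scalar feature map, and clamping rule are illustrative assumptions for exposition, not the repository's batched-tensor implementation.

```python
# Illustrative sketch of the idea behind Cooperative Deformable Aggregation (CDA):
# predicted offsets shift base sampling positions, features are read off the map
# by bilinear interpolation, and the samples are averaged into one descriptor.

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2-D scalar feature map (list of lists) at (y, x)."""
    h, w = len(feat), len(feat[0])
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bot = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bot * dy

def deformable_aggregate(feat, base_points, offsets):
    """Average features sampled at base positions shifted by (learned) offsets."""
    h, w = len(feat), len(feat[0])
    samples = []
    for (by, bx), (oy, ox) in zip(base_points, offsets):
        y = min(max(by + oy, 0.0), h - 1.0)  # clamp to the feature map
        x = min(max(bx + ox, 0.0), w - 1.0)
        samples.append(bilinear_sample(feat, y, x))
    return sum(samples) / len(samples)
```

In the real model the offsets come from a learned branch and the samples cover multiple modalities; here they are plain arguments so the sampling-and-aggregation step stands alone.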
Contributions ✨
- Constructed three text-enhanced multi-modal object ReID benchmarks, providing a structured caption generation pipeline across multiple spectral modalities.
- Introduced IDEA, a novel feature learning framework with two key components:
- IMFE: Integrates multi-modal features using Modal Prefixes and an InverseNet.
- CDA: Adaptively aggregates discriminative local information.
- Validated the effectiveness of our approach through extensive experiments on three benchmark datasets.
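As a toy illustration of the Modal Prefix idea in IMFE, the sketch below prepends a modality-identifying prefix token to each modality's token sequence before the sequences are concatenated for fusion. The prefix values and the concatenation step are hypothetical stand-ins, not the paper's learned embeddings or fusion network.

```python
# Toy sketch of Modal Prefixes: each modality's token sequence is tagged with a
# modality-identifying prefix token, then the sequences are concatenated so the
# fusion stage can tell which tokens came from which spectrum.

MODAL_PREFIX = {"RGB": [1.0, 0.0, 0.0], "NIR": [0.0, 1.0, 0.0], "TIR": [0.0, 0.0, 1.0]}

def with_prefix(modality, tokens):
    """Prepend the modality's prefix token to its token list."""
    return [MODAL_PREFIX[modality]] + tokens

def fuse(sequences):
    """Concatenate the prefixed per-modality sequences into one sequence."""
    fused = []
    for modality, tokens in sequences.items():
        fused.extend(with_prefix(modality, tokens))
    return fused
```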
Quick View 📊
Dataset Examples
Overview of Annotations
Multi-modal Person ReID Annotations Example
Multi-modal Vehicle ReID Annotations Example
Experimental Results
Multi-Modal Person ReID
Multi-Modal Vehicle ReID
Parameter Analysis
Visualizations 🖼️
Offsets Visualization
Cosine Similarity Visualization
Semantic Guidance Visualization
Rank-list Visualization
Multi-modal Person ReID
Multi-modal Vehicle ReID
Quick Start 🚀
Datasets
- RGBNT201: Google Drive
- RGBNT100: Baidu Pan (Code: rjin)
- MSVR310: Google Drive
- Annotations: QwenVL_Anno
Codebase Structure
```
IDEA_Codes
├── PTH                        # Pre-trained models
│   └── ViT-B-16.pt            # CLIP model
├── DATA                       # Dataset root directory
│   ├── RGBNT201               # RGBNT201 dataset
│   │   ├── train_171          # Training images (171 classes)
│   │   ├── test               # Testing images
│   │   ├── text               # Annotations
│   │   │   ├── train_RGB.json # Training annotations
│   │   │   ├── test_RGB.json  # Testing annotations
│   │   │   └── ...            # Other annotations
│   ├── RGBNT100               # RGBNT100 dataset
│   └── MSVR310                # MSVR310 dataset
├── assets                     # GitHub assets
├── config                     # Configuration files
├── QwenVL_Anno                # Annotations (place yours into the DATA folder)
└── ...                        # Other project files
```
Pretrained Models
- CLIP: Baidu Pan (Code: 52fu)
Configuration
- RGBNT201: configs/RGBNT201/IDEA.yml
- RGBNT100: configs/RGBNT100/IDEA.yml
- MSVR310: configs/MSVR310/IDEA.yml
Training
```bash
# Create and activate the environment
conda create -n IDEA python=3.10.13
conda activate IDEA

# Install PyTorch with CUDA 11.8
pip install torch==2.1.1+cu118 torchvision==0.16.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118

# Install the remaining dependencies
cd ../IDEA_PUBLIC
pip install --upgrade pip
pip install -r requirements.txt

# Train on RGBNT201
python train.py --config_file ./configs/RGBNT201/IDEA.yml
```
Training Example
Poster 📜
Star History 🌟
Citation 📚
If you find IDEA helpful in your research, please consider citing:
```bibtex
@inproceedings{wang2025idea,
  title={IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-Modal Object Re-Identification},
  author={Wang, Yuhao and Lv, Yongfeng and Zhang, Pingping and Lu, Huchuan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
```
Owner
- Name: Yuhao Wang
- Login: 924973292
- Kind: user
- Location: Dalian
- Company: Dalian University of Technology
- Repositories: 7
- Profile: https://github.com/924973292
- Motto: 生如芥子,心藏须弥 ("Born as a mustard seed, yet holding Mount Sumeru within the heart")
GitHub Events
Total
- Issues event: 4
- Watch event: 34
- Issue comment event: 8
- Push event: 18
- Fork event: 7
- Create event: 2
Last Year
- Issues event: 4
- Watch event: 34
- Issue comment event: 8
- Push event: 18
- Fork event: 7
- Create event: 2
Dependencies
- Jinja2 ==3.1.4
- Markdown ==3.7
- MarkupSafe ==2.1.5
- PyYAML ==6.0.2
- Werkzeug ==3.1.3
- absl-py ==2.1.0
- certifi ==2022.12.7
- charset-normalizer ==2.1.1
- contourpy ==1.3.1
- cycler ==0.12.1
- einops ==0.7.0
- exceptiongroup ==1.2.2
- filelock ==3.13.1
- fonttools ==4.56.0
- fsspec ==2024.6.1
- ftfy ==6.2.3
- fvcore ==0.1.5.post20221221
- grpcio ==1.70.0
- huggingface-hub ==0.21.4
- idna ==3.4
- iniconfig ==2.0.0
- iopath ==0.1.10
- joblib ==1.4.2
- kiwisolver ==1.4.8
- matplotlib ==3.8.3
- mpmath ==1.3.0
- networkx ==3.3
- numpy ==1.26.3
- opencv-python ==4.9.0.80
- packaging ==24.0
- pandas ==2.2.3
- pillow ==10.2.0
- pluggy ==1.5.0
- portalocker ==3.1.1
- protobuf ==6.30.0
- pyparsing ==3.2.1
- pytest ==8.1.1
- python-dateutil ==2.9.0.post0
- pytz ==2025.1
- regex ==2023.12.25
- requests ==2.28.1
- safetensors ==0.5.3
- scikit-learn ==1.5.1
- scipy ==1.12.0
- seaborn ==0.13.2
- six ==1.17.0
- sympy ==1.13.1
- tabulate ==0.9.0
- tensorboard ==2.19.0
- tensorboard-data-server ==0.7.2
- tensorboardX ==2.6.2.2
- termcolor ==2.5.0
- threadpoolctl ==3.5.0
- timm ==0.4.12
- tokenizers ==0.15.2
- tomli ==2.2.1
- tqdm ==4.66.2
- transformers ==4.38.2
- triton ==2.1.0
- typing_extensions ==4.12.2
- tzdata ==2025.1
- urllib3 ==1.26.13
- wcwidth ==0.2.13
- yacs ==0.1.8