https://github.com/924973292/idea

【CVPR2025】IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification

Science Score: 23.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file
  • DOI references
  • Academic publication links (links to arxiv.org, scholar.google)
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low similarity: 10.0%)

Keywords

caption mllms multi-modal multi-modal-learning reid thermal-imaging
Last synced: 5 months ago

Repository

【CVPR2025】IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification

Basic Info
  • Host: GitHub
  • Owner: 924973292
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 34.9 MB
Statistics
  • Stars: 17
  • Watchers: 2
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Topics
caption mllms multi-modal multi-modal-learning reid thermal-imaging
Created 12 months ago · Last pushed 11 months ago
Metadata Files
Readme License

README.md

IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-Modal Object Re-Identification

Yuhao Wang · Yongfeng Lv · Pingping Zhang* · Huchuan Lu

CVPR 2025 Paper

Figure 1: Motivation of IDEA.

Figure 2: Overall Framework of IDEA.

Abstract 📝

IDEA 🚀 is a novel multi-modal object Re-Identification (ReID) framework that leverages inverted text and cooperative deformable aggregation to address the challenges of complex scenarios in multi-modal imaging. By integrating semantic guidance from text annotations and adaptively aggregating discriminative local features, IDEA achieves state-of-the-art performance on multiple benchmarks.


News 📢

  • We released the IDEA codebase!
  • Great news! Our paper has been accepted to CVPR 2025! 🏆


Introduction 🌟

Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary information from various modalities. However, existing methods often focus solely on fusing visual features while neglecting the potential benefits of text-based semantic information.

To address this issue, we propose IDEA, a novel feature learning framework comprising:

1. Inverted Multi-modal Feature Extractor (IMFE): integrates multi-modal features using Modal Prefixes and an InverseNet.
2. Cooperative Deformable Aggregation (CDA): adaptively aggregates discriminative local information by generating sampling positions (a rough sketch of this idea follows).
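
The official CDA implementation lives in this repository's code; purely as a rough illustration of the deformable-sampling idea, the PyTorch sketch below predicts per-location offsets from a feature map and re-samples it with `F.grid_sample`. The module and names (`DeformableAggregation`, `offset_net`) are hypothetical simplifications, not the authors' design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAggregation(nn.Module):
    """Minimal sketch of deformable feature aggregation (not the official CDA).

    A small conv head predicts a (dx, dy) offset for each spatial location;
    the feature map is then re-sampled at the shifted positions.
    """
    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical offset predictor: 2 output channels = (dx, dy) per location.
        self.offset_net = nn.Conv2d(channels, 2, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        B, C, H, W = feat.shape
        # Base sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=feat.device),
            torch.linspace(-1, 1, W, device=feat.device),
            indexing="ij",
        )
        base = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)  # (B, H, W, 2)
        # Predicted offsets, kept roughly within one grid cell via tanh + scaling.
        offsets = torch.tanh(self.offset_net(feat)).permute(0, 2, 3, 1) * (2.0 / max(H, W))
        # Gather features at the deformed sampling positions.
        return F.grid_sample(feat, base + offsets, align_corners=True)

# Usage: aggregate a toy 512-channel feature map.
agg = DeformableAggregation(512)
out = agg(torch.randn(2, 512, 16, 16))
print(out.shape)  # torch.Size([2, 512, 16, 16])
```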

Additionally, we construct three text-enhanced multi-modal object ReID benchmarks using a standardized pipeline for structured and concise text annotations with Multi-modal Large Language Models (MLLMs). 📝
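
The exact captioning pipeline isn't included on this page, but the `QwenVL_Anno` folder suggests Qwen-VL was used. As a minimal sketch, assuming the public Qwen-VL-Chat checkpoint and its documented `chat` interface, a structured caption for a single crop could be requested like this (the image path and prompt wording are hypothetical):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Qwen-VL-Chat exposes its multi-modal chat interface via trust_remote_code.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True
).eval()

# Hypothetical prompt asking for a short, structured ReID-style caption.
query = tokenizer.from_list_format([
    {"image": "DATA/RGBNT201/train_171/0001/RGB/0001_001.jpg"},  # hypothetical path
    {"text": "Describe this person's appearance (clothing, colors, "
             "carried items) in one concise sentence."},
])
caption, _ = model.chat(tokenizer, query=query, history=None)
print(caption)
```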


Contributions

  • Constructed three text-enhanced multi-modal object ReID benchmarks, providing a structured caption generation pipeline across multiple spectral modalities.
  • Introduced IDEA, a novel feature learning framework with two key components:
    • IMFE: integrates multi-modal features using Modal Prefixes and an InverseNet (see the prefix-token sketch after this list).
    • CDA: adaptively aggregates discriminative local information.
  • Validated the effectiveness of our approach through extensive experiments on three benchmark datasets.
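
The precise Modal Prefix design is given in the paper rather than on this page. As a minimal sketch, assuming prefixes are learnable per-modality tokens prepended to each modality's patch sequence before a shared encoder (all names here are hypothetical):

```python
import torch
import torch.nn as nn

class ModalPrefixEncoder(nn.Module):
    """Sketch: learnable per-modality prefix tokens (not the official IMFE)."""
    def __init__(self, dim: int = 512, num_prefix: int = 4,
                 modalities=("RGB", "NIR", "TIR")):
        super().__init__()
        # One bank of prefix tokens per modality, learned end to end.
        self.prefix = nn.ParameterDict({
            m: nn.Parameter(torch.randn(num_prefix, dim) * 0.02) for m in modalities
        })
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens: torch.Tensor, modality: str) -> torch.Tensor:
        # tokens: (B, N, dim) patch features from one modality's backbone.
        B = tokens.size(0)
        prefix = self.prefix[modality].unsqueeze(0).expand(B, -1, -1)
        return self.encoder(torch.cat([prefix, tokens], dim=1))

enc = ModalPrefixEncoder()
out = enc(torch.randn(2, 196, 512), modality="NIR")
print(out.shape)  # torch.Size([2, 200, 512])
```

Prepending per-modality tokens lets one shared encoder condition on which spectrum it is reading without duplicating the whole backbone per modality.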

Quick View 📊

Dataset Examples

Overview of Annotations

[Image: dataset overview]

Multi-modal Person ReID Annotations Example

[Image: person ReID annotations]

Multi-modal Vehicle ReID Annotations Example

[Image: vehicle ReID annotations]

Experimental Results

Multi-Modal Person ReID

[Image: RGBNT201 results]

Multi-Modal Vehicle ReID

[Image: RGBNT100 results]

Parameter Analysis

[Image: parameter analysis]


Visualizations 🖼️

Offsets Visualization

[Image: offsets]

Cosine Similarity Visualization

[Image: cosine similarity]

Semantic Guidance Visualization

[Image: semantic guidance]

Rank-list Visualization

Multi-modal Person ReID

[Image: person ReID rank list]

Multi-modal Vehicle ReID

[Image: vehicle ReID rank list]


Quick Start 🚀

Datasets

Codebase Structure

```
IDEA_Codes
├── PTH                       # Pre-trained models
│   └── ViT-B-16.pt           # CLIP model
├── DATA                      # Dataset root directory
│   ├── RGBNT201              # RGBNT201 dataset
│   │   ├── train_171         # Training images (171 classes)
│   │   ├── test              # Testing images
│   │   └── text              # Annotations
│   │       ├── train_RGB.json    # Training annotations
│   │       ├── test_RGB.json     # Testing annotations
│   │       └── ...               # Other annotations
│   ├── RGBNT100              # RGBNT100 dataset
│   └── MSVR310               # MSVR310 dataset
├── assets                    # GitHub assets
├── config                    # Configuration files
├── QwenVL_Anno               # Annotation tools (put your annotations in the DATA folder)
└── ...                       # Other project files
```
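
The annotation JSON schema isn't documented on this page, so it's worth loading one file and inspecting it before training. A minimal sketch, assuming the `train_RGB.json` path from the tree above (the actual field names are unknown):

```python
import json
from pathlib import Path

# Hypothetical path and schema: the real field names may differ, so inspect first.
anno_path = Path("DATA/RGBNT201/text/train_RGB.json")
with anno_path.open(encoding="utf-8") as f:
    annotations = json.load(f)

# Print a few entries to learn the actual structure (list of dicts vs. dict).
entries = annotations if isinstance(annotations, list) else list(annotations.items())
for entry in entries[:3]:
    print(entry)
```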

Pretrained Models

Configuration

  • RGBNT201: configs/RGBNT201/IDEA.yml
  • RGBNT100: configs/RGBNT100/IDEA.yml
  • MSVR310: configs/MSVR310/IDEA.yml

Training

```bash
conda create -n IDEA python=3.10.13
conda activate IDEA
pip install torch==2.1.1+cu118 torchvision==0.16.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
cd ../IDEA_PUBLIC
pip install --upgrade pip
pip install -r requirements.txt
python train.py --config_file ./configs/RGBNT201/IDEA.yml
```
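
A quick sanity check after installation confirms the cu118 wheels can actually see your GPU before launching training:

```python
import torch

# Expect 2.1.1+cu118 and True on a machine with a CUDA 11.8-compatible driver.
print(torch.__version__)
print(torch.cuda.is_available())
```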

Training Example

Poster 📜

[Image: poster]


Star History 🌟

[Image: star history chart]


Citation 📚

If you find IDEA helpful in your research, please consider citing:

```bibtex
@inproceedings{wang2025idea,
  title={IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-Modal Object Re-Identification},
  author={Wang, Yuhao and Lv, Yongfeng and Zhang, Pingping and Lu, Huchuan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
```


Owner

  • Name: Yuhao Wang
  • Login: 924973292
  • Kind: user
  • Location: Dalian
  • Company: Dalian University of Technology

Born as small as a mustard seed, yet the heart holds Mount Sumeru.

GitHub Events

Total
  • Issues event: 4
  • Watch event: 34
  • Issue comment event: 8
  • Push event: 18
  • Fork event: 7
  • Create event: 2
Last Year
  • Issues event: 4
  • Watch event: 34
  • Issue comment event: 8
  • Push event: 18
  • Fork event: 7
  • Create event: 2

Dependencies

requirements.txt pypi
  • Jinja2 ==3.1.4
  • Markdown ==3.7
  • MarkupSafe ==2.1.5
  • PyYAML ==6.0.2
  • Werkzeug ==3.1.3
  • absl-py ==2.1.0
  • certifi ==2022.12.7
  • charset-normalizer ==2.1.1
  • contourpy ==1.3.1
  • cycler ==0.12.1
  • einops ==0.7.0
  • exceptiongroup ==1.2.2
  • filelock ==3.13.1
  • fonttools ==4.56.0
  • fsspec ==2024.6.1
  • ftfy ==6.2.3
  • fvcore ==0.1.5.post20221221
  • grpcio ==1.70.0
  • huggingface-hub ==0.21.4
  • idna ==3.4
  • iniconfig ==2.0.0
  • iopath ==0.1.10
  • joblib ==1.4.2
  • kiwisolver ==1.4.8
  • matplotlib ==3.8.3
  • mpmath ==1.3.0
  • networkx ==3.3
  • numpy ==1.26.3
  • opencv-python ==4.9.0.80
  • packaging ==24.0
  • pandas ==2.2.3
  • pillow ==10.2.0
  • pluggy ==1.5.0
  • portalocker ==3.1.1
  • protobuf ==6.30.0
  • pyparsing ==3.2.1
  • pytest ==8.1.1
  • python-dateutil ==2.9.0.post0
  • pytz ==2025.1
  • regex ==2023.12.25
  • requests ==2.28.1
  • safetensors ==0.5.3
  • scikit-learn ==1.5.1
  • scipy ==1.12.0
  • seaborn ==0.13.2
  • six ==1.17.0
  • sympy ==1.13.1
  • tabulate ==0.9.0
  • tensorboard ==2.19.0
  • tensorboard-data-server ==0.7.2
  • tensorboardX ==2.6.2.2
  • termcolor ==2.5.0
  • threadpoolctl ==3.5.0
  • timm ==0.4.12
  • tokenizers ==0.15.2
  • tomli ==2.2.1
  • tqdm ==4.66.2
  • transformers ==4.38.2
  • triton ==2.1.0
  • typing_extensions ==4.12.2
  • tzdata ==2025.1
  • urllib3 ==1.26.13
  • wcwidth ==0.2.13
  • yacs ==0.1.8