https://github.com/924973292/editor

【CVPR2024】Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, scholar.google
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.8%) to scientific vocabulary

Keywords

cvpr2024 frequency-analysis msvr310 multi-modal multi-modal-learning person-reid reid rgbnt100 rgbnt201 token-selection vehicle-reidentification
Last synced: 6 months ago · JSON representation

Repository

【CVPR2024】Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Basic Info
  • Host: GitHub
  • Owner: 924973292
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 10.6 MB
Statistics
  • Stars: 77
  • Watchers: 2
  • Forks: 5
  • Open Issues: 1
  • Releases: 0
Topics
cvpr2024 frequency-analysis msvr310 multi-modal multi-modal-learning person-reid reid rgbnt100 rgbnt201 token-selection vehicle-reidentification
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Pingping Zhang* · Yuhao Wang · Yang Liu · Zhengzheng Tu · Huchuan Lu

CVPR 2024 Paper

CVPR 2024 Poster

Previous methods may be easily affected by irrelevant backgrounds and usually ignore the modality gaps. To address the above issues, we propose a novel learning framework named EDITOR to sElect DIverse Tokens for multi-modal Object ReID. EDITOR prioritizes the selection of object-centric information, aiming to preserve the diverse features of different modalities while minimizing background interference. Our proposed EDITOR achieves competitive performance on three multi-modal object ReID benchmarks, i.e., RGBNT201, RGBNT100 and MSVR310.

News

Exciting news! Our paper has been accepted to CVPR 2024! 🎉

Introduction

Multi-modal object ReID is crucial in scenarios where objects are captured through different image spectra, such as RGB, near-infrared, and thermal imaging. Previous multi-modal ReID methods typically extract global features from all regions of images in different modalities and subsequently aggregate them. Nevertheless, these methods have two key limitations: (1) within individual modalities, backgrounds introduce additional noise, especially in challenging visual environments; (2) across different modalities, backgrounds add overhead to reducing modality gaps, which can make feature aggregation harder. Hence, our method prioritizes the selection of object-centric information, aiming to preserve the diverse features of different modalities while minimizing background interference.

Contributions

  • We introduce EDITOR, a novel learning framework for multi-modal object ReID. To the best of our knowledge, EDITOR represents the first attempt to enhance multi-modal object ReID through object-centric token selection.
  • We propose a Spatial-Frequency Token Selection (SFTS) module and a Hierarchical Masked Aggregation (HMA) module. These modules effectively facilitate the selection and aggregation of multi-modal tokenized features.
  • We propose two new loss functions with a Background Consistency Constraint (BCC) and an Object-Centric Feature Refinement (OCFR) to improve the feature discrimination with background suppressions.
  • Extensive experiments are performed on three multi-modal object ReID benchmarks. The results fully validate the effectiveness of our proposed methods.
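As an illustrative sketch only (not the paper's actual SFTS or HMA modules), the core idea of object-centric token selection can be reduced to scoring each patch token and keeping the top-k, discarding likely-background tokens before aggregation. Here the score is simply the embedding norm, a stand-in for the spatial and frequency cues the paper uses:

```python
import numpy as np

# Illustrative sketch ONLY: not the paper's SFTS module. The idea is to
# score each patch token and keep the k most "object-centric" ones,
# discarding likely-background tokens before feature aggregation.

def select_tokens(tokens: np.ndarray, k: int) -> np.ndarray:
    """Keep the k highest-scoring tokens from a (num_tokens, dim) array.

    The score here is just the L2 norm of each token embedding; the
    paper derives token scores from spatial and frequency information.
    """
    scores = np.linalg.norm(tokens, axis=1)  # one saliency score per token
    keep = np.argsort(scores)[-k:]           # indices of the top-k tokens
    return tokens[np.sort(keep)]             # preserve original token order

# e.g. 196 patch tokens of dimension 768, as produced by a ViT-B backbone
tokens = np.random.default_rng(0).normal(size=(196, 768))
selected = select_tokens(tokens, k=49)
print(selected.shape)  # (49, 768)
```

In the full framework this selection would be applied per modality, so that each modality keeps its own diverse object-centric tokens before aggregation.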

Results

Multi-modal Object ReID

RGBNT201

Performance comparison with different modules [RGBNT201, RGBNT100]

Parameter Analysis of EDITOR [RGBNT100]

Visualizations

T-SNE

Similarity

Selection

Please check the paper for detailed information.

Reproduction

Datasets

RGBNT201 link: https://drive.google.com/drive/folders/1EscBadX-wMAT56It5lXY-S3-b5nK1wH
Market1501-MM link: https://drive.google.com/drive/folders/1EscBadX-wMAT56It5lXY-S3-b5nK1wH
RGBNT100 link: https://pan.baidu.com/s/1xqqh7N4Lctm3RcUdskG0Ug code: rjin
MSVR310 link: https://drive.google.com/file/d/1IxI-fGiluPOIes6YjDHeTEuVYhFdYwD/view?usp=drive_link

Pretrained

ViT-B link: https://pan.baidu.com/s/1YE-24vSo5pv_wHOF-y4sfA code: vmfm

Configs

RGBNT201 file: EDITOR/configs/RGBNT201/EDITOR.yml
Market1501-MM file: EDITOR/configs/Market1501-MM/EDITOR.yml
RGBNT100 file: EDITOR/configs/RGBNT100/EDITOR.yml
MSVR310 file: EDITOR/configs/MSVR310/EDITOR.yml
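The per-dataset YAML files above override training defaults (the repository's requirements include yacs for this). A minimal stdlib sketch of the merge behavior, with hypothetical key names that are not taken from EDITOR's actual config schema:

```python
# Hypothetical defaults; EDITOR's real config keys may differ.
defaults = {
    "SOLVER": {"IMS_PER_BATCH": 128, "BASE_LR": 3.5e-4},
    "DATALOADER": {"NUM_INSTANCE": 16},
}

def merge(base: dict, override: dict) -> dict:
    """Recursively apply override values on top of the defaults."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

# e.g. values a dataset-specific file such as RGBNT201/EDITOR.yml might set
cfg = merge(defaults, {"SOLVER": {"IMS_PER_BATCH": 64}})
print(cfg["SOLVER"])  # {'IMS_PER_BATCH': 64, 'BASE_LR': 0.00035}
```

Untouched sections (here DATALOADER) keep their default values, so each dataset's YAML only needs to state what differs.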

Tips

Please ensure that your batch size matches the one specified in the configuration file. Typically, a batch size of 128 with 16 instances per identity is more robust than a batch size of 64 or 32 with 8 instances. However, note that a batch size of 128 consumes over 24 GB of GPU memory. In future work, we will pay more attention to GPU memory consumption to provide conveniently reproducible configurations for more researchers. Thank you for your attention!
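The batch-size/instances pairing above follows the standard PK sampling layout used in ReID training: each batch holds P identities with K images each, so batch_size = P × K. A quick arithmetic check:

```python
# PK sampling arithmetic for ReID batches: batch_size = P identities * K instances.

def identities_per_batch(batch_size: int, num_instances: int) -> int:
    """Number of distinct identities (P) per batch under PK sampling."""
    assert batch_size % num_instances == 0, "batch must split evenly into identities"
    return batch_size // num_instances

print(identities_per_batch(128, 16))  # 8 identities per batch (recommended config)
print(identities_per_batch(64, 8))    # also 8 identities, but fewer images each
```

Both settings yield 8 identities per batch; the larger batch sees 16 views of each identity instead of 8, which is likely why it trains more robustly at the cost of memory.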

Bash

```bash
#!/bin/bash
# Environment: Python 3.8, CUDA 11.4
source activate (your env)
cd ../(your path)
pip install -r requirements.txt
python train_net.py --config_file ../RGBNT201/EDITOR.yml
```

Star History

Star History Chart

Citation

If you find EDITOR useful in your research, please consider citing:

```bibtex
@InProceedings{Zhang2024CVPR,
    author    = {Zhang, Pingping and Wang, Yuhao and Liu, Yang and Tu, Zhengzheng and Lu, Huchuan},
    title     = {Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {17117-17126}
}
```

Owner

  • Name: Yuhao Wang
  • Login: 924973292
  • Kind: user
  • Location: Dalian
  • Company: Dalian University of Technology

Born as small as a mustard seed, yet the heart holds Mount Sumeru.

GitHub Events

Total
  • Issues event: 4
  • Watch event: 32
  • Issue comment event: 3
  • Push event: 1
  • Fork event: 2
Last Year
  • Issues event: 4
  • Watch event: 32
  • Issue comment event: 3
  • Push event: 1
  • Fork event: 2

Dependencies

requirements.txt pypi
  • Babel ==2.11.0
  • Cython ==0.29.24
  • Django ==3.2.6
  • Flask ==2.0.2
  • Jinja2 ==3.1.2
  • Markdown ==3.3.4
  • PyWavelets ==1.4.1
  • PyYAML ==5.4.1
  • Send2Trash ==1.8.0
  • SoundFile ==0.10.3.post1
  • Sphinx ==4.2.0
  • Werkzeug ==2.0.2
  • absl-py ==0.14.1
  • alabaster ==0.7.12
  • antlr4-python3-runtime ==4.9.3
  • anyio ==3.6.2
  • apex ==0.1
  • appdirs ==1.4.4
  • argon2-cffi ==21.1.0
  • asgiref ==3.4.1
  • attrs ==21.2.0
  • audioread ==2.1.9
  • black ==21.4b2
  • bleach ==4.1.0
  • brotlipy ==0.7.0
  • cachetools ==4.2.4
  • certifi ==2021.5.30
  • cloudpickle ==2.2.1
  • codecov ==2.1.12
  • conda ==4.10.3
  • conda-build ==3.21.4
  • coverage ==6.0.1
  • cycler ==0.10.0
  • debugpy ==1.5.0
  • defusedxml ==0.7.1
  • docutils ==0.17.1
  • entrypoints ==0.3
  • expecttest ==0.1.3
  • fastjsonschema ==2.16.2
  • flake8 ==3.7.9
  • future ==0.18.2
  • fvcore ==0.1.5.post20221221
  • glob2 ==0.7
  • google-auth ==1.35.0
  • google-auth-oauthlib ==0.4.6
  • grpcio ==1.41.0
  • gunicorn ==20.1.0
  • h11 ==0.12.0
  • horovod ==0.27.0
  • httptools ==0.2.0
  • hydra-core ==1.3.2
  • hypothesis ==4.50.8
  • imagesize ==1.2.0
  • importlib-metadata ==6.0.0
  • importlib-resources ==5.10.2
  • iniconfig ==1.1.1
  • iopath ==0.1.9
  • ipykernel ==6.4.1
  • ipython-genutils ==0.2.0
  • itsdangerous ==2.0.1
  • joblib ==1.1.0
  • json5 ==0.9.6
  • jsonschema ==4.17.3
  • jupyter-server ==1.23.5
  • jupyter-tensorboard ==0.2.0
  • jupyter_client ==7.4.9
  • jupyter_core ==5.1.5
  • jupyterlab ==3.2.5
  • jupyterlab-pygments ==0.2.2
  • jupyterlab_server ==2.19.0
  • jupytext ==1.14.4
  • kiwisolver ==1.3.2
  • librosa ==0.8.1
  • llvmlite ==0.35.0
  • lmdb ==1.2.1
  • markdown-it-py ==1.1.0
  • matplotlib ==3.4.3
  • mccabe ==0.6.1
  • mdit-py-plugins ==0.2.8
  • mistune ==2.0.4
  • mypy-extensions ==1.0.0
  • nbclassic ==0.5.1
  • nbclient ==0.5.4
  • nbconvert ==7.2.9
  • nbformat ==5.7.3
  • nest-asyncio ==1.5.6
  • networkx ==2.0
  • nltk ==3.6.4
  • notebook ==6.4.1
  • notebook_shim ==0.2.2
  • nvidia-dali-cuda110 ==1.6.0
  • nvidia-pyindex ==1.0.9
  • oauthlib ==3.1.1
  • omegaconf ==2.3.0
  • packaging ==23.0
  • pandas ==2.0.3
  • pandocfilters ==1.5.0
  • pathspec ==0.11.1
  • pkgutil_resolve_name ==1.3.10
  • platformdirs ==2.6.2
  • pluggy ==1.0.0
  • polygraphy ==0.33.0
  • pooch ==1.5.1
  • portalocker ==2.3.2
  • prettytable ==2.2.1
  • prometheus-client ==0.11.0
  • protobuf ==3.18.1
  • py ==1.10.0
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • pybind11 ==2.8.0
  • pycodestyle ==2.5.0
  • pydot ==1.4.2
  • pyflakes ==2.1.1
  • pyparsing ==2.4.7
  • pyrsistent ==0.18.0
  • pytest ==6.2.5
  • pytest-cov ==3.0.0
  • pytest-pythonpath ==0.7.3
  • python-dateutil ==2.8.2
  • python-dotenv ==0.19.1
  • python-hostlist ==1.21
  • python-nvd3 ==0.15.0
  • python-slugify ==5.0.2
  • pytorch-quantization ==2.1.0
  • pyzmq ==25.0.0
  • regex ==2021.10.8
  • requests ==2.28.2
  • requests-oauthlib ==1.3.0
  • resampy ==0.2.2
  • rsa ==4.7.2
  • sacremoses ==0.0.46
  • scikit-learn ==1.0
  • sniffio ==1.3.0
  • snowballstemmer ==2.1.0
  • sphinx-glpi-theme ==0.3
  • sphinx-rtd-theme ==1.0.0
  • sphinxcontrib-applehelp ==1.0.2
  • sphinxcontrib-devhelp ==1.0.2
  • sphinxcontrib-htmlhelp ==2.0.0
  • sphinxcontrib-jsmath ==1.0.1
  • sphinxcontrib-qthelp ==1.0.3
  • sphinxcontrib-serializinghtml ==1.1.5
  • sqlparse ==0.4.2
  • tabulate ==0.8.9
  • tensorboard ==2.6.0
  • tensorboard-data-server ==0.6.1
  • tensorboard-plugin-wit ==1.8.0
  • termcolor ==2.3.0
  • terminado ==0.12.1
  • testpath ==0.5.0
  • text-unidecode ==1.3
  • threadpoolctl ==3.0.0
  • tinycss2 ==1.2.1
  • toml ==0.10.2
  • tomli ==1.2.1
  • torch ==1.10.0a0
  • tornado ==6.2
  • traitlets ==5.8.1
  • tzdata ==2024.1
  • uvicorn ==0.15.0
  • uvloop ==0.16.0
  • watchgod ==0.7
  • webencodings ==0.5.1
  • websocket-client ==1.5.0
  • websockets ==10.0
  • whitenoise ==5.3.0
  • yacs ==0.1.8
  • zipp ==3.12.0