grm

[CVPR'23] The official PyTorch implementation of our CVPR 2023 paper: "Generalized Relation Modeling for Transformer Tracking".

https://github.com/little-podi/grm

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.6%) to scientific vocabulary

Keywords

attention-mechanism cvpr2023 object-tracking pytorch single-object-tracking tracking vision-transformer visual-tracking
Last synced: 6 months ago

Repository

[CVPR'23] The official PyTorch implementation of our CVPR 2023 paper: "Generalized Relation Modeling for Transformer Tracking".

Basic Info
  • Host: GitHub
  • Owner: Little-Podi
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 660 KB
Statistics
  • Stars: 75
  • Watchers: 3
  • Forks: 8
  • Open Issues: 1
  • Releases: 1
Topics
attention-mechanism cvpr2023 object-tracking pytorch single-object-tracking tracking vision-transformer visual-tracking
Created almost 3 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.md

GRM

The official PyTorch implementation of our CVPR 2023 paper:

Generalized Relation Modeling for Transformer Tracking

Shenyuan Gao, Chunluan Zhou, Jun Zhang

[CVF Open Access] [ArXiv Preprint] [YouTube Video] [Trained Models] [Raw Results] [SOTA Paper List]

Highlight

:bookmark:Brief Introduction

Compared with previous two-stream trackers, the recent one-stream tracking pipeline, which allows earlier interaction between the template and search region, has achieved a remarkable performance gain. However, existing one-stream trackers always let the template interact with all parts inside the search region throughout all the encoder layers. This could potentially lead to target-background confusion when the extracted feature representations are not sufficiently discriminative. To alleviate this issue, we propose generalized relation modeling (GRM) based on adaptive token division. The proposed method is a generalized formulation of attention-based relation modeling for Transformer tracking, which inherits the merits of both previous two-stream and one-stream pipelines whilst enabling more flexible relation modeling by selecting appropriate search tokens to interact with template tokens.
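The adaptive token division described above can be sketched in a toy, dependency-free form. This is purely illustrative and not the paper's implementation: the scoring rule, function names, and the tiny 2-D embeddings below are all invented stand-ins for the learned token-division module and the real attention layers.

```python
# Toy illustration of GRM's core idea: instead of letting the template
# attend to every search-region token, a score first divides the search
# tokens, and only the selected subset interacts with the template.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def divide_search_tokens(scores, keep_ratio=0.5):
    """Pick the top-scoring search tokens to interact with the template."""
    k = max(1, int(len(scores) * keep_ratio))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def attend(query, keys, values):
    """Single-query scaled dot-product attention over toy 1-D features."""
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(logits)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

# One template token plus four search tokens (2-D toy embeddings).
template = [1.0, 0.0]
search = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [-0.5, 0.5]]

# Score each search token by similarity to the template (a stand-in for
# the learned division module) and keep the top half.
scores = [sum(t * s for t, s in zip(template, tok)) for tok in search]
keep = divide_search_tokens(scores, keep_ratio=0.5)

# The template attends only to the selected search tokens.
selected = [search[i] for i in sorted(keep)]
out = attend(template, selected, selected)
print(sorted(keep), [round(x, 3) for x in out])
```

In the real model the division is learned end-to-end and applied per encoder layer; here a fixed similarity score and a hard top-k merely convey the shape of the computation.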

:bookmark:Strong Performance

| Variant | GRM-GOT | GRM | GRM-L320 |
| :---: | :---: | :---: | :---: |
| Model Config | ViT-B, 256^2 resolution | ViT-B, 256^2 resolution | ViT-L, 320^2 resolution |
| Training Setting | only GOT, 100 epochs | 4 datasets, 300 epochs | 4 datasets, 300 epochs |
| GOT-10k (AO / SR 0.5 / SR 0.75) | 73.4 / 82.9 / 70.4 | - | - |
| LaSOT (AUC / Norm P / P) | - | 69.9 / 79.3 / 75.8 | 71.4 / 81.2 / 77.9 |
| TrackingNet (AUC / Norm P / P) | - | 84.0 / 88.7 / 83.3 | 84.4 / 88.9 / 84.0 |
| AVisT (AUC / OP50 / OP75) | - | 54.5 / 63.1 / 45.2 | 55.1 / 63.8 / 46.9 |
| NfS30 (AUC) | - | 65.6 | 66.0 |
| UAV123 (AUC) | - | 70.2 | 72.2 |

:bookmark:Inference Speed

Our baseline model (backbone: ViT-B, resolution: 256x256) can run at 45 fps (frames per second) on a single NVIDIA GeForce RTX 3090.
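A figure like 45 fps is typically obtained by timing many forward passes after a warm-up and dividing. The harness below is a minimal, generic sketch of that procedure (not the repository's benchmarking code); `dummy_tracker_step` is a hypothetical stand-in for the real model, and with PyTorch on GPU you would additionally call `torch.cuda.synchronize()` before reading the clock so queued GPU work is counted.

```python
# Generic fps-measurement harness: warm up, then time N iterations.
import time

def dummy_tracker_step():
    # Stand-in for one template/search forward pass of the tracker.
    return sum(i * i for i in range(1000))

def measure_fps(step, warmup=10, iters=100):
    for _ in range(warmup):      # warm-up runs are excluded from timing
        step()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start
    return iters / elapsed

fps = measure_fps(dummy_tracker_step)
print(f"{fps:.1f} fps")
```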

:bookmark:Training Cost

It takes less than half a day to train our baseline model for 300 epochs on 8 NVIDIA GeForce RTX 3090 GPUs (each with 24 GB of memory).

Release

Trained Models (including the baseline model GRM, GRM-GOT and a stronger variant GRM-L320) [download zip file]

Raw Results (including raw tracking results on six datasets we benchmarked in the paper and listed above) [download zip file]

Download and unzip these two files into the output directory under the GRM project path; both can then be used directly by our code.

Let's Get Started

  • Environment

Our experiments are conducted with Ubuntu 20.04 and CUDA 11.6.

  • Preparation

    • Clone our repository to your local project directory.
    • Download the pre-trained weights from MAE or DeiT, and place the files in the pretrained_models directory under the GRM project path. You may want to try different pre-trained weights, so I list the links to the pre-trained models integrated in this project below.

    | Backbone Type | Model File | Checkpoint Link |
    | :---: | :---: | :---: |
    | 'vit_base' | 'mae_pretrain_vit_base.pth' | download |
    | 'vit_large' | 'mae_pretrain_vit_large.pth' | download |
    | 'vit_base' | 'deit_base_patch16_224-b5f2ef4d.pth' | download |
    | 'vit_base' | 'deit_base_distilled_patch16_224-df68dfff.pth' | download |

    • Download the training datasets (LaSOT, TrackingNet, GOT-10k, COCO2017) and testing datasets (NfS, UAV123, AVisT) to your disk; the organized directory should look like:

    --LaSOT/
      |--airplane
      |...
      |--zebra
    --TrackingNet/
      |--TRAIN_0
      |...
      |--TEST
    --GOT10k/
      |--test
      |--train
      |--val
    --COCO/
      |--annotations
      |--images
    --NFS30/
      |--anno
      |--sequences
    --UAV123/
      |--anno
      |--data_seq
    --AVisT/
      |--anno
      |--full_occlusion
      |--out_of_view
      |--sequences

    • Edit the paths in lib/test/evaluation/local.py and lib/train/admin/local.py to the proper ones.
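Editing those local.py files usually amounts to pointing a settings object at your dataset roots. The sketch below shows the general shape only; the class and attribute names are assumptions modeled on pytracking-style toolkits, and /data/tracking is a hypothetical root — check the generated local.py in your clone for the exact names.

```python
# Hypothetical local.py fragment (attribute names assumed, not verified
# against the GRM repository) mapping dataset names to disk locations.
class EnvironmentSettings:
    def __init__(self):
        data_root = '/data/tracking'  # hypothetical dataset root
        self.lasot_path = f'{data_root}/LaSOT'
        self.trackingnet_path = f'{data_root}/TrackingNet'
        self.got10k_path = f'{data_root}/GOT10k'
        self.coco_path = f'{data_root}/COCO'
        self.nfs_path = f'{data_root}/NFS30'
        self.uav_path = f'{data_root}/UAV123'
        self.avist_path = f'{data_root}/AVisT'
        self.results_path = './output/test/tracking_results'

settings = EnvironmentSettings()
print(settings.lasot_path)
```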

  • Installation

We use conda to manage the environment.

conda create --name grm python=3.9
conda activate grm
bash install.sh

  • Training

    • Multiple-GPU training via DDP (assuming you have 8 GPUs)

    python tracking/train.py --mode multiple --nproc 8

    • Single-GPU debugging (too slow, not recommended for training)

    python tracking/train.py

    • For GOT-10k evaluation, remember to set --config vitb_256_got_ep100.
    • To pursue higher performance, switch to the stronger variant by setting --config vitl_320_ep300.

  • Evaluation

    • Make sure you have prepared the trained model.
    • LaSOT

    python tracking/test.py --dataset lasot

    Then evaluate the raw results using the official MATLAB toolkit.

    • TrackingNet

    python tracking/test.py --dataset trackingnet
    python lib/test/utils/transform_trackingnet.py

    Then upload test/tracking_results/grm/vitb_256_ep300/trackingnet_submit.zip to the online evaluation server.

    • GOT-10k

    python tracking/test.py --param vitb_256_got_ep100 --dataset got10k_test
    python lib/test/utils/transform_got10k.py

    Then upload test/tracking_results/grm/vitb_256_got_ep100/got10k_submit.zip to the online evaluation server.

    • NfS30, UAV123, AVisT

    python tracking/test.py --dataset nfs
    python tracking/test.py --dataset uav
    python tracking/test.py --dataset avist
    python tracking/analysis_results.py

    • For multi-threaded inference, just add --threads 40 after tracking/test.py (assuming you want to use 40 threads in total).
    • To display the prediction results during inference, set settings.show_result = True in lib/test/evaluation/local.py (this may be buggy on a remote server).
    • Please refer to the DynamicViT example for visualizing the search token division results.

Acknowledgement

:heart::heart::heart:Our idea is implemented based on the following projects. We really appreciate their excellent open-source work!

Citation

If any parts of our paper and code help your research, please consider citing us and giving a star to our repository.

@inproceedings{gao2023generalized,
  title={Generalized Relation Modeling for Transformer Tracking},
  author={Gao, Shenyuan and Zhou, Chunluan and Zhang, Jun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18686--18695},
  year={2023}
}

Contact

If you have any questions or concerns, feel free to open issues or directly contact me through the ways on my GitHub homepage. Suggestions and collaborations are also highly welcome!

Owner

  • Name: Shenyuan Gao
  • Login: Little-Podi
  • Kind: user
  • Location: Hong Kong SAR
  • Company: HKUST

Ph.D. student at HKUST since 22fall.

Citation (CITATION.cff)

cff-version: 1.2.0
message: 'Please kindly cite us if you find it useful.'
authors:
- family-names: 'Gao'
  given-names: 'Shenyuan'
title: 'GRM'
version: 1.0
date-released: 2023-03-30
url: 'https://github.com/Little-Podi/GRM'
preferred-citation:
  title: 'Generalized Relation Modeling for Transformer Tracking'
  type: conference-paper
  authors:
  - family-names: 'Gao'
    given-names: 'Shenyuan'
  - family-names: 'Zhou'
    given-names: 'Chunluan'
  - family-names: 'Zhang'
    given-names: 'Jun'
  collection-title: 'Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition' # booktitle
  start: 18686 # First page number
  end: 18695 # Last page number
  year: 2023

GitHub Events

Total
  • Issues event: 1
  • Watch event: 11
  • Issue comment event: 1
  • Fork event: 1
Last Year
  • Issues event: 1
  • Watch event: 11
  • Issue comment event: 1
  • Fork event: 1