gcd

[AAAI 2025] The official repository of our paper "GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation"

https://github.com/never-wx/gcd

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.5%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: Never-wx
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 37.9 MB
Statistics
  • Stars: 7
  • Watchers: 1
  • Forks: 0
  • Open Issues: 3
  • Releases: 0
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme · Contributing · License · Code of conduct · Citation

README.md

GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation

Official PyTorch implementation for "GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation", AAAI 2025.

[Paper] [Supplementary]

Abstract

Incremental object detection (IOD) is a challenging task that requires detection models to continuously learn from newly arriving data. This work focuses on incremental learning for vision-language detectors (VLDs), an underexplored domain. Existing research typically adopts a local alignment paradigm to avoid label conflicts, where different tasks are learned separately without interaction. However, we reveal that this practice fails to effectively preserve the semantic structure: aligned relationships between objects and texts collapse when handling novel categories, ultimately leading to catastrophic forgetting. Although knowledge distillation (KD) is a common remedy, traditional KD performs poorly when directly applied to VLDs, because a natural knowledge gap exists between phases in both the encoding and decoding processes. To address these issues, we propose a novel method called Global alignment and Correspondence Distillation (GCD). We first integrate knowledge across phases within the same embedding space to construct a global semantic structure. We then enable effective knowledge distillation in VLDs through a semantic correspondence mechanism, ensuring consistent proposal generation and decoding. On top of that, we distill the teacher model's informative predictions and topological relationships to maintain a stable local semantic structure. Extensive experiments on COCO 2017 demonstrate that our method significantly outperforms existing approaches, achieving new state-of-the-art results in various IOD scenarios.
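
For readers unfamiliar with the KD baseline the abstract contrasts against, here is a minimal sketch of standard temperature-scaled logit distillation (the "traditional KD" the paper argues is insufficient for VLDs; this is not the GCD method itself, and all names are illustrative):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions,
    scaled by T^2 as in standard logit distillation."""
    p = softmax(teacher_logits, temperature)  # frozen teacher distribution
    q = softmax(student_logits, temperature)  # trainable student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Identical logits give zero loss; mismatched logits give a positive penalty.
```

In the incremental setting, this loss is applied between the previous-phase (teacher) and current-phase (student) detectors; the paper's point is that for VLDs such a direct match fails across phases, motivating the correspondence mechanism.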

Approach

(Figure: overview of the proposed GCD approach)

Get Started

  • This repo is based on MMDetection 3.3. Please follow the MMDetection installation guide (GETTING_STARTED.md) and make sure you can run it successfully.

```bash
conda create -n GCD python=3.8 -y
source activate GCD
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -U openmim
mim install mmengine==0.8.5
mim install mmcv==2.0.0
cd our project  # i.e. the root directory of this repository
pip install -v -e .
```

Dataset

  • Unzip the COCO dataset into ./data/coco/
  • Run ./script/select_categories_2step.py and the select_categories_nstep scripts to split the COCO dataset:

```bash
# Two-step (40+40):
python ./script/select_categories_2step.py
# generates instances_train2017_0-39.json and instances_train2017_40-79.json,
# placed in ./data/coco/annotations/40+40

# Multi-step (40+10*4) train set:
python ./script/select_categories_nstep_train.py
# divides instances_train2017_40-79.json into 4 steps [40-49, 50-59, 60-69, 70-79],
# placed in ./data/coco/annotations/40+10_4

# Multi-step (40+10*4) val set:
python ./script/select_categories_nstep_val.py
# divides instances_val2017.json; the val sets are [0-49, 0-59, 0-69, 0-79 (original file)]
```
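
If you want to build a custom category split, the core of such a script can be sketched as below. This is a simplified illustration of what category-splitting scripts typically do, not the repo's actual code; the function name is hypothetical:

```python
import json

def split_coco_by_category(coco, cat_ids):
    """Keep only the given category ids from a COCO-format dict,
    dropping images that no longer have any annotations."""
    keep = set(cat_ids)
    anns = [a for a in coco["annotations"] if a["category_id"] in keep]
    img_ids = {a["image_id"] for a in anns}
    return {
        "images": [im for im in coco["images"] if im["id"] in img_ids],
        "annotations": anns,
        "categories": [c for c in coco["categories"] if c["id"] in keep],
    }

# Usage sketch (paths are examples):
# with open("./data/coco/annotations/instances_train2017.json") as f:
#     full = json.load(f)
# subset = split_coco_by_category(full, range(0, 40))
# with open("./data/coco/annotations/40+40/instances_train2017_0-39.json", "w") as f:
#     json.dump(subset, f)
```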

Checkpoints

The base-phase weights and dataset splits (40+40, 40+10_4, 70+10) can be obtained from Google Drive.

Train

```bash
# assume that you are under the root directory of this project

# Two-step (70+10)
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/70+10/gdino_inc_70+10_0-69_scratch_coco.py 4          # train first 70 cats
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/70+10/gdino_inc_70+10_70-79_gcd_scratch_coco.py 4 --amp  # train last 10 cats incrementally

# Multi-step (40+10*4)
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/40+40/gdino_inc_40+40_0-39_scratch_coco.py 4
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/40+10_4/gdino_inc_40+10_4_40-49_gcd_scratch_coco.py 4 --amp
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/40+10_4/gdino_inc_40+10_4_50-59_gcd_scratch_coco.py 4 --amp
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/40+10_4/gdino_inc_40+10_4_60-69_gcd_scratch_coco.py 4 --amp
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/40+10_4/gdino_inc_40+10_4_70-79_gcd_scratch_coco.py 4 --amp
```

Test

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_test.sh ./configs/gdino_inc/70+10/gdino_inc_70+10_70-79_gcd_scratch_coco.py ./work_dirs/gdino_inc_70+10_70-79_gcd_scratch_coco/epoch_12.pth 4 --cfg-options test_evaluator.classwise=True
```
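
With `test_evaluator.classwise=True`, the evaluator reports a per-category AP table, which is what you need to compare forgetting on old versus new classes in an IOD split. A tiny helper for averaging those per-category APs (names and values below are hypothetical examples, not results from this repo):

```python
def mean_ap(classwise, categories=None):
    """Average per-category AP over the given category names
    (all categories when none are specified)."""
    names = categories if categories is not None else list(classwise)
    vals = [classwise[n] for n in names]
    return sum(vals) / len(vals)

# Hypothetical classwise results for an old/new class comparison:
ap = {"person": 0.60, "car": 0.50, "toothbrush": 0.30}
old_ap = mean_ap(ap, ["person", "car"])  # AP restricted to base classes
all_ap = mean_ap(ap)                     # overall AP across all classes
```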

Acknowledgement

Our code is based on the MMDetection project. Thanks also to the works ERD and CL-DETR.

Citation

Please cite our paper if this repo helps your research:

```bibtex
@inproceedings{wang2025gcd,
  title={GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation},
  author={Wang, Xu and Wang, Zilei and Lin, Zihan},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={8},
  pages={8015--8023},
  year={2025}
}
```

Owner

  • Name: Xu Wang
  • Login: Never-wx
  • Kind: user
  • Company: University of Science and Technology of China

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection Contributors"
title: "OpenMMLab Detection Toolbox and Benchmark"
date-released: 2018-08-22
url: "https://github.com/open-mmlab/mmdetection"
license: Apache-2.0

GitHub Events

Total
  • Issues event: 7
  • Watch event: 11
  • Issue comment event: 20
  • Push event: 48
Last Year
  • Issues event: 7
  • Watch event: 11
  • Issue comment event: 20
  • Push event: 48

Dependencies

.github/workflows/deploy.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.circleci/docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/serve/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/serve_cn/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
requirements/albu.txt pypi
  • albumentations >=0.3.2
requirements/build.txt pypi
  • cython *
  • numpy *
requirements/docs.txt pypi
  • docutils ==0.16.0
  • myst-parser *
  • sphinx ==4.0.2
  • sphinx-copybutton *
  • sphinx_markdown_tables *
  • sphinx_rtd_theme ==0.5.2
  • urllib3 <2.0.0
requirements/mminstall.txt pypi
  • mmcv >=2.0.0rc4,<2.2.0
  • mmengine >=0.7.1,<1.0.0
requirements/multimodal.txt pypi
  • fairscale *
  • jsonlines *
  • nltk *
  • pycocoevalcap *
  • transformers *
requirements/optional.txt pypi
  • cityscapesscripts *
  • emoji *
  • fairscale *
  • imagecorruptions *
  • scikit-learn *
requirements/readthedocs.txt pypi
  • mmcv >=2.0.0rc4,<2.2.0
  • mmengine >=0.7.1,<1.0.0
  • scipy *
  • torch *
  • torchvision *
  • urllib3 <2.0.0
requirements/runtime.txt pypi
  • matplotlib *
  • numpy *
  • pycocotools *
  • scipy *
  • shapely *
  • six *
  • terminaltables *
  • tqdm *
requirements/tests.txt pypi
  • asynctest * test
  • cityscapesscripts * test
  • codecov * test
  • flake8 * test
  • imagecorruptions * test
  • instaboostfast * test
  • interrogate * test
  • isort ==4.3.21 test
  • kwarray * test
  • memory_profiler * test
  • nltk * test
  • onnx ==1.7.0 test
  • onnxruntime >=1.8.0 test
  • parameterized * test
  • prettytable * test
  • protobuf <=3.20.1 test
  • psutil * test
  • pytest * test
  • transformers * test
  • ubelt * test
  • xdoctest >=0.10.0 test
  • yapf * test
requirements/tracking.txt pypi
  • mmpretrain *
  • motmetrics *
  • numpy <1.24.0
  • scikit-learn *
  • seaborn *
requirements.txt pypi
setup.py pypi