gcd

[AAAI 2025] The official repository of our paper "GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation"

https://github.com/never-wx/gcd

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.5%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: Never-wx
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 37.9 MB
Statistics
  • Stars: 7
  • Watchers: 1
  • Forks: 0
  • Open Issues: 3
  • Releases: 0
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme · Contributing · License · Code of conduct · Citation

README.md

GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation

Official PyTorch implementation for "GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation", AAAI 2025.

[Paper] [Supplementary]

Abstract

Incremental object detection (IOD) is a challenging task that requires detection models to continuously learn from newly arriving data. This work focuses on incremental learning for vision-language detectors (VLDs), an underexplored domain. Existing research typically adopts a local alignment paradigm to avoid label conflicts, where different tasks are learned separately without interaction. However, we reveal that this practice fails to effectively preserve the semantic structure: aligned relationships between objects and texts collapse when handling novel categories, ultimately leading to catastrophic forgetting. Although knowledge distillation (KD) is a common remedy, traditional KD performs poorly when directly applied to VLDs, because a natural knowledge gap exists between phases in both the encoding and decoding processes. To address these issues, we propose a novel method called Global alignment and Correspondence Distillation (GCD). We first integrate knowledge across phases within the same embedding space to construct a global semantic structure. We then enable effective knowledge distillation in VLDs through a semantic correspondence mechanism, ensuring consistent proposal generation and decoding. On top of that, we distill the teacher model's informative predictions and topological relationships to maintain a stable local semantic structure. Extensive experiments on COCO 2017 demonstrate that our method significantly outperforms existing approaches, achieving new state-of-the-art results in various IOD scenarios.
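
For readers unfamiliar with the KD baseline the abstract contrasts against, here is a minimal sketch of standard temperature-scaled logit distillation (the "traditional KD" the paper argues is insufficient for VLDs; this is not the GCD method itself, and all names are illustrative):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions,
    scaled by T^2 as in standard logit distillation."""
    p = softmax(teacher_logits, temperature)  # frozen teacher distribution
    q = softmax(student_logits, temperature)  # trainable student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Identical logits give zero loss; mismatched logits give a positive penalty.
```

In the incremental setting, this loss is applied between the previous-phase (teacher) and current-phase (student) detectors; the paper's point is that for VLDs such a direct match fails across phases, motivating the correspondence mechanism.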

Approach

(Figure: overview of the proposed GCD approach)

Get Started

  • This repo is based on MMDetection 3.3. Please follow the MMDetection installation guide (GETTING_STARTED.md) and make sure you can run it successfully.

```bash
conda create -n GCD python=3.8 -y
source activate GCD
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -U openmim
mim install mmengine==0.8.5
mim install mmcv==2.0.0
cd our project  # i.e. the root directory of this repository
pip install -v -e .
```

Dataset

  • Unzip the COCO dataset into ./data/coco/
  • Run ./script/select_categories_2step.py and the select_categories_nstep scripts to split the COCO dataset:

```bash
# Two-step (40+40):
python ./script/select_categories_2step.py
# generates instances_train2017_0-39.json and instances_train2017_40-79.json,
# placed in ./data/coco/annotations/40+40

# Multi-step (40+10*4) train set:
python ./script/select_categories_nstep_train.py
# divides instances_train2017_40-79.json into 4 steps [40-49, 50-59, 60-69, 70-79],
# placed in ./data/coco/annotations/40+10_4

# Multi-step (40+10*4) val set:
python ./script/select_categories_nstep_val.py
# divides instances_val2017.json; the val sets are [0-49, 0-59, 0-69, 0-79 (original file)]
```
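
If you want to build a custom category split, the core of such a script can be sketched as below. This is a simplified illustration of what category-splitting scripts typically do, not the repo's actual code; the function name is hypothetical:

```python
import json

def split_coco_by_category(coco, cat_ids):
    """Keep only the given category ids from a COCO-format dict,
    dropping images that no longer have any annotations."""
    keep = set(cat_ids)
    anns = [a for a in coco["annotations"] if a["category_id"] in keep]
    img_ids = {a["image_id"] for a in anns}
    return {
        "images": [im for im in coco["images"] if im["id"] in img_ids],
        "annotations": anns,
        "categories": [c for c in coco["categories"] if c["id"] in keep],
    }

# Usage sketch (paths are examples):
# with open("./data/coco/annotations/instances_train2017.json") as f:
#     full = json.load(f)
# subset = split_coco_by_category(full, range(0, 40))
# with open("./data/coco/annotations/40+40/instances_train2017_0-39.json", "w") as f:
#     json.dump(subset, f)
```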

Checkpoints

The base-phase weights and dataset splits (40+40, 40+10_4, 70+10) can be obtained from Google Drive.

Train

```bash
# assume that you are under the root directory of this project

# Two-step (70+10)
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/70+10/gdino_inc_70+10_0-69_scratch_coco.py 4          # train first 70 cats
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/70+10/gdino_inc_70+10_70-79_gcd_scratch_coco.py 4 --amp  # train last 10 cats incrementally

# Multi-step (40+10*4)
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/40+40/gdino_inc_40+40_0-39_scratch_coco.py 4
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/40+10_4/gdino_inc_40+10_4_40-49_gcd_scratch_coco.py 4 --amp
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/40+10_4/gdino_inc_40+10_4_50-59_gcd_scratch_coco.py 4 --amp
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/40+10_4/gdino_inc_40+10_4_60-69_gcd_scratch_coco.py 4 --amp
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh ./configs/gdino_inc/40+10_4/gdino_inc_40+10_4_70-79_gcd_scratch_coco.py 4 --amp
```

Test

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_test.sh ./configs/gdino_inc/70+10/gdino_inc_70+10_70-79_gcd_scratch_coco.py ./work_dirs/gdino_inc_70+10_70-79_gcd_scratch_coco/epoch_12.pth 4 --cfg-options test_evaluator.classwise=True
```
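
With `test_evaluator.classwise=True`, the evaluator reports a per-category AP table, which is what you need to compare forgetting on old versus new classes in an IOD split. A tiny helper for averaging those per-category APs (names and values below are hypothetical examples, not results from this repo):

```python
def mean_ap(classwise, categories=None):
    """Average per-category AP over the given category names
    (all categories when none are specified)."""
    names = categories if categories is not None else list(classwise)
    vals = [classwise[n] for n in names]
    return sum(vals) / len(vals)

# Hypothetical classwise results for an old/new class comparison:
ap = {"person": 0.60, "car": 0.50, "toothbrush": 0.30}
old_ap = mean_ap(ap, ["person", "car"])  # AP restricted to base classes
all_ap = mean_ap(ap)                     # overall AP across all classes
```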

Acknowledgement

Our code is based on the MMDetection project. Thanks also to the works ERD and CL-DETR.

Citation

Please cite our paper if this repo helps your research:

```bibtex
@inproceedings{wang2025gcd,
  title={GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation},
  author={Wang, Xu and Wang, Zilei and Lin, Zihan},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={8},
  pages={8015--8023},
  year={2025}
}
```

Owner

  • Name: Xu Wang
  • Login: Never-wx
  • Kind: user
  • Company: University of Science and Technology of China

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection Contributors"
title: "OpenMMLab Detection Toolbox and Benchmark"
date-released: 2018-08-22
url: "https://github.com/open-mmlab/mmdetection"
license: Apache-2.0

GitHub Events

Total
  • Issues event: 7
  • Watch event: 11
  • Issue comment event: 20
  • Push event: 48
Last Year
  • Issues event: 7
  • Watch event: 11
  • Issue comment event: 20
  • Push event: 48

Dependencies

.github/workflows/deploy.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.circleci/docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/serve/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/serve_cn/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
requirements/albu.txt pypi
  • albumentations >=0.3.2
requirements/build.txt pypi
  • cython *
  • numpy *
requirements/docs.txt pypi
  • docutils ==0.16.0
  • myst-parser *
  • sphinx ==4.0.2
  • sphinx-copybutton *
  • sphinx_markdown_tables *
  • sphinx_rtd_theme ==0.5.2
  • urllib3 <2.0.0
requirements/mminstall.txt pypi
  • mmcv >=2.0.0rc4,<2.2.0
  • mmengine >=0.7.1,<1.0.0
requirements/multimodal.txt pypi
  • fairscale *
  • jsonlines *
  • nltk *
  • pycocoevalcap *
  • transformers *
requirements/optional.txt pypi
  • cityscapesscripts *
  • emoji *
  • fairscale *
  • imagecorruptions *
  • scikit-learn *
requirements/readthedocs.txt pypi
  • mmcv >=2.0.0rc4,<2.2.0
  • mmengine >=0.7.1,<1.0.0
  • scipy *
  • torch *
  • torchvision *
  • urllib3 <2.0.0
requirements/runtime.txt pypi
  • matplotlib *
  • numpy *
  • pycocotools *
  • scipy *
  • shapely *
  • six *
  • terminaltables *
  • tqdm *
requirements/tests.txt pypi
  • asynctest * test
  • cityscapesscripts * test
  • codecov * test
  • flake8 * test
  • imagecorruptions * test
  • instaboostfast * test
  • interrogate * test
  • isort ==4.3.21 test
  • kwarray * test
  • memory_profiler * test
  • nltk * test
  • onnx ==1.7.0 test
  • onnxruntime >=1.8.0 test
  • parameterized * test
  • prettytable * test
  • protobuf <=3.20.1 test
  • psutil * test
  • pytest * test
  • transformers * test
  • ubelt * test
  • xdoctest >=0.10.0 test
  • yapf * test
requirements/tracking.txt pypi
  • mmpretrain *
  • motmetrics *
  • numpy <1.24.0
  • scikit-learn *
  • seaborn *
requirements.txt pypi
setup.py pypi