Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: GrokCV
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Size: 13 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 9 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

🪄 AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection


Authors: Yangting Shi, Renjie He, Le Hui, Xiang Li, Jian Yang, Ming-Ming Cheng, Yimian Dai

The official implementation of "AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection".


🧾 Introduction

Infrared imaging plays a vital role in real-world applications such as marine resource exploration, precision guidance, and ecological monitoring, thanks to its ability to detect targets under harsh weather and low-visibility conditions.

However, most existing work in infrared small target detection (IRSTD) focuses narrowly on domain-specific feature engineering, while ignoring the wide-area, cross-domain detection needs of real-world infrared systems.

This has led to a serious issue: current IRSTD methods often break down when evaluated under Omni-Domain settings, where imaging conditions—such as platforms, spectral bands, and resolutions—vary significantly.

🎯 Our core research question is: In Omni-Domain IRSTD, can a model recognize the imaging domain of each input and adapt accordingly? Our answer is yes.

📡 In practical infrared sensing systems, each captured image comes with a set of auxiliary metadata—such as sensor platform, spectral band, and spatial resolution.

💡 While often overlooked, this metadata encodes key factors driving the domain-specific appearance and feature distribution. It can serve as a compact, semantically rich descriptor of the imaging domain.
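The idea can be illustrated with a toy encoding that maps per-image auxiliary metadata to a fixed-length numeric descriptor of the imaging domain. The category lists and field names below are illustrative assumptions, not AuxDet's actual scheme:

```python
# Toy sketch: one-hot the categorical metadata fields and append a
# normalized resolution. The PLATFORMS/BANDS vocabularies below are
# illustrative placeholders, not the dataset's real category lists.

PLATFORMS = ["land-based", "aerial", "space-based"]
BANDS = ["short-wave", "mid-wave", "long-wave"]

def encode_metadata(platform: str, band: str, resolution: float) -> list:
    """Return a compact numeric descriptor of the imaging domain."""
    vec = [1.0 if platform == p else 0.0 for p in PLATFORMS]
    vec += [1.0 if band == b else 0.0 for b in BANDS]
    vec.append(resolution / 1024.0)  # crude normalization to [0, 1]
    return vec

print(encode_metadata("aerial", "long-wave", 512.0))
# -> [0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.5]
```

A learned embedding (e.g. an MLP over such a vector) could then act as the "compact, semantically rich descriptor" described above.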


🔍 Architecture

We propose the Auxiliary Metadata Driven Detector (AuxDet), a novel framework for Omni-IRSTD that redefines the conventional visual-only paradigm by leveraging auxiliary metadata for multi-modal scene-aware optimization.

(Figure: AuxDet architecture overview)
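One common way a metadata embedding can condition visual features is feature-wise modulation, where the embedding predicts a per-channel scale and shift applied to the feature map. This is a shape-level sketch of that general pattern only; AuxDet's actual fusion module may differ:

```python
import numpy as np

# Illustrative FiLM-style modulation: a metadata vector is projected to
# a per-channel scale (gamma) and shift (beta), which modulate the
# visual feature map. Weights here are random; this is a sketch of the
# general technique, not AuxDet's published module.

rng = np.random.default_rng(0)

def modulate(features: np.ndarray, meta: np.ndarray,
             w_gamma: np.ndarray, w_beta: np.ndarray) -> np.ndarray:
    """features: (C, H, W); meta: (D,); w_gamma, w_beta: (C, D)."""
    gamma = w_gamma @ meta                      # (C,) per-channel scale
    beta = w_beta @ meta                        # (C,) per-channel shift
    return features * gamma[:, None, None] + beta[:, None, None]

C, D = 8, 7
feat = rng.standard_normal((C, 16, 16))
meta = rng.standard_normal(D)
out = modulate(feat, meta,
               rng.standard_normal((C, D)), rng.standard_normal((C, D)))
print(out.shape)  # (8, 16, 16)
```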


📊 Dataset

We adopt the WideIRSTD-Full dataset as our evaluation benchmark. The dataset consists of 7 public datasets plus an additional dataset developed by the team at the National University of Defense Technology (NUDT). Note that the auxiliary metadata is provided only for training-set images; for this reason, our experiments use the training-set re-split provided at https://github.com/YeRen123455/ICPR-Track2-LightWeight/tree/main/dataset/ICPR_Track2/70_20.


🚀 Installation

Step 1. Clone the repository

```shell
git clone https://github.com/GrokCV/AuxDet.git
cd AuxDet
```

Step 2. Create environment and install dependencies

```shell
conda create --name AUXD python=3.9 -y
conda activate AUXD
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0rc4,<2.2.0"

cd AuxDet/
pip install -v -e .
```


🧊 Getting Started

First, place your dataset files according to the following folder structure.

```
data_root
|-- VOC2007
    |-- ImageSets
        |-- Main  # Following the dataset split from https://github.com/YeRen123455/ICPR-Track2-LightWeight/tree/main/dataset/ICPR_Track2/70_20
            |-- train.txt
            |-- val.txt
    |-- Annotations
        |-- 00001.xml
        |-- 00002.xml
        |-- ...
    |-- PNGImages  # Images under WideIRSTD-Full Dataset/train/images (a total of 9,000 images)
        |-- 00001.png
        |-- 00002.png
        |-- ...
```

We convert the image bounding boxes and related auxiliary information into the XML format. You can download the annotations here.
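Reading one of the converted files can be sketched as follows. The bounding boxes use the standard VOC `<object>/<bndbox>` layout; the `<sensor>` and `<resolution>` tags are hypothetical placeholders for the auxiliary metadata fields, since the exact tag names depend on the released annotation files:

```python
import xml.etree.ElementTree as ET

# Minimal sketch of parsing one annotation file. Only <object>/<bndbox>
# follows the real VOC convention; <sensor> and <resolution> are
# assumed stand-ins for the auxiliary metadata tags.

SAMPLE = """
<annotation>
  <filename>00001.png</filename>
  <sensor>long-wave</sensor>
  <resolution>512</resolution>
  <object>
    <name>target</name>
    <bndbox><xmin>10</xmin><ymin>12</ymin><xmax>14</xmax><ymax>16</ymax></bndbox>
  </object>
</annotation>
"""

def parse_annotation(xml_text: str):
    """Return ([(xmin, ymin, xmax, ymax), ...], metadata dict)."""
    root = ET.fromstring(xml_text)
    boxes = [
        tuple(int(bb.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        for bb in root.iter("bndbox")
    ]
    meta = {"sensor": root.findtext("sensor"),
            "resolution": root.findtext("resolution")}
    return boxes, meta

boxes, meta = parse_annotation(SAMPLE)
print(boxes, meta)
# -> [(10, 12, 14, 16)] {'sensor': 'long-wave', 'resolution': '512'}
```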

Remember to update the dataset paths in the code accordingly.

```python
# Open the file: ./configs/_base_/datasets/voc_aux.py

# Update Line 2 with your dataset root directory
data_root = "your own root directory"

# Modify Line 60 and related fields accordingly:
ann_file='VOC2007/ImageSets/Main/train.txt',
img_subdir="PNGImages",
ann_subdir="Annotations",
data_prefix=dict(sub_data_root='VOC2007/'),
```

Training

```shell
CUDA_VISIBLE_DEVICES=0 python tools/train.py configs/auxdet/auxdet_r50_fpn_1x_voc.py --work-dir your_path
```

Testing

```shell
CUDA_VISIBLE_DEVICES=0 python tools/test.py configs/auxdet/auxdet_r50_fpn_1x_voc.py "best.pth" --work-dir your_path
```

We provide our pretrained weight file here. Please download it and specify the path to "best.pth" in the command above.


⚖️ Benchmark

| Category | Venue | Backbone | FLOPs ↓ | Params. ↓ | AP50 (%) ↑ | Recall (%) ↑ |
| ------------------------ | --------- | -------------------- | ------- | --------- | ---------- | ------------ |
| **General Object Detection** | | | | | | |
| 🔹 One-stage | | | | | | |
| RetinaNet | ICCV17 | ResNet50 | 0.197T | 36.330M | 60.7 | 74.8 |
| TOOD | ICCV19 | ResNet50 | 0.191T | 32.018M | 53.2 | 61.9 |
| FCOS | ICCV19 | ResNet50 | 0.194T | 32.113M | 49.2 | 58.3 |
| GFL | NeurIPS20 | ResNet50 | 0.197T | 32.258M | 48.5 | 60.2 |
| ATSS | CVPR20 | ResNet50 | 0.194T | 32.113M | 50.1 | 60.6 |
| AutoAssign | arXiv20 | ResNet50 | 0.195T | 36.244M | 51.2 | 62.1 |
| 🔹 Two-stage | | | | | | |
| Faster R-CNN | NeurIPS15 | ResNet50 | 0.200T | 41.348M | 31.2 | 33.5 |
| Cascade R-CNN | TPAMI19 | ResNet50 | 0.228T | 69.152M | 35.8 | 37.5 |
| Grid R-CNN | CVPR19 | ResNet50 | 0.313T | 64.467M | 32.2 | 34.8 |
| Cascade RPN | NeurIPS19 | ResNet50 | 0.190T | 42.527M | 74.4 | 84.6 |
| Libra R-CNN | CVPR19 | ResNet50 | 0.209T | 41.611M | 30.7 | 33.8 |
| Dynamic R-CNN | ECCV20 | ResNet50 | 0.200T | 41.348M | 35.0 | 37.2 |
| SABL | ECCV20 | ResNet50 | 0.198T | 36.357M | 64.1 | 77.8 |
| 🔹 End-to-End | | | | | | |
| Deformable DETR | ICLR21 | ResNet50 | 0.189T | 40.099M | 33.1 | 49.8 |
| Sparse R-CNN | CVPR21 | ResNet50 | 0.146T | 0.106G | 40.4 | 62.1 |
| DINO | arXiv22 | ResNet50 | 0.268T | 47.540M | 67.1 | 82.8 |
| DAB-DETR | ICLR22 | ResNet50 | 99.511G | 43.702M | 17.6 | 31.9 |
| **Small Target Specialized** | | | | | | |
| ALCNet | TGRS21 | ResNet20 | 6.047G | 42.700M | 36.3 | 69.7 |
| NWD | ISPRS22 | DetectoRS (ResNet50) | 0.235T | 0.123G | 48.5 | 53.6 |
| RFLA | ECCV22 | DetectoRS (ResNet50) | 0.235T | 0.123G | 71.8 | 80.6 |
| OSCAR | TGRS23 | ResNet50 | 0.327T | 30.730M | 40.2 | 65.4 |
| DNANet | TIP23 | - | 0.228T | 4.697M | 73.9 | 46.9 |
| RDIAN | TGRS23 | - | 0.059T | 0.217M | 62.5 | 51.8 |
| DQ-DETR | ECCV24 | ResNet50 | 1.476T | 58.680M | 70.1 | 81.3 |
| EFLNet | TGRS24 | - | 0.102T | 38.336M | 73.7 | 68.1 |
| PConv | AAAI25 | - | 0.012T | 2.922M | 71.7 | 64.5 |
| **AuxDet (Ours)** | - | ResNet50 | 0.215T | 45.279M | **77.9** | **87.2** |

🔖 Citation


```bibtex
@article{shi2025AuxDet,
  title={AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection},
  author={Yangting Shi and Renjie He and Le Hui and Xiang Li and Jian Yang and Ming-Ming Cheng and Yimian Dai},
  year={2025},
  journal={arXiv preprint arXiv:2505.15184},
}
```

Owner

  • Name: GrokCV
  • Login: GrokCV
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection Contributors"
title: "OpenMMLab Detection Toolbox and Benchmark"
date-released: 2018-08-22
url: "https://github.com/open-mmlab/mmdetection"
license: Apache-2.0

GitHub Events

Total
  • Issues event: 3
  • Watch event: 6
  • Issue comment event: 1
  • Push event: 5
  • Public event: 1
Last Year
  • Issues event: 3
  • Watch event: 6
  • Issue comment event: 1
  • Push event: 5
  • Public event: 1

Dependencies

.github/workflows/deploy.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.circleci/docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
requirements/albu.txt pypi
  • albumentations >=0.3.2
requirements/build.txt pypi
  • cython *
  • numpy *
requirements/docs.txt pypi
  • docutils ==0.16.0
  • myst-parser *
  • sphinx ==4.0.2
  • sphinx-copybutton *
  • sphinx_markdown_tables *
  • sphinx_rtd_theme ==0.5.2
  • urllib3 <2.0.0
requirements/mminstall.txt pypi
  • mmcv >=2.0.0rc4,<2.2.0
  • mmengine >=0.7.1,<1.0.0
requirements/multimodal.txt pypi
  • fairscale *
  • jsonlines *
  • nltk *
  • pycocoevalcap *
  • transformers *
requirements/optional.txt pypi
  • cityscapesscripts *
  • emoji *
  • fairscale *
  • imagecorruptions *
  • scikit-learn *
requirements/readthedocs.txt pypi
  • mmcv >=2.0.0rc4,<2.2.0
  • mmengine >=0.7.1,<1.0.0
  • scipy *
  • torch *
  • torchvision *
  • urllib3 <2.0.0
requirements/runtime.txt pypi
  • matplotlib *
  • numpy *
  • pycocotools *
  • scipy *
  • shapely *
  • six *
  • terminaltables *
  • tqdm *
requirements/tests.txt pypi
  • asynctest * test
  • cityscapesscripts * test
  • codecov * test
  • flake8 * test
  • imagecorruptions * test
  • instaboostfast * test
  • interrogate * test
  • isort ==4.3.21 test
  • kwarray * test
  • memory_profiler * test
  • nltk * test
  • onnx ==1.7.0 test
  • onnxruntime >=1.8.0 test
  • parameterized * test
  • prettytable * test
  • protobuf <=3.20.1 test
  • psutil * test
  • pytest * test
  • transformers * test
  • ubelt * test
  • xdoctest >=0.10.0 test
  • yapf * test
requirements/tracking.txt pypi
  • mmpretrain *
  • motmetrics *
  • numpy <1.24.0
  • scikit-learn *
  • seaborn *
requirements.txt pypi
setup.py pypi