cpa-enhancer
This is the official repository of the paper: CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations
Repository
Basic Info
Statistics
- Stars: 47
- Watchers: 2
- Forks: 2
- Open Issues: 11
- Releases: 0
Metadata Files
README.md
CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations
📰 arXiv Preprint: arXiv 2403.11220
✅ Updates
March 24, 2024: We have released CPA-Seg, which applies CPA-Enhancer to segmentation tasks.
🚀 Overview
Overview of the proposed CPA-Enhancer.
Our proposed content-driven prompt block (CPB).
Abstract: Object detection methods under known single degradations have been extensively investigated. However, existing approaches require prior knowledge of the degradation type and train a separate model for each, limiting their practical applications in unpredictable environments. To address this challenge, we propose a chain-of-thought (CoT) prompted adaptive enhancer, CPA-Enhancer, for object detection under unknown degradations. Specifically, CPA-Enhancer progressively adapts its enhancement strategy under the step-by-step guidance of CoT prompts that encode degradation-related information. To the best of our knowledge, this is the first work that exploits CoT prompting for object detection tasks. Overall, CPA-Enhancer is a plug-and-play enhancement model that can be integrated into any generic detector to achieve substantial gains on degraded images without prior knowledge of the degradation type. Experimental results demonstrate that CPA-Enhancer not only sets a new state of the art for object detection but also boosts the performance of other downstream vision tasks under multiple unknown degradations.
🛠️ Installation
- Step0. Download and install Miniconda from the official website.
- Step1. Create a conda environment and activate it.
```shell
conda create --name openmmlab python=3.8 -y
conda activate openmmlab
```
- Step2. Install PyTorch following the official instructions, e.g.
```shell
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
```
- Step3. Install MMEngine and MMCV using MIM.
```shell
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
```
- Step4. Install other related packages.
```shell
cd CPA_Enhancer
pip install -r ./cpa/requirements.txt
```
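After the installation steps, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the package list and the helper name `installed_versions` are assumptions made for illustration and are not part of this repository.

```python
from importlib import metadata

# Distribution names as published on PyPI; adjust if your install differs.
PACKAGES = ("torch", "torchvision", "mmengine", "mmcv")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

A `None` entry means the corresponding step above did not complete in the active conda environment.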
📁 Data Preparation
Synthetic Datasets
- Step1. Download the PASCAL VOC trainval and test data.
```shell
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
```
- Step2. Construct VnA-T (5 categories, 8111 images) and VnB-T (10 categories, 12334 images) from `VOCtrainval_06-Nov-2007.tar` and `VOCtrainval_11-May-2012.tar`; construct VnA (5 categories, 2734 images) and VnB (10 categories, 3760 images) from `VOCtest_06-Nov-2007.tar`.
We also provide a list of the image names included in each dataset in `cpa/dataSyn/datalist`.
```python
# 5 classes
target_classes = ['person', 'car', 'bus', 'bicycle', 'motorbike']
# 10 classes
target_classes = ['bicycle', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'dog', 'motorbike', 'person']
```
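One way to restrict a VOC annotation to the target classes is sketched below. It assumes that objects outside `target_classes` are simply dropped from each XML file; the README does not spell out the exact construction rule, so treat the provided `datalist` files as the authoritative image membership. The function name `filter_annotation` is hypothetical.

```python
import xml.etree.ElementTree as ET

TARGET_CLASSES_5 = ['person', 'car', 'bus', 'bicycle', 'motorbike']


def filter_annotation(xml_text, target_classes):
    """Drop <object> entries whose class is not in target_classes.

    Returns (filtered_xml_text, kept_class_names).
    Assumption: an image with no kept objects would be excluded by the caller.
    """
    root = ET.fromstring(xml_text)
    kept = []
    # Iterate over a copy because we remove children while looping.
    for obj in list(root.findall('object')):
        name = obj.findtext('name', default='').strip()
        if name in target_classes:
            kept.append(name)
        else:
            root.remove(obj)
    return ET.tostring(root, encoding='unicode'), kept
```

An image whose `kept` list comes back empty contains none of the target classes.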
Make sure the directory follows this basic VOC structure.
```shell
data_vocnorm (data_vocnorm_10)    # path/to/vocnorm
├── train    # VnA-T (VnB-T)
|   ├── Annotations
|   |   └── xxx.xml
|   |   ...
|   ├── ImageSets
|   |   └── Main
|   |       └── train_voc.txt    # you can find it in cpa/dataSyn/datalist
|   └── JPEGImages
|       └── xxx.jpg
|       ...
└── test     # VnA (VnB)
    ├── Annotations
    |   └── xxx.xml
    |   ...
    ├── ImageSets
    |   └── Main
    |       └── test_voc.txt     # you can find it in cpa/dataSyn/datalist
    └── JPEGImages
        └── xxx.jpg
        ...
```
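The layout above can be checked before training with a short script. This is a sketch only: the helper name `check_voc_split` and the example path are assumptions, while the expected image counts are the ones stated in Step2.

```python
from pathlib import Path

# Image totals stated in Step2 of this README.
EXPECTED_COUNTS = {"VnA-T": 8111, "VnB-T": 12334, "VnA": 2734, "VnB": 3760}


def check_voc_split(split_root, list_name, expected_count=None):
    """Return a list of problems found in one VOC-style split directory."""
    root = Path(split_root)
    list_file = root / "ImageSets" / "Main" / list_name
    if not list_file.is_file():
        return [f"missing list file: {list_file}"]

    image_ids = [line.strip() for line in list_file.read_text().splitlines() if line.strip()]
    problems = []
    if expected_count is not None and len(image_ids) != expected_count:
        problems.append(f"expected {expected_count} ids, found {len(image_ids)}")
    for image_id in image_ids:
        if not (root / "Annotations" / f"{image_id}.xml").is_file():
            problems.append(f"missing annotation for {image_id}")
        if not (root / "JPEGImages" / f"{image_id}.jpg").is_file():
            problems.append(f"missing image for {image_id}")
    return problems


if __name__ == "__main__":
    # Hypothetical location; replace with your own path.
    for problem in check_voc_split("data_vocnorm/train", "train_voc.txt", EXPECTED_COUNTS["VnA-T"]):
        print(problem)
```

An empty result means every listed image has both its `.xml` and `.jpg` file in place.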
- Step3. Synthesize degraded datasets from VnA and VnB by executing the following commands, then restructure them into VOC format.
```shell
# Modify the paths in the code to match your actual paths.
# all-in-one setting
python cpa/dataSyn/datamakefog.py       # VF/VF-T
python cpa/dataSyn/datamakelowlight.py  # VD/VD-T/VDB
python cpa/dataSyn/datamakesnow.py      # VS/VS-T
python cpa/dataSyn/datamakerain.py      # VR/VR-T
# one-by-one setting
python cpa/dataSyn/datamakefoghybrid.py       # VF-HT
python cpa/dataSyn/datamakelowlighthybrid.py  # VD-HT
```
Real-world Datasets
- Step1. Download the ExDark and RTTS datasets.
- Step2. Restructure the RTTS dataset (4322 images) into VOC format, ensuring that the directory conforms to this basic structure.
```shell
RTTS    # path/to/RTTS
├── Annotations
|   └── xxx.xml
|   ...
├── ImageSets
|   └── Main
|       └── test_rtts.txt
└── JPEGImages
    └── xxx.jpg
    ...
```
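The `test_rtts.txt` file lists one image ID per line, as is usual for VOC-style `ImageSets/Main` files. A minimal sketch for generating it from the `JPEGImages` folder follows; the function name `write_image_list` is hypothetical, and the `.jpg` extension is taken from the tree above (adjust the glob pattern if your copy of the dataset uses another extension).

```python
from pathlib import Path


def write_image_list(dataset_root, list_name="test_rtts.txt"):
    """Write ImageSets/Main/<list_name> with one image ID (file stem) per line."""
    root = Path(dataset_root)
    image_ids = sorted(p.stem for p in (root / "JPEGImages").glob("*.jpg"))
    main_dir = root / "ImageSets" / "Main"
    main_dir.mkdir(parents=True, exist_ok=True)
    (main_dir / list_name).write_text("".join(f"{image_id}\n" for image_id in image_ids))
    return image_ids


if __name__ == "__main__":
    # Hypothetical location; replace with your own path.
    ids = write_image_list("path/to/RTTS")
    print(f"wrote {len(ids)} image ids")
```

For the full RTTS set, the reported count should match the 4322 images mentioned in Step2.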
- Step3. Similarly, restructure the ExDarkA dataset (5 categories, 1283 images) and the ExDarkB dataset (10 categories, 2563 images) into VOC format.
```shell
exdark_5 (exdark_10)    # path/to/ExDarkA (ExDarkB)
├── Annotations
|   └── xxx.xml
|   ...
├── ImageSets
|   └── Main
|       └── test_exdark_5.txt (test_exdark_10.txt)    # you can find it in cpa/dataSyn/datalist
└── JPEGImages
    └── xxx.jpg
    ...
```
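After restructuring, a per-class object count over the `Annotations` folder is a quick way to confirm that only the intended 5 or 10 categories remain and that class names are spelled the way the detector config expects. The sketch below uses only the standard library; `count_object_classes` is a hypothetical helper, not part of this repository.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def count_object_classes(annotation_dir):
    """Count annotated objects per class name across all VOC XML files in a folder."""
    counts = Counter()
    for xml_path in sorted(Path(annotation_dir).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        # Only direct <object> children of <annotation> are counted.
        for obj in root.findall("object"):
            counts[obj.findtext("name", default="").strip()] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical location; replace with your own path.
    for name, count in count_object_classes("path/to/exdark_5/Annotations").most_common():
        print(f"{name:12s} {count}")
```

Any name in the output that is not in the configured class list points to an annotation that still needs renaming or filtering.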
🎯 Usage
📍 All-in-One Setting
- Step 1. Modify the `METAINFO` in `mmdet/datasets/voc.py`.
```python
METAINFO = {
    'classes': ('person', 'car', 'bus', 'bicycle', 'motorbike'),  # 5 classes
    'palette': [(106, 0, 228), (119, 11, 32), (165, 42, 42), (0, 0, 192), (197, 226, 255)]
}
```
- Step 2. Modify the `voc_classes` in `mmdet/evaluation/functional/class_names.py`.
```python
def voc_classes() -> list:
    return [
        'person', 'car', 'bus', 'bicycle', 'motorbike'  # 5 classes
    ]
```
- Step 3. Modify the `num_classes` in `configs/yolo/cpa_config.py`.
```python
bbox_head=dict(
    type='YOLOV3Head',
    num_classes=5,  # 5 classes
    ...
)
```
- Step 4. Recompile the code.
```
cd CPA_Enhancer
pip install -v -e .
```
- Step 5. Modify the `data_root`, `ann_file` and `data_prefix` in `configs/yolo/cpa_config.py` to match the actual paths of the datasets you use.
The pretrained models and training/testing logs can be found in checkpoint.zip.
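For Step 5, the fragment below sketches how `data_root`, `ann_file` and `data_prefix` typically fit together for a VOC-style dataset. The key layout follows the MMDetection 3.x VOC base configs; the actual `cpa_config.py` may nest these entries differently (for example inside a wrapper dataset), and every path shown is a placeholder.

```python
# Hypothetical root; replace with the location of your data_vocnorm folder.
data_root = 'path/to/data_vocnorm/'

train_dataloader = dict(
    dataset=dict(
        type='VOCDataset',
        data_root=data_root,
        # Image list, relative to data_root.
        ann_file='train/ImageSets/Main/train_voc.txt',
        # Folder (relative to data_root) that holds Annotations/ and JPEGImages/.
        data_prefix=dict(sub_data_root='train/'),
    ))

test_dataloader = dict(
    dataset=dict(
        type='VOCDataset',
        data_root=data_root,
        ann_file='test/ImageSets/Main/test_voc.txt',
        data_prefix=dict(sub_data_root='test/'),
        test_mode=True,
    ))
```

When switching to a degraded or real-world test set, only the three path entries of `test_dataloader` should need to change.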
🔹 Train
```shell
# Train our model from scratch.
python tools/train.py configs/yolo/cpa_config.py
```
🔹 Test
```shell
# You can download our pretrained model for testing.
python tools/test.py configs/yolo/cpa_config.py path/to/checkpoint/xx.pth
```
🔹 Demo
```shell
# You can download our pretrained model for inference.
# --inputs:  path to your input image or directory
# --out-dir: output directory
python demo/cpademo.py \
    --inputs ../cpa/testimage \
    --model ../configs/yolo/cpa_config.py \
    --weights path/to/checkpoint/xx.pth \
    --out-dir ../cpa/output
```
📍 One-by-One Setting
For foggy conditions (five categories), the overall process is the same as above (Steps 1-5).
For low-light conditions (ten categories), you only need to change the following places (Steps 1-3).
- Step 1. Modify the `METAINFO` in `mmdet/datasets/voc.py`.
```python
# 10 classes
METAINFO = {
    'classes': ('bicycle', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'dog', 'motorbike', 'person'),
    'palette': [(106, 0, 228), (119, 11, 32), (165, 42, 42), (0, 0, 192), (197, 226, 255),
                (0, 60, 100), (0, 0, 142), (255, 77, 255), (153, 69, 1), (120, 166, 157)]
}
```
- Step 2. Modify the `voc_classes` in `mmdet/evaluation/functional/class_names.py`.
```python
def voc_classes() -> list:
    return [
        'bicycle', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'dog', 'motorbike', 'person'  # 10 classes
    ]
```
- Step 3. Modify the `num_classes` in `configs/yolo/cpa_config.py`.
```python
bbox_head=dict(
    type='YOLOV3Head',
    num_classes=10,  # 10 classes
    ...
)
```
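Because the class list is edited in three separate files, it is easy to leave one of them on the 5-class setting while the others use 10 classes. The sketch below checks that the three settings agree; the class lists and palettes are copied from the steps above, while `check_class_settings` is a hypothetical helper and the requirement that both class lists share the same order is an assumption based on how they appear in this README.

```python
METAINFO_5 = {
    'classes': ('person', 'car', 'bus', 'bicycle', 'motorbike'),
    'palette': [(106, 0, 228), (119, 11, 32), (165, 42, 42), (0, 0, 192), (197, 226, 255)],
}
METAINFO_10 = {
    'classes': ('bicycle', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'dog', 'motorbike', 'person'),
    'palette': [(106, 0, 228), (119, 11, 32), (165, 42, 42), (0, 0, 192), (197, 226, 255),
                (0, 60, 100), (0, 0, 142), (255, 77, 255), (153, 69, 1), (120, 166, 157)],
}


def check_class_settings(metainfo, class_names, num_classes):
    """Return a list of mismatches between METAINFO, voc_classes() and num_classes."""
    problems = []
    classes = tuple(metainfo['classes'])
    if len(metainfo['palette']) != len(classes):
        problems.append("palette length differs from number of classes")
    if classes != tuple(class_names):
        problems.append("METAINFO classes differ from voc_classes()")
    if num_classes != len(classes):
        problems.append(f"num_classes={num_classes} but {len(classes)} classes are defined")
    return problems


if __name__ == "__main__":
    # Pass the values you actually set in Steps 1-3.
    print(check_class_settings(METAINFO_10, list(METAINFO_10['classes']), 10) or "settings agree")
```

After changing any of the three files, remember that the code has to be recompiled as in Step 4 of the all-in-one setting.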
📊 Results
Quantitative results
Quantitative comparisons under the all-in-one setting.
Comparisons in the one-by-one setting under the foggy degradation (left) and low-light degradation (right)
Visual Results
Visual comparisons of CPA-Enhancer under the all-in-one setting.
💐 Acknowledgments
Special thanks to the creators of mmdetection upon which this code is built, for their valuable work in advancing object detection research.
🔗 Citation
If you use this codebase, or CPA-Enhancer inspires your work, we would greatly appreciate it if you could star the repository and cite it using the following BibTeX entry.
@misc{zhang2024cpaenhancer,
title={CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations},
author={Yuwei Zhang and Yan Wu and Yanming Liu and Xinyue Peng},
year={2024},
eprint={2403.11220},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Owner
- Name: zyw-stu
- Login: zyw-stu
- Kind: user
- Location: BeiJing
- Repositories: 1
- Profile: https://github.com/zyw-stu
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection Contributors"
title: "OpenMMLab Detection Toolbox and Benchmark"
date-released: 2018-08-22
url: "https://github.com/open-mmlab/mmdetection"
license: Apache-2.0
GitHub Events
Total
- Issues event: 10
- Watch event: 7
- Issue comment event: 8
- Fork event: 1
Last Year
- Issues event: 10
- Watch event: 7
- Issue comment event: 8
- Fork event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 2
- Total pull requests: 0
- Average time to close issues: 5 months
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 0
- Average comments per issue: 0.5
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: 5 months
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 0
- Average comments per issue: 0.5
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Faker-Lost (2)
- wuyuyuyuaaa (2)
- Lyzx123 (2)
- roemin1999 (1)
- 13185742215 (1)
- lovemuyao (1)
- sunday1112 (1)
- mrwrui (1)
- wangdalu4399 (1)
- ducnt1210 (1)
Dependencies
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
- Pillow ==10.0.1
- PyYAML ==6.0.1
- Requests ==2.31.0
- Shapely ==2.0.3
- addict ==2.4.0
- albumentations ==1.4.1
- boto3 ==1.34.62
- botocore ==1.34.62
- cityscapesscripts ==2.2.2
- clip ==0.2.0
- einops ==0.7.0
- fairscale ==0.4.13
- ffmpegcv ==0.3.11
- gradio ==4.21.0
- imagecorruptions ==1.1.2
- imageio ==2.31.4
- instaboostfast ==0.1.2
- label_studio_ml ==1.0.9
- label_studio_tools ==0.0.3
- lap ==0.4.0
- matplotlib ==3.7.4
- memory_profiler ==0.61.0
- mmcv ==2.1.0
- mmengine ==0.10.1
- mmpretrain ==1.2.0
- model_archiver ==1.0.3
- model_index ==0.1.11
- motmetrics ==1.4.0
- nltk ==3.8.1
- numpy ==1.24.3
- opencv_python ==4.8.1.78
- openpyxl ==3.1.2
- pandas ==2.0.3
- parameterized ==0.9.0
- prettytable ==3.10.0
- psutil ==5.9.0
- pycocoevalcap ==1.2
- pycocotools ==2.0.7
- pytest ==8.1.1
- pytorch_sphinx_theme ==0.0.19
- rich ==13.7.1
- roboflow ==1.1.23
- sahi ==0.11.15
- scikit_image ==0.19.3
- scikit_learn ==1.3.2
- scipy ==1.10.1
- seaborn ==0.13.2
- setuptools ==60.2.0
- six ==1.16.0
- terminaltables ==3.1.10
- thop ==0.1.1.post2209072238
- torch ==1.11.0
- torchvision ==0.12.0
- tqdm ==4.65.2
- transformers ==4.38.2
- ts ==0.5.1
- wandb ==0.16.1
- xlrd ==2.0.1
- xlutils ==2.0.0
- albumentations >=0.3.2
- cython *
- numpy *
- docutils ==0.16.0
- myst-parser *
- sphinx ==4.0.2
- sphinx-copybutton *
- sphinx_markdown_tables *
- sphinx_rtd_theme ==0.5.2
- urllib3 <2.0.0
- mmcv >=2.0.0rc4,<2.2.0
- mmengine >=0.7.1,<1.0.0
- fairscale *
- nltk *
- pycocoevalcap *
- transformers *
- cityscapesscripts *
- fairscale *
- imagecorruptions *
- scikit-learn *
- mmcv >=2.0.0rc4,<2.2.0
- mmengine >=0.7.1,<1.0.0
- scipy *
- torch *
- torchvision *
- urllib3 <2.0.0
- matplotlib *
- numpy *
- pycocotools *
- scipy *
- shapely *
- six *
- terminaltables *
- tqdm *
- asynctest * test
- cityscapesscripts * test
- codecov * test
- flake8 * test
- imagecorruptions * test
- instaboostfast * test
- interrogate * test
- isort ==4.3.21 test
- kwarray * test
- memory_profiler * test
- nltk * test
- onnx ==1.7.0 test
- onnxruntime >=1.8.0 test
- parameterized * test
- prettytable * test
- protobuf <=3.20.1 test
- psutil * test
- pytest * test
- transformers * test
- ubelt * test
- xdoctest >=0.10.0 test
- yapf * test
- mmpretrain *
- motmetrics *
- numpy <1.24.0
- scikit-learn *
- seaborn *