promptdet

PromptDet: Towards Open-vocabulary Detection using Uncurated Images, ECCV2022

https://github.com/fcjian/promptdet

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.4%) to scientific vocabulary

Keywords

clip computer-vision eccv2022 novel-categories object-detection prompt-learning pseudo-labeling regional-prompt self-training vocabulary web-image zero-shot-learning
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: fcjian
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 43.5 MB
Statistics
  • Stars: 166
  • Watchers: 2
  • Forks: 7
  • Open Issues: 11
  • Releases: 0
Created almost 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme License Citation

README.md

PromptDet: Towards Open-vocabulary Detection using Uncurated Images (ECCV 2022)

Introduction

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations. To achieve that, we make the following four contributions: (i) in pursuit of generalisation, we propose a two-stage open-vocabulary object detector, in which class-agnostic object proposals are classified with a text encoder from a pre-trained visual-language model; (ii) to pair the visual latent space (of RPN box proposals) with that of the pre-trained text encoder, we propose regional prompt learning, which aligns the textual embedding space with regional visual object features; (iii) to scale up the learning procedure towards detecting a wider spectrum of objects, we exploit available online resources via a novel self-training framework, which allows the proposed detector to be trained on a large corpus of noisy, uncurated web images; (iv) to evaluate the proposed detector, termed PromptDet, we conduct extensive experiments on the challenging LVIS and MS-COCO datasets. PromptDet shows superior performance over existing approaches with fewer additional training images and zero manual annotations.
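As a rough illustration of contribution (i), the sketch below classifies a single region feature by comparing it with per-category text embeddings. It is a minimal toy in plain Python, not the repository's implementation: the function names, the 3-dimensional embeddings, and the `temperature` value are all assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def classify_region(region_feature, category_embeddings, temperature=0.01):
    """Score one region feature against every category text embedding.

    `temperature` is an assumed value for this toy, not taken from the paper.
    Returns (best_category_name, {category_name: probability}).
    """
    region = l2_normalize(region_feature)
    names = list(category_embeddings)
    logits = []
    for name in names:
        text = l2_normalize(category_embeddings[name])
        cosine = sum(r * t for r, t in zip(region, text))
        logits.append(cosine / temperature)
    # Numerically stable softmax over the category logits.
    max_logit = max(logits)
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(names, exps)}
    return max(probs, key=probs.get), probs


# Hypothetical 3-d embeddings; real text embeddings are much higher-dimensional.
category_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "zebra": [0.0, 0.0, 1.0],
}
best, probs = classify_region([0.1, 0.2, 0.9], category_embeddings)
print(best)
```

Because the classifier is just a set of text embeddings, adding a novel category in this sketch only requires adding one more entry to `category_embeddings`.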

Training framework

[Figure: method overview]

Updates

  • July 20, 2022: add the code for LAION-novel and self-training
  • March 28, 2022: initial release

Prerequisites

  • MMDetection version 2.16.0.

  • Please see get_started.md for installation and the basic usage of MMDetection.

Regional Prompt Learning (RPL)

We learn the prompt vectors offline using RPL. For your convenience, we also provide the learned prompt vectors and the category embeddings.

LAION-novel dataset

The LAION-novel dataset, based on the learned category embeddings, can be generated with the PromptDet tools as follows:

```shell
# stage-I: install the dependencies, download the laion400m 64GB image.index and metadata.hdf5 (https://the-eye.eu/public/AI/cah/), and then retrieve the LAION images (urls)
pip install faiss-cpu==1.7.2 img2dataset==1.12.0 fire==0.4.0 h5py==3.6.0
python tools/promptdet/retrieval_laion_image.py --indice-folder [laion400m-64GB-index] --metadata [metadata.hdf5] --text-features promptdet_resources/lvis_category_embeddings.pt --output-folder data/laion_lvis/images --num-images 500

# stage-II: download the LAION images
python tools/promptdet/download_laion_image.py --output-folder data/laion_lvis/images --num-thread 10

# stage-III: convert the LAION images to mmdetection format
python tools/promptdet/laion_dataset_converter.py --data-path data/laion_lvis/images --out-file data/laion_lvis/laion_train.json --topK 300
```

For your convenience, we also provide the image urls of our LAION-novel dataset.
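The retrieval step in stage-I can be pictured as a nearest-neighbour search: for each category embedding, keep the images whose embeddings score highest against it. The sketch below is a brute-force stand-in in plain Python, not the repository's tool, which relies on the downloaded faiss index. The function name `retrieve_top_k`, the toy 2-d vectors, and the image keys are assumptions made for the example.

```python
import heapq


def retrieve_top_k(category_embedding, image_embeddings, k):
    """Return the keys of the k images with the highest inner-product score.

    `image_embeddings` maps an image key (e.g. a URL) to its embedding vector.
    This is an exhaustive search; an index such as faiss serves the same
    purpose at the scale of hundreds of millions of images.
    """
    def score(item):
        _, vec = item
        return sum(c * v for c, v in zip(category_embedding, vec))

    top = heapq.nlargest(k, image_embeddings.items(), key=score)
    return [key for key, _ in top]


# Hypothetical 2-d embeddings keyed by placeholder image names.
image_embeddings = {
    "img_a": [0.9, 0.1],
    "img_b": [0.2, 0.8],
    "img_c": [0.7, 0.3],
}
nearest = retrieve_top_k([1.0, 0.0], image_embeddings, k=2)
print(nearest)
```

In this picture, `k` plays the role of the `--num-images` argument above, applied once per category.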

Inference

```shell
# assume that you are under the root directory of this project,
# and you have activated your virtual environment if needed,
# and with LVIS v1.0 dataset in 'data/lvis_v1'.
./tools/dist_test.sh configs/promptdet/promptdet_r50_fpn_sample1e-3_mstrain_1x_lvis_v1_self_train.py work_dirs/promptdet_r50_fpn_sample1e-3_mstrain_1x_lvis_v1_self_train.pth 4 --eval bbox segm
```

Train

```shell
# download 'lvis_v1_train_seen.json' to 'data/lvis_v1/annotations'.

# train detector without self-training
./tools/dist_train.sh configs/promptdet/promptdet_r50_fpn_sample1e-3_mstrain_1x_lvis_v1.py 4

# train detector with self-training
./tools/dist_train.sh configs/promptdet/promptdet_r50_fpn_sample1e-3_mstrain_1x_lvis_v1_self_train.py 4
```

[0] Annotation file of base categories: lvis_v1_train_seen.json.
[1] Note that we provide an EpochPromptDetRunner to fetch the data from multiple datasets alternately.
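To make note [1] concrete, the sketch below shows one way of fetching batches from two datasets alternately. It is not the EpochPromptDetRunner code: `alternate_batches`, the `"primary"`/`"auxiliary"` labels, and the choice to end the epoch when the primary loader runs out are all assumptions made for illustration.

```python
from itertools import cycle


def alternate_batches(primary_loader, auxiliary_loader):
    """Yield (source, batch) pairs, alternating between two loaders.

    Assumptions of this sketch: the epoch ends when `primary_loader` is
    exhausted, `auxiliary_loader` is non-empty, and the auxiliary batches
    are replayed from the start if they run out first.
    """
    auxiliary_iter = cycle(auxiliary_loader)
    for primary_batch in primary_loader:
        yield "primary", primary_batch
        yield "auxiliary", next(auxiliary_iter)


# Placeholder batches standing in for annotated and web-image batches.
lvis_batches = ["lvis_0", "lvis_1", "lvis_2"]
laion_batches = ["laion_0", "laion_1"]
schedule = list(alternate_batches(lvis_batches, laion_batches))
print(schedule)
```

A training loop built on this would run one optimisation step per yielded batch, so both data sources contribute equally often within an epoch.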

Models

For your convenience, we provide the following trained models (PromptDet) with mask AP.

| Model | RPL | Self-training | Epochs | Scale Jitter | Input Size | APnovel | APc | APf | AP | Download |
| --- |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Baseline (manual prompt) | | | 12 | 640~800 | 800x800 | 7.4 | 17.2 | 26.1 | 19.0 | google |
| PromptDetR50FPN1x | ✓ | | 12 | 640~800 | 800x800 | 11.5 | 19.4 | 26.7 | 20.9 | google |
| PromptDetR50FPN1x | ✓ | ✓ | 12 | 640~800 | 800x800 | 19.5 | 18.2 | 25.6 | 21.3 | google |
| PromptDetR50FPN6x | ✓ | ✓ | 72 | 100~1280 | 800x800 | 21.7 | 23.2 | 29.6 | 25.5 | google |

[0] All results are obtained with a single model and without any test-time data augmentation such as multi-scale testing or flipping.
[1] Refer to the config files in configs/promptdet/ for more details.

Acknowledgement

Thanks to the MMDetection team for the wonderful open-source project!

Citation

If you find PromptDet useful in your research, please consider citing:

@inproceedings{feng2022promptdet,
  title={PromptDet: Towards Open-vocabulary Detection using Uncurated Images},
  author={Feng, Chengjian and Zhong, Yujie and Jie, Zequn and Chu, Xiangxiang and Ren, Haibing and Wei, Xiaolin and Xie, Weidi and Ma, Lin},
  journal={Proceedings of the European Conference on Computer Vision},
  year={2022}
}

Owner

  • Name: Chengjian Feng
  • Login: fcjian
  • Kind: user

GitHub Events

Total
  • Watch event: 6
Last Year
  • Watch event: 6

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 41
  • Total Committers: 8
  • Avg Commits per committer: 5.125
  • Development Distribution Score (DDS): 0.585
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
fengchengjian f****n@m****m 17
hadoop-vacv h****v@s****t 10
fengchengjian f****n@o****m 3
hadoop-vacv h****v@s****t 3
hadoop-vacv h****v@s****t 3
lijinlong11 l****1@m****m 3
hadoop-vacv h****v@s****t 1
fcjian 8****n 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 16
  • Total pull requests: 0
  • Average time to close issues: about 2 months
  • Average time to close pull requests: N/A
  • Total issue authors: 14
  • Total pull request authors: 0
  • Average comments per issue: 1.06
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • yyyyyyfs (2)
  • hanoonaR (2)
  • Kyfafyd (1)
  • lsx66 (1)
  • jihwanp (1)
  • Feobi1999 (1)
  • XinZhangRadar (1)
  • YasminZhang (1)
  • krisandchris (1)
  • vansin (1)
  • liujiaheng (1)
  • eternaldolphin (1)
  • LeonG7 (1)
  • RuoyuChen10 (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements/build.txt pypi
  • cython *
  • numpy *
requirements/docs.txt pypi
  • docutils ==0.16.0
  • recommonmark *
  • sphinx ==4.0.2
  • sphinx_markdown_tables *
  • sphinx_rtd_theme ==0.5.2
requirements/mminstall.txt pypi
  • mmcv-full >=1.3.8
requirements/optional.txt pypi
  • cityscapesscripts *
  • imagecorruptions *
  • scipy *
  • sklearn *
requirements/readthedocs.txt pypi
  • mmcv *
  • torch *
  • torchvision *
requirements/runtime.txt pypi
  • matplotlib *
  • numpy *
  • pycocotools *
  • pycocotools-windows *
  • six *
  • terminaltables *
requirements/tests.txt pypi
  • asynctest * test
  • codecov * test
  • flake8 * test
  • interrogate * test
  • isort ==4.3.21 test
  • kwarray * test
  • mmtrack * test
  • onnx ==1.7.0 test
  • onnxruntime >=1.8.0 test
  • pytest * test
  • ubelt * test
  • xdoctest >=0.10.0 test
  • yapf * test
docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/serve/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build