https://github.com/cvi-szu/qa-clims
[ACM MM 2023] QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org, scholar.google, acm.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.6%) to scientific vocabulary
Keywords
Repository
[ACM MM 2023] QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Basic Info
Statistics
- Stars: 12
- Watchers: 3
- Forks: 0
- Open Issues: 1
- Releases: 0
Topics
Metadata Files
README.md
[MM'23] QA-CLIMS
This is the official PyTorch implementation of our paper:
QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Songhe Deng, Wei Zhuo, Jinheng Xie, Linlin Shen
Computer Vision Institute, Shenzhen University
ACM International Conference on Multimedia, 2023
[Paper] [arXiv]

Environment
- Python 3.7
- PyTorch 1.7.1
- torchvision 0.8.2
```shell
pip install -r requirements.txt
```
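For a reproducible setup matching the listed versions, one possible approach is a conda environment (the environment name and the CPU/CUDA build of torch below are illustrative assumptions, not specified by the repository):

```shell
# Hypothetical environment setup; pick the torch build matching your CUDA version.
conda create -n qa-clims python=3.7 -y
conda activate qa-clims
pip install torch==1.7.1 torchvision==0.8.2
pip install -r requirements.txt
```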
PASCAL VOC2012
You can find the following files here.
| File | Filename |
|:---------------------------|:-----------------------------------------------------------------------|
| FG & BG VQA results | voc_vqa_fg_blip.npy, voc_vqa_bg_blip.npy |
| FG & BG VQA text features | voc_vqa_fg_blip_ViT-L-14_cache.npy, voc_vqa_bg_blip_ViT-L-14_cache.npy |
| pre-trained baseline model | res50_cam.pth |
| QA-CLIMS model | res50_qa_clims.pth |
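The VQA files above are NumPy `.npy` archives. A minimal sketch of writing and reading such a cache with `allow_pickle` (the dict-of-features layout and key names here are assumptions for illustration, not the repository's documented format):

```python
import numpy as np

# Illustrative only: assume the cache maps a question key to a CLIP
# ViT-L/14 text feature vector (768-d); the real layout may differ.
features = {"aeroplane:fg_q0": np.zeros((768,), dtype=np.float32)}
np.save("demo_cache.npy", features)

# A pickled dict is stored as a 0-d object array, so loading needs
# allow_pickle=True, and .item() unwraps it back into a dict.
cache = np.load("demo_cache.npy", allow_pickle=True).item()
print(sorted(cache.keys()), cache["aeroplane:fg_q0"].shape)
```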
1. Prepare VQA result features
You can download the VQA text features voc_vqa_fg_blip_ViT-L-14_cache.npy and voc_vqa_bg_blip_ViT-L-14_cache.npy above
and put them in vqa/.
Or, you can generate them yourself:
To generate VQA results, please follow [third_party/README](third_party/README.md#BLIP). After that, run the following command to generate the VQA text features:

```shell
python gen_text_feats_cache.py voc \
    --vqa_fg_file vqa/voc_vqa_fg_blip.npy \
    --vqa_fg_cache_file vqa/voc_vqa_fg_blip_ViT-L-14_cache.npy \
    --vqa_bg_file vqa/voc_vqa_bg_blip.npy \
    --vqa_bg_cache_file vqa/voc_vqa_bg_blip_ViT-L-14_cache.npy \
    --clip ViT-L/14
```

2. Train QA-CLIMS and generate initial CAMs
Please download the pre-trained baseline model res50_cam.pth above and put it at cam-baseline-voc12/res50_cam.pth.
```shell
bash run_voc12_qa_clims.sh
```
3. Train IRNet and generate pseudo semantic masks
```shell
bash run_voc12_sem_seg.sh
```
4. Train DeepLab using pseudo semantic masks
Please follow deeplab-pytorch or CLIMS.
MS COCO2014
You can find the following files here.
| File | Filename |
|:---------------------------|:-------------------------------------------------------------------------|
| FG & BG VQA results | coco_vqa_fg_blip.npy, coco_vqa_bg_blip.npy |
| FG & BG VQA text features | coco_vqa_fg_blip_ViT-L-14_cache.npy, coco_vqa_bg_blip_ViT-L-14_cache.npy |
| pre-trained baseline model | res50_cam.pth |
| QA-CLIMS model | res50_qa_clims.pth |
Please place the downloaded coco_vqa_fg_blip_ViT-L-14_cache.npy and coco_vqa_bg_blip_ViT-L-14_cache.npy
in vqa/, and res50_cam.pth in cam-baseline-coco14/.
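Assuming the files were downloaded into the current directory (an illustration; the filenames and target folders are those named above), the placement can be sketched as:

```shell
# Hypothetical placement of the downloaded COCO files
mkdir -p vqa cam-baseline-coco14
mv coco_vqa_fg_blip_ViT-L-14_cache.npy vqa/
mv coco_vqa_bg_blip_ViT-L-14_cache.npy vqa/
mv res50_cam.pth cam-baseline-coco14/
```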
Then, run the following commands:
```shell
bash run_coco14_qa_clims.sh
bash run_coco14_sem_seg.sh
```
Citation
If you find this code useful for your research, please consider citing our paper:
@inproceedings{deng2023qa-clims,
title={QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation},
author={Deng, Songhe and Zhuo, Wei and Xie, Jinheng and Shen, Linlin},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
pages={5572--5583},
year={2023}
}
This repository is largely based on CLIMS and IRNet; thanks for their great work!
Owner
- Name: Computer Vision Institute, SZU
- Login: CVI-SZU
- Kind: organization
- Location: Shenzhen University, Shenzhen, China
- Website: http://cv.szu.edu.cn/
- Repositories: 13
- Profile: https://github.com/CVI-SZU
Computer Vision Institute, Shenzhen University
GitHub Events
Total
- Issues event: 7
- Watch event: 1
- Issue comment event: 7
Last Year
- Issues event: 7
- Watch event: 1
- Issue comment event: 7
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 5
- Total pull requests: 0
- Average time to close issues: 19 days
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 0
- Average comments per issue: 5.2
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 0
- Average time to close issues: 19 days
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 0
- Average comments per issue: 5.2
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- HYTHYThythyt (4)
- ineedugirl (3)
- xixiaos (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- chainercv *
- cmapy *
- cython *
- imageio *
- matplotlib *
- nltk *
- numpy *
- opencv-python *
- pydensecrf *
- timm *
- torch *
- torchvision *
- transformers *
- fairscale ==0.4.4
- pycocoevalcap *
- timm ==0.4.12
- transformers ==4.15.0
- ftfy *
- regex *
- torch *
- torchvision *
- tqdm *