https://github.com/cvi-szu/qa-clims
[ACM MM 2023] QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org, scholar.google, acm.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.6%) to scientific vocabulary
Keywords
Repository
[ACM MM 2023] QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Basic Info
Statistics
- Stars: 12
- Watchers: 3
- Forks: 0
- Open Issues: 1
- Releases: 0
Topics
Metadata Files
README.md
[MM'23] QA-CLIMS
This is the official PyTorch implementation of our paper:
QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Songhe Deng, Wei Zhuo, Jinheng Xie, Linlin Shen
Computer Vision Institute, Shenzhen University
ACM International Conference on Multimedia, 2023
[Paper] [arXiv]

Environment
- Python 3.7
- PyTorch 1.7.1
- torchvision 0.8.2
```shell
pip install -r requirements.txt
```
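For a reproducible setup matching the listed versions, one possible approach is a conda environment (the environment name and the CPU/CUDA build of torch below are illustrative assumptions, not specified by the repository):

```shell
# Hypothetical environment setup; pick the torch build matching your CUDA version.
conda create -n qa-clims python=3.7 -y
conda activate qa-clims
pip install torch==1.7.1 torchvision==0.8.2
pip install -r requirements.txt
```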
PASCAL VOC2012
You can find the following files here.
| File | Filename |
|:---------------------------|:-----------------------------------------------------------------------|
| FG & BG VQA results | voc_vqa_fg_blip.npy, voc_vqa_bg_blip.npy |
| FG & BG VQA text features | voc_vqa_fg_blip_ViT-L-14_cache.npy, voc_vqa_bg_blip_ViT-L-14_cache.npy |
| pre-trained baseline model | res50_cam.pth |
| QA-CLIMS model | res50_qa_clims.pth |
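The VQA files above are NumPy `.npy` archives. A minimal sketch of writing and reading such a cache with `allow_pickle` (the dict-of-features layout and key names here are assumptions for illustration, not the repository's documented format):

```python
import numpy as np

# Illustrative only: assume the cache maps a question key to a CLIP
# ViT-L/14 text feature vector (768-d); the real layout may differ.
features = {"aeroplane:fg_q0": np.zeros((768,), dtype=np.float32)}
np.save("demo_cache.npy", features)

# A pickled dict is stored as a 0-d object array, so loading needs
# allow_pickle=True, and .item() unwraps it back into a dict.
cache = np.load("demo_cache.npy", allow_pickle=True).item()
print(sorted(cache.keys()), cache["aeroplane:fg_q0"].shape)
```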
1. Prepare VQA result features
You can download the VQA text features voc_vqa_fg_blip_ViT-L-14_cache.npy and voc_vqa_bg_blip_ViT-L-14_cache.npy above
and put them in vqa/.
Or, you can generate them yourself:
To generate VQA results, please follow [third_party/README](third_party/README.md#BLIP). After that, run the following command to generate the VQA text features:

```shell
python gen_text_feats_cache.py voc \
    --vqa_fg_file vqa/voc_vqa_fg_blip.npy \
    --vqa_fg_cache_file vqa/voc_vqa_fg_blip_ViT-L-14_cache.npy \
    --vqa_bg_file vqa/voc_vqa_bg_blip.npy \
    --vqa_bg_cache_file vqa/voc_vqa_bg_blip_ViT-L-14_cache.npy \
    --clip ViT-L/14
```

2. Train QA-CLIMS and generate initial CAMs
Please download the pre-trained baseline model res50_cam.pth above and put it at cam-baseline-voc12/res50_cam.pth.
```shell
bash run_voc12_qa_clims.sh
```
3. Train IRNet and generate pseudo semantic masks
```shell
bash run_voc12_sem_seg.sh
```
4. Train DeepLab using pseudo semantic masks
Please follow deeplab-pytorch or CLIMS.
MS COCO2014
You can find the following files here.
| File | Filename |
|:---------------------------|:-------------------------------------------------------------------------|
| FG & BG VQA results | coco_vqa_fg_blip.npy, coco_vqa_bg_blip.npy |
| FG & BG VQA text features | coco_vqa_fg_blip_ViT-L-14_cache.npy, coco_vqa_bg_blip_ViT-L-14_cache.npy |
| pre-trained baseline model | res50_cam.pth |
| QA-CLIMS model | res50_qa_clims.pth |
Please place the downloaded coco_vqa_fg_blip_ViT-L-14_cache.npy and coco_vqa_bg_blip_ViT-L-14_cache.npy
in vqa/, and res50_cam.pth in cam-baseline-coco14/.
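Assuming the files were downloaded into the current directory (an illustration; the filenames and target folders are those named above), the placement can be sketched as:

```shell
# Hypothetical placement of the downloaded COCO files
mkdir -p vqa cam-baseline-coco14
mv coco_vqa_fg_blip_ViT-L-14_cache.npy vqa/
mv coco_vqa_bg_blip_ViT-L-14_cache.npy vqa/
mv res50_cam.pth cam-baseline-coco14/
```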
Then, run the following commands:
```shell
bash run_coco14_qa_clims.sh
bash run_coco14_sem_seg.sh
```
Citation
If you find this code useful for your research, please consider citing our paper:
@inproceedings{deng2023qa-clims,
title={QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation},
author={Deng, Songhe and Zhuo, Wei and Xie, Jinheng and Shen, Linlin},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
pages={5572--5583},
year={2023}
}
This repository is largely based on CLIMS and IRNet; thanks for their great work!
Owner
- Name: Computer Vision Institute, SZU
- Login: CVI-SZU
- Kind: organization
- Location: Shenzhen University, Shenzhen, China
- Website: http://cv.szu.edu.cn/
- Repositories: 13
- Profile: https://github.com/CVI-SZU
Computer Vision Institute, Shenzhen University
GitHub Events
Total
- Issues event: 7
- Watch event: 1
- Issue comment event: 7
Last Year
- Issues event: 7
- Watch event: 1
- Issue comment event: 7
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 5
- Total pull requests: 0
- Average time to close issues: 19 days
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 0
- Average comments per issue: 5.2
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 0
- Average time to close issues: 19 days
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 0
- Average comments per issue: 5.2
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- HYTHYThythyt (4)
- ineedugirl (3)
- xixiaos (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- chainercv *
- cmapy *
- cython *
- imageio *
- matplotlib *
- nltk *
- numpy *
- opencv-python *
- pydensecrf *
- timm *
- torch *
- torchvision *
- transformers *
- fairscale ==0.4.4
- pycocoevalcap *
- timm ==0.4.12
- transformers ==4.15.0
- ftfy *
- regex *
- torch *
- torchvision *
- tqdm *