This is the official PyTorch implementation of our paper:
QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Songhe Deng, Wei Zhuo, Jinheng Xie, Linlin Shen
Computer Vision Institute, Shenzhen University
ACM International Conference on Multimedia, 2023
[Paper] [arXiv]
- Python 3.7
- PyTorch 1.7.1
- torchvision 0.8.2
pip install -r requirements.txt
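Optionally, you can sanity-check the installed versions from Python before running anything:

```python
# Optional sanity check: confirm the installed versions match the ones listed above.
import torch
import torchvision

print(torch.__version__)         # expected: 1.7.1
print(torchvision.__version__)   # expected: 0.8.2
print(torch.cuda.is_available()) # CUDA is needed for training
```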
You can find the following files here.
File | Filename |
---|---|
FG & BG VQA results | voc_vqa_fg_blip.npy voc_vqa_bg_blip.npy |
FG & BG VQA text features | voc_vqa_fg_blip_ViT-L-14_cache.npy voc_vqa_bg_blip_ViT-L-14_cache.npy |
pre-trained baseline model | res50_cam.pth |
QA-CLIMS model | res50_qa_clims.pth |
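If you want to peek inside the downloaded VQA result files, the sketch below may help. Note that the internal structure (e.g., a dict mapping image IDs to question-answer text) is an assumption, not documented here, so adjust after printing a sample:

```python
# Minimal sketch for inspecting the downloaded VQA result files.
# NOTE: the per-image structure is an assumption; print a sample first.
import numpy as np

vqa_fg = np.load("vqa/voc_vqa_fg_blip.npy", allow_pickle=True)
obj = vqa_fg.item() if vqa_fg.shape == () else vqa_fg  # unwrap a 0-d object array
print(type(obj))
if isinstance(obj, dict):
    for key in list(obj)[:3]:
        print(key, obj[key])
```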
You can download the VQA text features voc_vqa_fg_blip_ViT-L-14_cache.npy and voc_vqa_bg_blip_ViT-L-14_cache.npy above and put them in vqa/.
Alternatively, you can generate them yourself:
To generate VQA results, please follow third_party/README.
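For reference, a minimal example of asking BLIP a question about an image with the LAVIS library is sketched below; the actual pipeline, prompts, and model weights used in third_party may differ:

```python
# Sketch of BLIP visual question answering via LAVIS.
# The actual questions and model configuration in third_party may differ.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip_vqa", model_type="vqav2", is_eval=True, device=device
)

raw_image = Image.open("example.jpg").convert("RGB")       # hypothetical input image
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
question = txt_processors["eval"]("What is the foreground object?")  # example question

answers = model.predict_answers(
    samples={"image": image, "text_input": question}, inference_method="generate"
)
print(answers)
```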
After that, run the following command to generate the VQA text features:
python gen_text_feats_cache.py voc \
--vqa_fg_file vqa/voc_vqa_fg_blip.npy \
--vqa_fg_cache_file vqa/voc_vqa_fg_blip_ViT-L-14_cache.npy \
--vqa_bg_file vqa/voc_vqa_bg_blip.npy \
--vqa_bg_cache_file vqa/voc_vqa_bg_blip_ViT-L-14_cache.npy \
--clip ViT-L/14
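For reference, the cache conceptually holds CLIP text embeddings of the VQA answers; a simplified sketch using the openai/CLIP package (the exact prompt formatting and cache layout of gen_text_feats_cache.py are assumptions) is:

```python
# Simplified sketch of caching CLIP text features for VQA answers.
# The actual prompts and cache layout of gen_text_feats_cache.py may differ.
import clip
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-L/14", device=device)

answers = ["a dog lying on the grass", "a railway track"]  # example VQA answers
with torch.no_grad():
    tokens = clip.tokenize(answers).to(device)
    feats = model.encode_text(tokens)                 # [N, 768] for ViT-L/14
    feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize

np.save("vqa/example_text_feats_cache.npy", feats.cpu().numpy())
```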
Please download the pre-trained baseline model res50_cam.pth above and put it at cam-baseline-voc12/res50_cam.pth.
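To verify the checkpoint is in place and loads correctly, a quick check (assuming it is a plain state_dict, which is common but not guaranteed here) is:

```python
# Quick check that the baseline checkpoint loads; assumes a plain state_dict.
import torch

state = torch.load("cam-baseline-voc12/res50_cam.pth", map_location="cpu")
print(type(state))
print(list(state.keys())[:5])  # first few parameter names
```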
bash run_voc12_qa_clims.sh
bash run_voc12_sem_seg.sh
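If you want to score the generated pseudo labels yourself, a generic confusion-matrix mIoU computation can be used. This is not the repository's own evaluation code, and the paths, output directory name, and image list are assumptions:

```python
# Generic mIoU of pseudo labels vs. ground truth, via a confusion matrix.
# Not the repository's evaluation code; paths and output dir are assumptions.
import numpy as np
from PIL import Image

NUM_CLASSES = 21  # PASCAL VOC: 20 classes + background

def update_confusion(conf, gt, pred):
    mask = gt < NUM_CLASSES  # ignore the 255 "void" label
    idx = NUM_CLASSES * gt[mask].astype(int) + pred[mask].astype(int)
    conf += np.bincount(idx, minlength=NUM_CLASSES ** 2).reshape(NUM_CLASSES, NUM_CLASSES)
    return conf

conf = np.zeros((NUM_CLASSES, NUM_CLASSES), dtype=np.int64)
for name in ["2007_000032"]:  # example image id; iterate over your val split
    gt = np.array(Image.open(f"VOCdevkit/VOC2012/SegmentationClass/{name}.png"))
    pred = np.array(Image.open(f"sem_seg_out/{name}.png"))  # hypothetical output dir
    conf = update_confusion(conf, gt, pred)

inter = np.diag(conf)
union = conf.sum(0) + conf.sum(1) - inter
iou = np.where(union > 0, inter / np.maximum(union, 1), np.nan)
print("mIoU:", np.nanmean(iou))
```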
Please follow deeplab-pytorch or CLIMS.
You can find the following files here.
File | Filename |
---|---|
FG & BG VQA results | coco_vqa_fg_blip.npy coco_vqa_bg_blip.npy |
FG & BG VQA text features | coco_vqa_fg_blip_ViT-L-14_cache.npy coco_vqa_bg_blip_ViT-L-14_cache.npy |
pre-trained baseline model | res50_cam.pth |
QA-CLIMS model | res50_qa_clims.pth |
Please place the downloaded coco_vqa_fg_blip_ViT-L-14_cache.npy and coco_vqa_bg_blip_ViT-L-14_cache.npy in vqa/, and res50_cam.pth in cam-baseline-coco14/.
Then, run the following commands:
bash run_coco14_qa_clims.sh
bash run_coco14_sem_seg.sh
If you find this code useful for your research, please consider citing our paper:
@inproceedings{deng2023qa-clims,
  title={QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation},
  author={Deng, Songhe and Zhuo, Wei and Xie, Jinheng and Shen, Linlin},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  pages={5572--5583},
  year={2023}
}
This repository is heavily based on CLIMS and IRNet. Thanks for their great work!