This is the official PyTorch implementation of our paper:
QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Songhe Deng, Wei Zhuo, Jinheng Xie, Linlin Shen
Computer Vision Institute, Shenzhen University
ACM International Conference on Multimedia, 2023
[Paper] [arXiv]
- Python 3.7
- PyTorch 1.7.1
- torchvision 0.8.2
pip install -r requirements.txt
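Optionally, you can sanity-check the installed versions from Python before running anything:

```python
# Optional sanity check: confirm the installed versions match the ones listed above.
import torch
import torchvision

print(torch.__version__)         # expected: 1.7.1
print(torchvision.__version__)   # expected: 0.8.2
print(torch.cuda.is_available()) # CUDA is needed for training
```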
You can find the following files here.
File | Filename |
---|---|
FG & BG VQA results | voc_vqa_fg_blip.npy voc_vqa_bg_blip.npy |
FG & BG VQA text features | voc_vqa_fg_blip_ViT-L-14_cache.npy voc_vqa_bg_blip_ViT-L-14_cache.npy |
pre-trained baseline model | res50_cam.pth |
QA-CLIMS model | res50_qa_clims.pth |
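If you want to peek inside the downloaded VQA result files, the sketch below may help. Note that the internal structure (e.g., a dict mapping image IDs to question-answer text) is an assumption, not documented here, so adjust after printing a sample:

```python
# Minimal sketch for inspecting the downloaded VQA result files.
# NOTE: the per-image structure is an assumption; print a sample first.
import numpy as np

vqa_fg = np.load("vqa/voc_vqa_fg_blip.npy", allow_pickle=True)
obj = vqa_fg.item() if vqa_fg.shape == () else vqa_fg  # unwrap a 0-d object array
print(type(obj))
if isinstance(obj, dict):
    for key in list(obj)[:3]:
        print(key, obj[key])
```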
You can download the VQA text features voc_vqa_fg_blip_ViT-L-14_cache.npy and voc_vqa_bg_blip_ViT-L-14_cache.npy above and put them in vqa/.
Alternatively, you can generate them yourself:
To generate VQA results, please follow third_party/README.
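For reference, a minimal example of asking BLIP a question about an image with the LAVIS library is sketched below; the actual pipeline, prompts, and model weights used in third_party may differ:

```python
# Sketch of BLIP visual question answering via LAVIS.
# The actual questions and model configuration in third_party may differ.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip_vqa", model_type="vqav2", is_eval=True, device=device
)

raw_image = Image.open("example.jpg").convert("RGB")       # hypothetical input image
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
question = txt_processors["eval"]("What is the foreground object?")  # example question

answers = model.predict_answers(
    samples={"image": image, "text_input": question}, inference_method="generate"
)
print(answers)
```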
After that, run the following command to generate the VQA text features:
python gen_text_feats_cache.py voc \
--vqa_fg_file vqa/voc_vqa_fg_blip.npy \
--vqa_fg_cache_file vqa/voc_vqa_fg_blip_ViT-L-14_cache.npy \
--vqa_bg_file vqa/voc_vqa_bg_blip.npy \
--vqa_bg_cache_file vqa/voc_vqa_bg_blip_ViT-L-14_cache.npy \
--clip ViT-L/14
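For reference, the cache conceptually holds CLIP text embeddings of the VQA answers; a simplified sketch using the openai/CLIP package (the exact prompt formatting and cache layout of gen_text_feats_cache.py are assumptions) is:

```python
# Simplified sketch of caching CLIP text features for VQA answers.
# The actual prompts and cache layout of gen_text_feats_cache.py may differ.
import clip
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-L/14", device=device)

answers = ["a dog lying on the grass", "a railway track"]  # example VQA answers
with torch.no_grad():
    tokens = clip.tokenize(answers).to(device)
    feats = model.encode_text(tokens)                 # [N, 768] for ViT-L/14
    feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize

np.save("vqa/example_text_feats_cache.npy", feats.cpu().numpy())
```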
Please download the pre-trained baseline model res50_cam.pth above and put it at cam-baseline-voc12/res50_cam.pth.
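To verify the checkpoint is in place and loads correctly, a quick check (assuming it is a plain state_dict, which is common but not guaranteed here) is:

```python
# Quick check that the baseline checkpoint loads; assumes a plain state_dict.
import torch

state = torch.load("cam-baseline-voc12/res50_cam.pth", map_location="cpu")
print(type(state))
print(list(state.keys())[:5])  # first few parameter names
```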
bash run_voc12_qa_clims.sh
bash run_voc12_sem_seg.sh
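If you want to score the generated pseudo labels yourself, a generic confusion-matrix mIoU computation can be used. This is not the repository's own evaluation code, and the paths, output directory name, and image list are assumptions:

```python
# Generic mIoU of pseudo labels vs. ground truth, via a confusion matrix.
# Not the repository's evaluation code; paths and output dir are assumptions.
import numpy as np
from PIL import Image

NUM_CLASSES = 21  # PASCAL VOC: 20 classes + background

def update_confusion(conf, gt, pred):
    mask = gt < NUM_CLASSES  # ignore the 255 "void" label
    idx = NUM_CLASSES * gt[mask].astype(int) + pred[mask].astype(int)
    conf += np.bincount(idx, minlength=NUM_CLASSES ** 2).reshape(NUM_CLASSES, NUM_CLASSES)
    return conf

conf = np.zeros((NUM_CLASSES, NUM_CLASSES), dtype=np.int64)
for name in ["2007_000032"]:  # example image id; iterate over your val split
    gt = np.array(Image.open(f"VOCdevkit/VOC2012/SegmentationClass/{name}.png"))
    pred = np.array(Image.open(f"sem_seg_out/{name}.png"))  # hypothetical output dir
    conf = update_confusion(conf, gt, pred)

inter = np.diag(conf)
union = conf.sum(0) + conf.sum(1) - inter
iou = np.where(union > 0, inter / np.maximum(union, 1), np.nan)
print("mIoU:", np.nanmean(iou))
```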
Please follow deeplab-pytorch or CLIMS.
You can find the following files here.
File | Filename |
---|---|
FG & BG VQA results | coco_vqa_fg_blip.npy coco_vqa_bg_blip.npy |
FG & BG VQA text features | coco_vqa_fg_blip_ViT-L-14_cache.npy coco_vqa_bg_blip_ViT-L-14_cache.npy |
pre-trained baseline model | res50_cam.pth |
QA-CLIMS model | res50_qa_clims.pth |
Please place the downloaded coco_vqa_fg_blip_ViT-L-14_cache.npy and coco_vqa_bg_blip_ViT-L-14_cache.npy in vqa/, and res50_cam.pth in cam-baseline-coco14/.
Then, run the following commands:
bash run_coco14_qa_clims.sh
bash run_coco14_sem_seg.sh
If you find this code useful for your research, please consider citing our paper:
@inproceedings{deng2023qa-clims,
  title={QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation},
  author={Deng, Songhe and Zhuo, Wei and Xie, Jinheng and Shen, Linlin},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  pages={5572--5583},
  year={2023}
}
This repository is heavily based on CLIMS and IRNet. Thanks for their great work!