Hi all,
I saw an interesting paper claiming two issues with the original CLIP model: opposite visualizations and noisy activations, which hurt performance on various downstream tasks. It also proposes a fix that requires no training:
https://github.com/xmed-lab/CLIP_Surgery
Basically, CLIP recognizes the target object by looking at the background instead of the foreground, which indicates wrong relations in the self-attention. I played with their demo and I think it is indeed so. I also tested 2 open_clip models to check this further. Here are the results.
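For context, below is a minimal sketch (not the authors' code) of the core idea as I understand it: in the last ViT blocks, the usual q-k self-attention is replaced with v-v attention, so each patch attends to patches with similar content rather than the "opposite" regions. The single-head layout and tensor names here are my simplification; the real dual-path implementation is in the CLIP_Surgery repo.

```python
import torch
import torch.nn.functional as F

def qk_attention(q, k, v, scale):
    # original CLIP attention: tokens attend according to q·k similarity
    attn = F.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v

def vv_attention(q, k, v, scale):
    # "surgery": reuse v on both sides of the similarity, which the paper
    # argues yields consistent, foreground-focused attention maps
    attn = F.softmax(v @ v.transpose(-2, -1) * scale, dim=-1)
    return attn @ v

# toy shapes: batch=1, 197 tokens (CLS + 14x14 patches), 64 dims per head
q, k, v = (torch.randn(1, 197, 64) for _ in range(3))
out = vv_attention(q, k, v, scale=64 ** -0.5)
print(out.shape)  # torch.Size([1, 197, 64])
```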
I used a single image and looked for ['window', 'wall', 'piano', 'cat'] with the following checkpoints; a rough open_clip reproduction sketch follows the list.
CLIP ViT-B/16 official checkpoint
OPEN_CLIP ViT-B/16 laion2b_s34b_b88k
OPEN_CLIP ViT-L/14 commonpool_xl_clip_s13b_b90k
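Here is roughly how such per-word heatmaps can be produced with open_clip (the CLIP_Surgery repo has its own implementation; `output_tokens` and `visual.proj` reflect my reading of open_clip's ViT and may need adjusting for your version, and `cat.jpg` is just a placeholder path):

```python
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-16', pretrained='laion2b_s34b_b88k')
tokenizer = open_clip.get_tokenizer('ViT-B-16')
model.eval()

labels = ['window', 'wall', 'piano', 'cat']
image = preprocess(Image.open('cat.jpg')).unsqueeze(0)  # any test image
text = tokenizer(labels)

with torch.no_grad():
    model.visual.output_tokens = True        # ask the ViT for patch tokens too
    _, patch_tokens = model.visual(image)    # (1, 196, width) for ViT-B/16 @ 224px
    patch_feats = patch_tokens @ model.visual.proj  # project into the joint space
    patch_feats = patch_feats / patch_feats.norm(dim=-1, keepdim=True)
    text_feats = model.encode_text(text)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

# per-patch similarity for each label, reshaped to the 14x14 patch grid
sim = patch_feats[0] @ text_feats.T            # (196, 4)
heatmaps = sim.T.reshape(len(labels), 14, 14)  # one coarse map per query word
```

With the unmodified models, these maps are where the "opposite" behavior shows up: the activation for 'cat' tends to land on the background regions rather than the cat itself.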