Awesome CLIP in Medical Imaging


🔥🔥 This is a collection of awesome articles about CLIP in medical imaging. 🔥🔥

Citation

@article{zhao2023clip,
  title={CLIP in Medical Imaging: A Comprehensive Survey},
  author={Zihao Zhao and Yuxiao Liu and Han Wu and Yonghao Li and Sheng Wang and Lin Teng and Disheng Liu and Zhiming Cui and Qian Wang and Dinggang Shen},
  journal={arXiv preprint arXiv:2312.07353},
  year={2023},
}

Overview


Taxonomy of studies focusing on CLIP in the field of medical imaging.

Updates

  • arXiv preprint release: December 13, 2023
  • GitHub repo release: December 12, 2023

Dataset Resource

| dataset   | domain    | images | texts | source                         | language | pre-trained CLIP  |
|-----------|-----------|--------|-------|--------------------------------|----------|-------------------|
| ROCO      | multiple  | 87K    | 87K   | research papers                | En       | PubMedCLIP        |
| MedICaT   | multiple  | 217K   | 217K  | research papers                | En       | /                 |
| PMC-OA    | multiple  | 1.6M   | 1.6M  | research papers                | En       | PMC-CLIP          |
| ChiMed-VL | multiple  | 580K   | 580K  | research papers                | En/zh    | /                 |
| FFA-IR    | fundus    | 1M     | 10K   | medical reports                | En/zh    | /                 |
| PadChest  | CXR       | 160K   | 109K  | medical reports                | Sp       | /                 |
| MIMIC-CXR | CXR       | 377K   | 227K  | medical reports                | En       | BioViL / BioViL-T |
| CT-RATE   | chest CT  | 50K    | 50K   | medical reports                | En       | CT-CLIP           |
| OpenPath  | histology | 208K   | 208K  | social media                   | En       | PLIP              |
| Quilt-1M  | histology | 1M     | 1M    | research papers, social media  | En       | QuiltNet          |
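
The last column lists publicly released CLIP-style checkpoints pre-trained on each dataset. As a rough, minimal sketch of how such a checkpoint is typically queried for zero-shot classification, the snippet below uses the Hugging Face `transformers` CLIP API; the checkpoint ID (a general-domain OpenAI CLIP model), the image path, and the prompts are placeholders only, and a medical checkpoint from the table (e.g., PLIP or PubMedCLIP) would be substituted via its own published weights or Hub ID.

```python
# Minimal zero-shot classification sketch with a CLIP-style model.
# Assumes the Hugging Face `transformers` CLIP API; the checkpoint ID below is a
# general-domain placeholder, not one of the medical checkpoints listed above.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"  # placeholder; swap in a medical checkpoint if available
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("chest_xray.png")  # hypothetical local image path
prompts = ["a chest X-ray with cardiomegaly", "a chest X-ray with no finding"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores, one column per prompt.
probs = outputs.logits_per_image.softmax(dim=-1)
print({p: float(s) for p, s in zip(prompts, probs[0])})
```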

Pre-training

Multi-scale

[MICCAI 2020] Joint Modeling of Chest Radiographs and Radiology Reports for Pulmonary Edema Assessment
Geeticka Chauhan, Ruizhi Liao, William Wells, Jacob Andreas, Xin Wang, Seth Berkowitz, Steven Horng, Peter Szolovits, Polina Golland
[paper] [code]

[ICCV 2021] GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition
Shih-Cheng Huang, Liyue Shen, Matthew P. Lungren, Serena Yeung
[paper] [code]

[MICCAI 2021] Multimodal Representation Learning via Maximization of Local Mutual Information
Ruizhi Liao, Daniel Moyer, Miriam Cha, Keegan Quigley, Seth Berkowitz, Steven Horng, Polina Golland, and William M. Wells
[paper]

[ECCV 2022] Joint Learning of Localized Representations from Medical Images and Reports
Philip Müller, Georgios Kaissis, Congyu Zou, Daniel Rückert
[paper] [code]

[ECCV 2022] Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing
Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Stephanie Hyland, Maria Wetscherek, Tristan Naumann, Aditya Nori, Javier Alvarez-Valle, Hoifung Poon, and Ozan Oktay
[paper] [code]

[NeurIPS 2022 Workshop] The Role of Local Alignment and Uniformity in Image-Text Contrastive Learning on Medical Images
Philip Müller, Georgios Kaissis, Daniel Rueckert
[paper]

[MICCAI 2022] Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training
Constantin Seibold, Simon Reiß, M. Saquib Sarfraz, Rainer Stiefelhagen, Jens Kleesiek
[paper]

[MICCAI 2022] Vision-Language Contrastive Learning Approach to Robust Automatic Placenta Analysis Using Photographic Images
Yimu Pan, Alison D. Gernand, Jeffery A. Goldstein, Leena Mithal, Delia Mwinyelle, James Z. Wang
[paper]

[ICLR 2023] Advancing Radiograph Representation Learning with Masked Record Modeling
Hong-Yu Zhou, Chenyu Lian, Liansheng Wang, Yizhou Yu
[paper] [code]

[ICCV 2023] LIMITR: Leveraging Local Information for Medical Image-Text Representation
Gefen Dawidowicz, Elad Hirsch, Ayellet Tal
[paper] [code]

[ICCV 2023] PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
Pujin Cheng, Li Lin, Junyan Lyu, Yijin Huang, Wenhan Luo, Xiaoying Tang
[paper] [code]

[MICCAI 2023] Contrastive Masked Image-Text Modeling for Medical Visual Representation Learning
Cheng Chen, Aoxiao Zhong, Dufan Wu, Jie Luo, Quanzheng Li
[paper] [code]

[MICCAI 2023] Enhancing Automatic Placenta Analysis through Distributional Feature Recomposition in Vision-Language Contrastive Learning
Yimu Pan, Tongan Cai, Manas Mehta, Alison D. Gernand, Jeffery A. Goldstein, Leena Mithal, Delia Mwinyelle, Kelly Gallagher, James Z. Wang
[paper]

[MICCAI 2023] MedIM: Boost Medical Image Representation via Radiology Report-Guided Masking
Yutong Xie, Lin Gu, Tatsuya Harada, Jianpeng Zhang, Yong Xia, Qi Wu
[paper] [code]

[MLHC 2023] TIER: Text-Image Entropy Regularization for Medical CLIP-style models
Anil Palepu, Andrew Beam
[paper] [code]

[EMNLP 2023] Fine-grained Medical Vision-Language Representation Learning for Radiology Report Generation
Siyuan Wang, Bo Peng, Yichao Liu, Qi Peng
[paper]

[MedIA 2023] Self-supervised multi-modal training from uncurated images and reports enables monitoring AI in radiology
Sangjoon Park, Eun Sun Lee, Kyung Sook Shin, Jeong Eun Lee, Jong Chul Ye
[paper]

[TMM 2023] Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training
Ke Zhang, Yan Yang, Jun Yu, Hanliang Jiang, Jianping Fan, Qingming Huang, Weidong Han
[paper]

[ESA 2023] MITER: Medical Image–TExt joint adaptive pretRaining with multi-level contrastive learning
Chang Shu, Yi Zhu, Xiaochu Tang, Jing Xiao, Youxin Chen, Xiu Li, Qian Zhang, Zheng Lu
[paper] [code]

[arXiv 2023] Local Contrastive Learning for Medical Image Recognition
Syed A. Rizvi, Ruixiang Tang, Xiaoqian Jiang, Xiaotian Ma, Xia Hu
[paper]

[arXiv 2023] G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training
Che Liu, Cheng Ouyang, Sibo Cheng, Anand Shah, Wenjia Bai, Rossella Arcucci
[paper]

[arXiv 2023] Fine-Grained Image-Text Alignment in Medical Imaging Enables Cyclic Image-Report Generation
Wenting Chen, Xiang Li, Linlin Shen, Yixuan Yuan
[paper]

[IEEE TMI 2024] Multi-Grained Radiology Report Generation With Sentence-Level Image-Language Contrastive Learning
Aohan Liu, Yuchen Guo, Jun-Hai Yong, Feng Xu
[paper]

[ICML 2024] Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training
Jinxia Yang, Bing Su, Wayne Xin Zhao, Ji-Rong Wen
[paper]

[CVPR 2024] CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Haoran Lai, Qingsong Yao, Zihang Jiang, Rongsheng Wang, Zhiyang He, Xiaodong Tao, S. Kevin Zhou
[paper]

[arXiv 2024] MeDSLIP: Medical Dual-Stream Language-Image Pre-training for Fine-grained Alignment
Wenrui Fan, Mohammod Naimul Islam Suvon, Shuo Zhou, Xianyuan Liu, Samer Alabed, Venet Osmani, Andrew Swift, Chen Chen, Haiping Lu
[paper]

[arXiv 2024] Anatomical Structure-Guided Medical Vision-Language Pre-training
Qingqiu Li, Xiaohan Yan, Jilan Xu, Runtian Yuan, Yuejie Zhang, Rui Feng, Quanli Shen, Xiaobo Zhang, Shujun Wang
[paper]

[arXiv 2024] CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios
Jingyang Lin, Yingda Xia, Jianpeng Zhang, Ke Yan, Le Lu, Jiebo Luo, Ling Zhang
[paper]

[arXiv 2024] Enhancing medical vision-language contrastive learning via inter-matching relation modelling
Mingjian Li, Mingyuan Meng, Michael Fulham, David Dagan Feng, Lei Bi, Jinman Kim
[paper]

[arXiv 2024] Multi-modal vision-language model for generalizable annotation-free pathology localization and clinical diagnosis
Hao Yang, Hong-Yu Zhou, Zhihuan Li, Yuanxu Gao, Cheng Li, Weijian Huang, Jiarun Liu, Hairong Zheng, Kang Zhang, Shanshan Wang
[paper]

[arXiv 2024] Enhancing Representation in Medical Vision-Language Foundation Models via Multi-Scale Information Extraction Techniques
Weijian Huang, Cheng Li, Hong-Yu Zhou, Jiarun Liu, Hao Yang, Yong Liang, Guangming Shi, Hairong Zheng, Shanshan Wang
[paper]

[arXiv 2024] MLIP: Medical Language-Image Pre-training with Masked Local Representation Learning
Jiarun Liu, Hong-Yu Zhou, Cheng Li, Weijian Huang, Hao Yang, Yong Liang, Shanshan Wang
[paper]

[arXiv 2024] Multimodal self-supervised learning for lesion localization
Hao Yang, Hong-Yu Zhou, Cheng Li, Weijian Huang, Jiarun Liu, Yong Liang, Shanshan Wang
[paper]


Data-efficient

[EMNLP 2022] MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Zifeng Wang, Zhenbang Wu, Dinesh Agarwal, Jimeng Sun
[paper] [code]

[NeurIPS 2022] Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning
Fuying Wang, Yuyin Zhou, Shujun Wang, Varut Vardhanabhuti, Lequan Yu
[paper] [code]

[ISBRA 2023] TCSA: A Text-Guided Cross-View Medical Semantic Alignment Framework for Adaptive Multi-view Visual Representation Learning
Hongyang Lei, Huazhen Huang, Bokai Yang, Guosheng Cui, Ruxin Wang, Dan Wu, and Ye Li
[paper]

[CVPR 2023] Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing
Shruthi Bannur, Stephanie Hyland, Qianchu Liu, Fernando Pérez-García, Maximilian Ilse, Daniel C. Castro, Benedikt Boecking, Harshita Sharma, Kenza Bouzid, Anja Thieme, Anton Schwaighofer, Maria Wetscherek, Matthew P. Lungren, Aditya Nori, Javier Alvarez-Valle, Ozan Oktay
[paper] [code]

[MICCAI 2023] CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training
Kihyun You, Jawook Gu, Jiyeon Ham, Beomhee Park, Jiho Kim, Eun K. Hong, Woonhyuk Baek, Byungseok Roh
[paper] [code]

[TMI 2023] Improving Medical Vision-Language Contrastive Pretraining with Semantics-aware Triage
Bo Liu, Donghuan Lu, Dong Wei, Xian Wu, Yan Wang, Yu Zhang, Yefeng Zheng
[paper]

[QIMS 2023] SDA-CLIP: surgical visual domain adaptation using video and text labels
Yuchong Li, Shuangfu Jia, Guangbi Song, Ping Wang, Fucang Jia
[paper] [code]

[arXiv 2023] UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training
Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya Zhang, Yanfeng Wang
[paper] [code]

[arXiv 2023] Unified Medical Image-Text-Label Contrastive Learning With Continuous Prompt
Yuhao Wang
[paper]

[arXiv 2023] Significantly Improving Zero-Shot X-ray Pathology Classification via Fine-tuning Pre-trained Image-Text Encoders
Jongseong Jang, Daeun Kyung, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae, Edward Choi
[paper]

[arXiv 2023] IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Che Liu, Sibo Cheng, Miaojing Shi, Anand Shah, Wenjia Bai, Rossella Arcucci
[paper]

[CVPR 2024] PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
Yutong Xie, Qi Chen, Sinuo Wang, Minh-Son To, Iris Lee, Ee Win Khoo, Kerolos Hendy, Daniel Koh, Yong Xia, Qi Wu
[paper]

[CVPR 2024] Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models
Weiwei Cao, Jianpeng Zhang, Yingda Xia, Tony C. W. Mok, Zi Li, Xianghua Ye, Le Lu, Jian Zheng, Yuxing Tang, Ling Zhang
[paper]

[CVPR 2024] Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework
Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi, Lingqiao Liu, Liyang Liu, Bowen Zhang, Zhibin Liao, Qi Wu, Minh-Son To, Johan W. Verjans
[paper]

[MICCAI 2024] RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports
Jiawei Du, Jia Guo, Weihang Zhang, Shengzhu Yang, Hanruo Liu, Huiqi Li, Ningli Wang
[paper]

[MICCAI 2024] Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
Shantanu Ghosh, Clare B. Poynton, Shyam Visweswaran, Kayhan Batmanghelich
[paper]

[CVPR 2024] CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Sajid Javed, Arif Mahmood, Iyyakutti Iyappan Ganapathi, Fayaz Ali Dharejo, Naoufel Werghi, Mohammed Bennamoun
[paper]

[CAI 2024] Enhancing Biomedical Multi-modal Representation Learning with Multi-scale Pre-training and Perturbed Report Discrimination
Xinliu Zhong, Kayhan Batmanghelich, Li Sun
[paper]

[arXiv 2024] Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning
Chong Ma, Hanqi Jiang, Wenting Chen, Yiwei Li, Zihao Wu, Xiaowei Yu, Zhengliang Liu, Lei Guo, Dajiang Zhu, Tuo Zhang, Dinggang Shen, Tianming Liu, Xiang Li
[paper]

[arXiv 2024] Design as Desired: Utilizing VQA for Multimodal Pre-training
Tongkun Su, Jun Li, Xi Zhang, Haibo Jin, Hao Chen, Qiong Wang, Faqin Lv, Baoliang Zhao, Yin Hu
[paper]

[arXiv 2024] Merlin: A Vision Language Foundation Model for 3D Computed Tomography
Louis Blankemeier, Joseph Paul Cohen, Ashwin Kumar, Dave Van Veen, Syed Jamal Safdar Gardezi, Magdalini Paschali, Zhihong Chen, Jean-Benoit Delbrouck, Eduardo Reis, Cesar Truyts, Christian Bluethgen, Malte Engmann Kjeldskov Jensen, Sophie Ostmeier, Maya Varma, Jeya Maria Jose Valanarasu, Zhongnan Fang, Zepeng Huo, Zaid Nabulsi, Diego Ardila, Wei-Hung Weng, Edson Amaro Junior, Neera Ahuja, Jason Fries, Nigam H. Shah, Andrew Johnston, Robert D. Boutin, Andrew Wentland, Curtis P. Langlotz, Jason Hom, Sergios Gatidis, Akshay S. Chaudhari
[paper]

[arXiv 2024] AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis
Qiuhui Chen, Yi Hong
[paper]


Knowledge-enhanced

[ACM MM 2022] Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge
Zhihong Chen, Guanbin Li, Xiang Wan
[paper] [code]

[ICCV 2023] MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis
Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
[paper] [code]

[MICCAI 2023] Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training
Xiaofei Chen, Yuting He, Cheng Xue, Rongjun Ge, Shuo Li, Guanyu Yang
[paper] [code]

[Nature Communication 2023] Knowledge-enhanced visual-language pre-training on chest radiology images
Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Weidi Xie, Yanfeng Wang
[paper] [code]

[npj Digital Medicine 2023] A medical multimodal large language model for future pandemics
Fenglin Liu, Tingting Zhu, Xian Wu, Bang Yang, Chenyu You, Chenyang Wang, Yefeng Zheng, Xu Sun, Yang Yang, Lei Clifton, David A. Clifton
[paper]

[arXiv 2023] Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining
Bingqian Lin, Zicong Chen, Mingjie Li, Haokun Lin, Hang Xu, Yi Zhu, Jianzhuang Liu, Wenjia Cai, Lei Yang, Shen Zhao, Chenfei Wu, Ling Chen, Xiaojun Chang, Yi Yang, Lei Xing, Xiaodan Liang
[paper] [code]

[arXiv 2023] A Foundation LAnguage-Image model of the Retina (FLAIR): Encoding expert knowledge in text supervision
Julio Silva-Rodriguez, Hadi Chakor, Riadh Kobbi, Jose Dolz, Ismail Ben Ayed
[paper] [code]

[arXiv 2024] MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder
Lei Li, Tianfang Zhang, Xinglin Zhang, Jiaqi Liu, Bingqi Ma, Yan Luo, Tao Chen
[paper]

[arXiv 2024] Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Weidi Xie, Yanfeng Wang
[paper]

[arXiv 2024] Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training
Aisha Urooj Khan, John Garrett, Tyler Bradshaw, Lonie Salkowski, Jiwoong Jason Jeong, Amara Tariq, Imon Banerjee
[paper]

[arXiv 2024] Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray
Qiao Deng, Zhongzhen Huang, Yunqi Wang, Zhichuan Wang, Zhao Wang, Xiaofan Zhang, Qi Dou, Yeung Yu Hui, Edward S. Hui
[paper]

[arXiv 2024] Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement
Cheng Li, Weijian Huang, Hao Yang, Jiarun Liu, Shanshan Wang
[paper]

[arXiv 2024] MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Zhe Li, Laurence T. Yang, Bocheng Ren, Xin Nie, Zhangyang Gao, Cheng Tan, Stan Z. Li
[paper]

[arXiv 2024] DeViDe: Faceted medical knowledge for improved medical vision-language pre-training
Haozhe Luo, Ziyu Zhou, Corentin Royer, Anjany Sekuboyina, Bjoern Menze
[paper]


Others

[MLHC 2022] Contrastive Learning of Medical Visual Representations from Paired Images and Text
Yuhao Zhang, Hang Jiang, Yasuhide Miura, Christopher D. Manning, Curtis P. Langlotz
[paper] [code]

[NMI 2022] Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports
Hong-Yu Zhou, Xiaoyu Chen, Yinghao Zhang, Ruibang Luo, Liansheng Wang, Yizhou Yu
[paper] [code]

[ICCV 2023] Towards Unifying Medical Vision-and-Language Pre-Training via Soft Prompts
Zhihong Chen, Benyou Wang, Shizhe Diao, Guanbin Li, Xiang Wan
[paper] [code]

[ICCV 2023] Cross-Modal Translation and Alignment for Survival Analysis
Fengtao Zhou, Hao Chen
[paper] [code]

[NeurIPS 2023] Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias
Zhongwei Wan, Che Liu, Mi Zhang, Jie Fu, Benyou Wang, Sibo Cheng, Lei Ma, César Quilodrán-Casas, Rossella Arcucci
[paper] [code]

[MICCAI 2023] M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization
Che Liu, Sibo Cheng, Chen Chen, Mengyun Qiao, Weitong Zhang, Anand Shah, Wenjia Bai, Rossella Arcucci
[paper] [code]

[MICCAI 2023] Pathology-and-genomics Multimodal Transformer for Survival Outcome Prediction
Kexin Ding, Mu Zhou, Dimitris N. Metaxas, Shaoting Zhang
[paper] [code]

[MICCAI 2023] Surgical Video Captioning with Mutual-Modal Concept Alignment
Zhen Chen, Qingyu Guo, Leo K. T. Yeung, Danny T. M. Chan, Zhen Lei, Hongbin Liu & Jinqiao Wang
[paper] [code]

[arXiv 2023] Utilizing Synthetic Data for Medical Vision-Language Pre-training: Bypassing the Need for Real Images
Che Liu, Anand Shah, Wenjia Bai, Rossella Arcucci
[paper]

[IEEE TMI 2024] UniChest: Conquer-and-Divide Pre-training for Multi-Source Chest X-Ray Classification
Tianjie Dai, Ruipeng Zhang, Feng Hong, Jiangchao Yao, Ya Zhang, Yancheng Wang
[paper]

[ICASSP 2024] Freeze the Backbones: a Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-Training
Jiuming Qin, Che Liu, Sibo Cheng, Yike Guo, Rossella Arcucci
[paper]

[SaTML 2024] Backdoor Attack on Un-paired Medical Image-Text Pretrained Models: A Pilot Study on MedCLIP
Ruinan Jin, Chun-Yin Huang, Chenyu You, Xiaoxiao Li
[paper]

[Nature Medicine 2024] Vision–language foundation model for echocardiogram interpretation
Matthew Christensen, Milos Vukadinovic, Neal Yuan, David Ouyang
[paper]

[arXiv 2024] Align as Ideal: Cross-Modal Alignment Binding for Federated Medical Vision-Language Pre-training
Zitao Shuai, Liyue Shen
[paper]

[arXiv 2024] MEDBind: Unifying Language and Multimodal Medical Data Embeddings
Yuan Gao, Sangwook Kim, David E Austin, Chris McIntosh
[paper]

[arXiv 2024] Open Challenges and Opportunities in Federated Foundation Models Towards Biomedical Healthcare
Xingyu Li, Lu Peng, Yuping Wang, Weihua Zhang
[paper]

[arXiv 2024] Medical Vision-Language Pre-Training for Brain Abnormalities
Masoud Monajatipoor, Zi-Yi Dou, Aichi Chien, Nanyun Peng, Kai-Wei Chang
[paper]

[arXiv 2024] Benchmarking PathCLIP for Pathology Image Analysis
Sunyi Zheng, Xiaonan Cui, Yuxuan Sun, Jingxiong Li, Honglin Li, Yunlong Zhang, Pingyi Chen, Xueping Jing, Zhaoxiang Ye, Lin Yang
[paper]


CLIP-driven Application

Classification

[MICCAI 2022] CLIP-Lung: Textual Knowledge-Guided Lung Nodule Malignancy Prediction
Yiming Lei, Zilong Li, Yan Shen, Junping Zhang, Hongming Shan
[paper] [code]

[ACL 2022] Language over Labels: Contrastive Language Supervision Exceeds Purely Label-Supervised Classification Performance on Chest X-Rays
Anton Wiehe, Florian Schneider, Sebastian Blank, Xintong Wang, Hans-Peter Zorn, Christian Biemann
[paper] [code]

[ICCE-Asia 2022] Transfer Learning for Medical Image Classification on Multiple Datasets using PubMedCLIP
Hong N. Dao, Tuyen Nguyen Quang, Incheon Paik
[paper]

[Nature BME 2022] Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning
Ekin Tiu, Ellie Talius, Pujan Patel, Curtis P. Langlotz, Andrew Y. Ng & Pranav Rajpurkar
[paper] [code]

[ISBI 2023] Self-Supervised Learning with Radiology Reports, A Comparative Analysis of Strategies for Large Vessel Occlusion and Brain CTA Images
S Pachade, S Datta, Y Dong, S Salazar-Marioni, R Abdelkhaleq, A Niktabe, K Roberts, SA Sheth, L Giancardo
[paper]

[ISBI 2023] Joint representation learning from french radiological reports and ultrasound images
Hind Dadoun, Hervé Delingette, Anne-Laure Rousseau, Eric de Kerviler, Nicholas Ayache
[paper]

[ISBI 2023] Multimodal Representation Learning for Blastocyst Assessment
Youcheng Wang, Zhe Zheng, Na Ni, Guoqing Tong, Nuo Cheng, Kai Li, Ping Yin, Yuanyuan Chen, Yingna Wu, Guangping Xie
[paper]

[CEUR Workshop 2023] Multi-stage Medical Image Captioning using Classification and CLIP
Masaki Aono, Hiroki Shinoda, Tetsuya Asakawa, Kazuki Shimizu, Takuya Togawa, Takuyuki Komoda
[paper]

[MIDL 2023] Improving Zero-Shot Detection of Low Prevalence Chest Pathologies using Domain Pre-trained Language Models
Yuhao Zhang, Hang Jiang, Yasuhide Miura, Christopher D. Manning, Curtis P. Langlotz
[paper] [code]

[MIDL 2023] MEDIMP: 3D Medical Images with clinical Prompts from limited tabular data for renal transplantation
Leo Milecki, Vicky Kalogeiton, Sylvain Bodard, Dany Anglicheau, Jean-Michel Correas, Marc-Olivier Timsit, Maria Vakalopoulou
[paper] [code]

[MIDL 2023] Radiology Reports Improve Visual Representations Learned from Radiographs
Haoxu Huang, Samyak Rawlekar, Sumit Chopra, Cem M Deniz
[paper] [code]

[ICCV 2023 workshop] CLIPath: Fine-tune CLIP with Visual Feature Fusion for Pathology Image Analysis Towards Minimizing Data Collection Efforts
Zhengfeng Lai, Zhuoheng Li, Luca Cerny Oliveira, Joohi Chauhan, Brittany N. Dugger, Chen-Nee Chuah
[paper]

[MICCAI 2023] Xplainer: From X-Ray Observations to Explainable Zero-Shot Diagnosis
Chantal Pellegrini, Matthias Keicher, Ege Özsoy, Petra Jiraskova, Rickmer Braren, Nassir Navab
[paper] [code]

[MICCAI 2023 workshop] Concept Bottleneck with Visual Concept Filtering for Explainable Medical Image Classification
Injae Kim, Jongha Kim, Joonmyung Choi, Hyunwoo J. Kim
[paper]

[arXiv 2022] Towards Reliable Zero Shot Classification in Self-Supervised Models with Conformal Prediction
Bhawesh Kumar, Anil Palepu, Rudraksh Tuwani, Andrew Beam
[paper]

[arXiv 2023] Domain-Controlled Prompt Learning
Qinglong Cao, Zhengqin Xu, Yuantian Chen, Chao Ma, Xiaokang Yang
[paper]

[arXiv 2023] ETP: Learning Transferable Ecg Representations Via Ecg-Text Pre-training
Che Liu, Zhongwei Wan, Sibo Cheng, Mi Zhang, Rossella Arcucci
[paper]

[arXiv 2023] A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis
Jiaxiang Liu, Tianxiang Hu, Yan Zhang, Xiaotang Gai, Yang Feng, Zuozhu Liu
[paper]

[arXiv 2023] Are Natural Domain Foundation Models Useful for Medical Image Classification?
Joana Palés Huix, Adithya Raju Ganeshan, Johan Fredin Haslum, Magnus Söderberg, Christos Matsoukas, Kevin Smith
[paper] [code]

[arXiv 2023] Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning
Fudan Zheng, Jindong Cao, Weijiang Yu, Zhiguang Chen, Nong Xiao, Yutong Lu
[paper]

[arXiv 2023] Exploring the Transfer Learning Capabilities of CLIP in Domain Generalization for Diabetic Retinopathy
Sanoojan Baliah, Fadillah A. Maani, Santosh Sanjeev, Muhammad Haris Khan
[paper] [code]

[arXiv 2023] Exploring the Versatility of Zero-Shot CLIP for Interstitial Lung Disease Classification (under review at ICLR)
Cara Van Uden, Christian Bluethgen, Maayane Attias, Malgorzata Polacin, Haiwei Henry Guo, Neha Simha, Rishi Raj, Curtis Langlotz
[paper]

[arXiv 2023] Few-shot medical image classification with simple shape and texture text descriptors using vision-language models
Michal Byra, Muhammad Febrian Rachmadi, Henrik Skibbe
[paper] [code]

[arXiv 2023] Fostering transparent medical image AI via an image-text foundation model grounded in medical literature
Chanwoo Kim, Soham U. Gadgil, Alex J. DeGrave, Zhuo Ran Cai, Roxana Daneshjou, Su-In Lee
[paper] [code]

[arXiv 2023] Increasing Textual Context Size Boosts Medical Image-Text Matching
Idan Glassberg, Tom Hope
[paper] [code]

[arXiv 2023] Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models
An Yan, Yu Wang, Petros Karypis, Zexue He, Chengyu Dong, Zihan Wang, Yiwu Zhong, Jingbo Shang, Amilcare Gentili, Chun-Nan Hsu, Julian McAuley
[paper] [code]

[WACV 2024] I-AI: A Controllable & Interpretable AI System for Decoding Radiologists' Intense Focus for Accurate CXR Diagnoses
Trong Thang Pham, Jacob Brecheisen, Anh Nguyen, Hien Nguyen, Ngan Le
[paper] [code]

[ISBI 2024] Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models
Cristiano Patrício, Luis F. Teixeira, Joao C. Neves
[paper] [code]

[CVPR 2024] AHIVE: Anatomy-aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval
Sixing Yan, William K. Cheung, Ivor W. Tsang, Keith Chiu, Terence M. Tong, Ka Chun Cheung, Simon See
[paper]

[IEEE Access 2024] A Multimodal Transfer Learning Approach Using PubMedCLIP for Medical Image Classification
Hong N. Dao, Tuyen Nguyen, Cherubin Mugisha, Incheon Paik
[paper]

[IEEE TMI 2024] MCPL: Multi-modal Collaborative Prompt Learning for Medical Vision-Language Model
Pengyu Wang, Huaqi Zhang, Yixuan Yuan
[paper]

[CVPR 2024] FairCLIP: Harnessing Fairness in Vision-Language Learning
Yan Luo, Min Shi, Muhammad Osama Khan, Muhammad Muneeb Afzal, Hao Huang, Shuaihang Yuan, Yu Tian, Luo Song, Ava Kouhana, Tobias Elze, Yi Fang, Mengyu Wang
[paper]

[MICCAI 2024] MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection
Ximiao Zhang, Min Xu, Dehui Qiu, Ruixin Yan, Ning Lang, and Xiuzhuang Zhou
[paper]

[PRCV 2024] Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-Label Medical Image Classification
Yaoqin Ye, Junjie Zhang, Hongwei Shi
[paper]

[MDPI 2024] MedicalCLIP: Anomaly-Detection Domain Generalization with Asymmetric Constraints
Liujie Hua, Yueyi Luo, Qianqian Qi, Jun Long
[paper]

[CSCWD 2024] A Vision-language Model Based on Prompt Learner for Few-shot Medical Images Diagnosis
Tianyou Chang, Shizhan Chen, Guodong Fan, Zhiyong Feng
[paper]

[CIBM 2024] Nodule-CLIP: Lung nodule classification based on multi-modal contrastive learning
Lijing Sun, Mengyi Zhang, Yu Lu, Wenjun Zhu, Yang Yi, Fei Yan
[paper]

[arXiv 2024] Light-weight Fine-tuning Method for Defending Adversarial Noise in Pre-trained Medical Vision-Language Models
Xu Han, Linghao Jin, Xuezhe Ma, Xiaofeng Liu
[paper]

[arXiv 2024] A self-supervised framework for abnormality detection from brain MRI
David Wood, Emily Guilhem, Sina Kafiabadi, Ayisha Al Busaidi, Kishan Dissanayake, Ahmed Hammam, Nina Mansoor, Matthew Townend, Siddharth Agarwal, Yiran Wei, Asif Mazumder, Gareth J Barker, Peter Sasieni, Sebastien Ourselin, James H. Cole, Thomas C. Booth
[paper]

[arXiv 2024] PM2: A New Prompting Multi-modal Model Paradigm for Few-shot Medical Image Classification
Zhenwei Wang, Qiule Sun, Bingbing Zhang, Pengfei Wang, Jianxin Zhang, Qiang Zhang
[paper]

[arXiv 2024] Robust COVID-19 Detection in CT Images with CLIP
Li Lin, Yamini Sri Krubha, Zhenhuan Yang, Cheng Ren, Thuc Duy Le, Irene Amerini, Xin Wang, Shu Hu
[paper]

[arXiv 2024] Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model
Diwei Wang, Kun Yuan, Candice Muller, Frédéric Blanc, Nicolas Padoy, Hyewon Seo
[paper]


Dense Prediction

[MICCAI 2022] Radiological Reports Improve Pre-training for Localized Imaging Tasks on Chest X-Rays
Philip Müller, Georgios Kaissis, Congyu Zou, Daniel Rueckert
[paper]

[ASMUS 2023] Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography
Rabin Adhikari, Manish Dhakal, Safal Thapaliya, Kanchan Poudel, Prasiddha Bhandari & Bishesh Khanal
[paper] [code]

[ICCV 2023] CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection
Jie Liu, Yixiao Zhang, Jie-Neng Chen, Junfei Xiao, Yongyi Lu, Bennett A Landman, Yixuan Yuan, Alan Yuille, Yucheng Tang, Zongwei Zhou
[paper] [code]

[MICCAI 2023] Multiple Prompt Fusion for Zero-Shot Lesion Detection Using Vision-Language Models
Miaotian Guo, Huahui Yi, Ziyuan Qin, Haiying Wang, Aidong Men, Qicheng Lao
[paper]

[MICCAI 2023] Zero-shot Nuclei Detection via Visual-Language Pre-trained Models
Yongjian Wu, Yang Zhou, Jiya Saiyin, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yubo Fan, Yan Xu
[paper] [code]

[MICCAI 2023] TCEIP: Text Condition Embedded Regression Network for Dental Implant Position Prediction
Xinquan Yang, Jinheng Xie, Xuguang Li, Xuechen Li, Xin Li, Linlin Shen, Yongqiang Deng
[paper]

[MICCAI 2023] Continual Learning for Abdominal Multi-Organ and Tumor Segmentation
Yixiao Zhang, Xinyi Li, Huimiao Chen, Alan L. Yuille, Yaoyao Liu, Zongwei Zhou
[paper] [code]

[MICCAI 2023] TPRO: Text-prompting-based Weakly Supervised Histopathology Tissue Segmentation
Shaoteng Zhang, Jianpeng Zhang, Yutong Xie, Yong Xia
[paper] [code]

[NeurIPS 2023] Text Promptable Surgical Instrument Segmentation with Vision-Language Models
Zijian Zhou, Oluwatosin Alabi, Meng Wei, Tom Vercauteren, Miaojing Shi
[paper] [code]

[arXiv 2023] Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models
Kanchan Poudel, Manish Dhakal, Prasiddha Bhandari, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal
[paper] [code]

[arXiv 2023] One-shot Localization and Segmentation of Medical Images with Foundation Models
Deepa Anand, Gurunath Reddy M, Vanika Singhal, Dattesh D. Shanbhag, Shriram KS, Uday Patil, Chitresh Bhushan, Kavitha Manickam, Dawei Gui, Rakesh Mullick, Avinash Gopal, Parminder Bhatia, Taha Kass-Hout
[paper]

[ICLR 2024] AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection
Qihang Zhou, Guansong Pang, Yu Tian, Shibo He, Jiming Chen
[paper] [code]

[EMBC 2024] Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images
Mansi Kakkar, Dattesh Shanbhag, Chandan Aladahalli, Gurunath Reddy M
[paper]

[Medical Image Analysis 2024] Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography
Jie Liu, Yixiao Zhang, Kang Wang, Mehmet Can Yavuz, Xiaoxi Chen, Yixuan Yuan, Haoliang Li, Yang Yang, Alan Yuille, Yucheng Tang, Zongwei Zhou
[paper]

[CVPR workshop 2024] Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero-shot Medical Image Segmentation
Sidra Aleem, Fangyijie Wang, Mayug Maniparambil, Eric Arazo, Julia Dietlmeier, Kathleen Curran, Noel E. O'Connor, Suzanne Little
[paper]

[ACML 2024] Efficient Medical Images Text Detection with Vision-Language Pre-training Approach
Tianyang Li, Jinxu Bai, Qingzhu Wang, Hanwen Xu
[paper]

[MICCAI 2024] Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays
Zhichao Sun, Yuliang Gu, Yepeng Liu, Zerui Zhang, Zhou Zhao, Yongchao Xu
[paper]

[CVPR 2024] Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
Chaoqin Huang, Aofan Jiang, Jinghao Feng, Ya Zhang, Xinchao Wang, Yanfeng Wang
[paper]

[arXiv 2024] A self-supervised text-vision framework for automated brain abnormality detection
David A. Wood, Emily Guilhem, Sina Kafiabadi, Ayisha Al Busaidi, Kishan Dissanayake, Ahmed Hammam, Nina Mansoor, Matthew Townend, Siddharth Agarwal, Yiran Wei, Asif Mazumder, Gareth J. Barker, Peter Sasieni, Sebastien Ourselin, James H. Cole, Thomas C. Booth
[paper]

[arXiv 2024] A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities
Ibrahim Ethem Hamamci, Sezgin Er, Furkan Almas, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Irem Dogan, Muhammed Furkan Dasdelen, Bastian Wittmann, Enis Simsar, Mehmet Simsar, Emine Bensu Erdemir, Abdullah Alanbay, Anjany Sekuboyina, Berkan Lafci, Mehmet K. Ozdemir, Bjoern Menze
[paper]

[arXiv 2024] TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM
Wenxue Li, Xinyu Xiong, Peng Xia, Lie Ju, Zongyuan Ge
[paper]

[arXiv 2024] MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation
Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao
[paper]

[arXiv 2024] Multimodal self-supervised learning for lesion localization
Hao Yang, Hong-Yu Zhou, Cheng Li, Weijian Huang, Jiarun Liu, Yong Liang, Shanshan Wang
[paper]

[arXiv 2024] Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation
Xiaoshuang Huang, Hongxiang Li, Meng Cao, Long Chen, Chenyu You, Dong An
[paper]

[arXiv 2024] Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports
Guangyu Guo, Jiawen Yao, Yingda Xia, Tony C. W. Mok, Zhilin Zheng, Junwei Han, Le Lu, Dingwen Zhang, Jian Zhou, Ling Zhang
[paper]


Cross-modal

[ML4H 2021] Retrieval-Based Chest X-Ray Report Generation Using a Pre-trained Contrastive Language-Image Model
Mark Endo, Rayan Krishnan, Viswesh Krishna, Andrew Y. Ng, Pranav Rajpurkar
[paper] [code]

[IPMI 2023] X-TRA: Improving Chest X-ray Tasks with Cross-Modal Retrieval Augmentation
Tom van Sonsbeek, Marcel Worring
[paper]

[ACL 2023] PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain?
Sedigheh Eslami, Gerard de Melo, Christoph Meinel
[paper] [code]

[MIDL 2023] FlexR: Few-shot Classification with Language Embeddings for Structured Reporting of Chest X-rays
Matthias Keicher, Kamilia Zaripova, Tobias Czempiel, Kristina Mach, Ashkan Khakzar, Nassir Navab
[paper]

[MICCAI 2023] Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models
Tom van Sonsbeek, Mohammad Mahdi Derakhshani, Ivona Najdenkoska, Cees G. M. Snoek, and Marcel Worring
[paper] [code]

[MICCAI 2023] A Medical Semantic-Assisted Transformer for Radiographic Report Generation
Zhanyu Wang, Mingkang Tang, Lei Wang, Xiu Li, Luping Zhou
[paper]

[TETCI 2023] Parameter-Efficient Transfer Learning for Medical Visual Question Answering
Jiaxiang Liu, Tianxiang Hu, Yan Zhang, Yang Feng, Jin Hao, Junhui Lv, and Zuozhu Liu
[paper]

[AAAI 2024] CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare
Akash Ghosh, Arkadeep Acharya, Raghav Jain, Sriparna Saha, Aman Chadha, Setu Sinha
[paper] [code]

[arXiv 2023] PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya Zhang, Yanfeng Wang, Weidi Xie
[paper] [code]

[arXiv 2024] Chest-Diffusion: A Light-Weight Text-to-Image Model for Report-to-CXR Generation
Peng Huang, Xue Gao, Lihong Huang, Jing Jiao, Xiaokang Li, Yuanyuan Wang, Yi Guo
[paper]