Semi-supervised learning is a simple way to exploit labeled and unlabeled data simultaneously. Here, we provide a simple baseline for large-scale semi-supervised pre-training.
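A common ingredient in semi-supervised baselines is pseudo-labeling: a model trained on the labeled set predicts on unlabeled volumes, and only high-confidence predictions are kept as training targets. A minimal NumPy sketch of the confidence-filtering step (purely illustrative; the names here are not the repo's API):

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Keep voxels whose max class probability exceeds `threshold`.

    probs: (num_classes, *spatial) softmax output for one unlabeled volume.
    Returns (labels, mask): argmax labels and a boolean confidence mask.
    """
    labels = probs.argmax(axis=0)          # hard pseudo-label per voxel
    mask = probs.max(axis=0) >= threshold  # confidence gate
    return labels, mask

# Toy example: 2-class predictions for 4 voxels.
probs = np.array([[0.95, 0.60, 0.10, 0.85],
                  [0.05, 0.40, 0.90, 0.15]])
labels, mask = select_pseudo_labels(probs, threshold=0.8)
# Voxels below the threshold (here the second one) are masked out of the loss.
```

During training, the supervised loss on `imagesTr`/`labelsTr` is combined with a loss on the confident pseudo-labeled voxels from `imagesUn`.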
If you don't want to train from scratch, you can use our pre-trained models.
| Model | Params | Checkpoint |
|---|---|---|
| VoComni_nnunet | 31M | Download |
| VoCo_B_SSL_head | 53M | Download |
| VoCo_L_SSL_head | 206M | Download |
| VoCo_H_SSL_head | 818M | Download |
| VoComni_B | 72M | Download |
| VoComni_L | 290M | Download |
| VoComni_H | 1.2B | Download |
You can download our VoComni dataset and use it as the labeled set. For the unlabeled set, you can aggregate datasets from different sources into `imagesUn`. The labels should use the same classes as `VoComni.json`, or you can define your own. Here, we only provide a baseline for training.
The path should be organized as:

```
├── Data
│   ├── imagesTr
│   ├── labelsTr
│   └── imagesUn
```
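For example, the layout above can be created with (using exactly the directory names shown):

```shell
mkdir -p Data/imagesTr Data/labelsTr Data/imagesUn
```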
Use `gen_json.py` to obtain `dataset_unlabeled.json`.
```shell
cd Semi-supervised
source activate YOUR-CONDA-ENVIRONMENT

# Single GPU, if you do not have enough GPU resources
sh single_train

# Multi-GPU
sh dist_B.sh
sh dist_L.sh
sh dist_H.sh
```
If you find this repo useful for your research, please consider citing the paper as follows:
```
@article{wu2024large,
  title={Large-Scale 3D Medical Image Pre-training with Geometric Context Priors},
  author={Wu, Linshan and Zhuang, Jiaxin and Chen, Hao},
  journal={arXiv preprint arXiv:2410.09890},
  year={2024}
}

@InProceedings{voco-v1,
  author = {Wu, Linshan and Zhuang, Jiaxin and Chen, Hao},
  title = {VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis},
  booktitle = {CVPR},
  month = {June},
  year = {2024},
  pages = {22873-22882}
}
```