This work presents VoCo, a new method for Large-Scale 3D Medical Image Pre-training. We release a new benchmark, including 160K volumes (42M slices) for pre-training, pre-trained models ranging from 31M to 1.2B parameters, various pre-training recipes, and implementations of 50+ downstream tasks.
Linshan Wu, Jiaxin Zhuang, and Hao Chen. "Large-Scale 3D Medical Image Pre-training with Geometric Context Priors". CVPR 2024 Extension.
- Models: pre-trained models ranging from 31M to 1.2B parameters.
- Downstream: implementations of 50+ tasks (segmentation, classification, registration, vision-language).
- Datasets:
  - PreCT-160K: the largest dataset in this field to date, with 160K CT volumes (42M slices).
  - VoComni: 20K volumes with pseudo labels (20 organ & tumor classes).
  - VoCovid: semi-supervised COVID-19 segmentation.
- Pre-training:
  - Fully-supervised: pre-training with labeled data.
  - Self-supervised: pre-training with unlabeled data.
  - Semi-supervised: pre-training with labeled and unlabeled data.
  - Omni-supervised: pre-training with labeled and unlabeled data (pseudo labels generated by VoCo).
- CVPR version
- Chinese introduction (中文解读)
- WeChat official account (公众号)
We provide various models for downstream tasks. For nnUNet, please refer to the nnunet trainer.
- 'SSL_head' denotes models trained with self-supervised pre-training.
- 'Omni' denotes models trained with omni-supervised pre-training.
Model | Params | Checkpoint |
---|---|---|
VoComni_nnunet | 31M | Download |
VoCo_B_SSL_head | 53M | Download |
VoCo_L_SSL_head | 206M | Download |
VoCo_H_SSL_head | 818M | Download |
VoComni_B | 72M | Download |
VoComni_L | 290M | Download |
VoComni_H | 1.2B | Download |
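When instantiating SwinUNETR for the checkpoints above, the `feature_size` must match the model scale (48 for B, 96 for L, 192 for H, as noted in the loading script below). A minimal sketch of that mapping (the dictionary itself is ours, for illustration; `VoComni_nnunet` is handled by the nnunet trainer instead):

```python
# Feature sizes for the SwinUNETR-based checkpoints (illustrative mapping, not part of the repo)
FEATURE_SIZE = {
    'VoCo_B_SSL_head': 48,  'VoComni_B': 48,   # Base
    'VoCo_L_SSL_head': 96,  'VoComni_L': 96,   # Large
    'VoCo_H_SSL_head': 192, 'VoComni_H': 192,  # Huge
}
```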
We download the checkpoints of previous methods from SuPreM for comparison (thanks for their great efforts!).
Summary: We spent over 10,000 GPU hours evaluating 50+ downstream tasks. Among previous methods, SuPreM appears to be the strongest. You can try these models in Downstream.
The path of pre-trained models should be organized as:
├── YOUR/DIRECTORY/OF/PRETRAINED/MODELS
├── VoComni_nnunet.pt
├── VoCo_B_SSL_head.pt
├── VoCo_L_SSL_head.pt
├── VoCo_H_SSL_head.pt
├── VoComni_B.pt
├── VoComni_L.pt
├── VoComni_H.pt
├── supervised_dodnet_unet_920.pth
├── supervised_clip_driven_universal_swin_unetr_2100.pth
├── self_supervised_unimiss_nnunet_small_5022.pth
├── self_supervised_nv_swin_unetr_5050.pt
├── self_supervised_models_genesis_unet_620.pt
└── supervised_suprem_swinunetr_2100.pth
import torch
import argparse
from monai.networks.nets import SwinUNETR
def load(model, model_dict):
    # make sure you load our checkpoints
    if "state_dict" in model_dict.keys():
        state_dict = model_dict["state_dict"]
    else:
        state_dict = model_dict
    current_model_dict = model.state_dict()
    # print the keys whose names and shapes match the current model (these will be loaded)
    for k in current_model_dict.keys():
        if (k in state_dict.keys()) and (state_dict[k].size() == current_model_dict[k].size()):
            print(k)
    # keep pre-trained weights where name and shape match; otherwise fall back to the model's own (randomly initialized) weights
    new_state_dict = {
        k: state_dict[k] if (k in state_dict.keys()) and (state_dict[k].size() == current_model_dict[k].size()) else current_model_dict[k]
        for k in current_model_dict.keys()}
    model.load_state_dict(new_state_dict, strict=True)
    return model
parser = argparse.ArgumentParser(description="VoCo models")
parser.add_argument("--feature_size", default=48, type=int,
help="feature size: 48 Base (B), 96 Large (L), 192 Huge (H)")
parser.add_argument("--in_channels", default=1, type=int, help="number of input channels")
parser.add_argument("--out_channels", default=21, type=int, help="number of output channels")
parser.add_argument("--roi_x", default=96, type=int, help="roi size in x direction")
parser.add_argument("--roi_y", default=96, type=int, help="roi size in y direction")
parser.add_argument("--roi_z", default=96, type=int, help="roi size in z direction")
args = parser.parse_args()
model = SwinUNETR(img_size=(args.roi_x, args.roi_y, args.roi_z),
in_channels=args.in_channels,
out_channels=args.out_channels,
feature_size=args.feature_size,
use_v2=True)
# YOUR PATH OF PRETRAINED MODELS. MODIFY IT
pretrained_path = './pretrained/VoComni_B.pt'
model_dict = torch.load(pretrained_path, map_location=torch.device('cpu'))
model = load(model, model_dict)
NOTE: "roi" is flexible according to your own settings. Your need to adjust "in_channels" and "out_channels" for specific datasets. If "in_channels != 1" or "out_channels != 21", only the first layer or the last layer would not be loaded.
git clone https://github.com/Luffy03/Large-Scale-Medical
cd Large-Scale-Medical
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
Please refer to Acknowledgment to download our pre-processed datasets for downstream tasks.
Please refer to Downstream: 50+ downstream tasks implementations.
We are uploading our fine-tuning checkpoints to BaiduYun to ensure fair comparisons.
Please refer to Acknowledgment. Download our PreCT-160K for pre-training.
WARNING:
- It requires 22.6 TB of space to store the original datasets. For pre-training, an extra 30 TB is needed to cache the data; otherwise pre-training will be very slow. Please store the data on SSDs (see the caching sketch below).
- If you do not have enough space for PreCT-160K, you can try our VoComni dataset, which requires less than 10 TB.
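As one illustration of why the extra cache space matters, here is a minimal sketch of caching preprocessed volumes to disk with MONAI's `PersistentDataset` (the cache directory, file path, and transform pipeline are assumptions for illustration, not the repository's exact configuration):

```python
from monai.data import PersistentDataset
from monai.transforms import Compose, LoadImaged, Orientationd, ScaleIntensityRanged

# hypothetical preprocessing pipeline; results are cached to cache_dir on first access,
# so later epochs read the preprocessed volumes from SSD instead of recomputing them
transforms = Compose([
    LoadImaged(keys=["image"]),
    Orientationd(keys=["image"], axcodes="RAS"),
    ScaleIntensityRanged(keys=["image"], a_min=-175, a_max=250, b_min=0.0, b_max=1.0, clip=True),
])
data = [{"image": "/path/to/ct_volume_0001.nii.gz"}]  # placeholder entry
dataset = PersistentDataset(data=data, transform=transforms, cache_dir="/ssd/cache/prect160k")
```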
Please refer to:
- Fully-supervised pre-training.
- Self-supervised pre-training.
- Semi-supervised pre-training.
- Omni-supervised pre-training.
To facilitate future research, we use VoCo to generate pseudo labels for 20K volumes, covering 20 organ and tumor classes. Please refer to VoComni.
Please refer to VoCovid for semi-supervised COVID-19 segmentation. The dataset can be downloaded from Hugging Face.
NOTE THAT we are not the authors of these datasets. Although all of them are publicly available for academic research, you need to cite the original works as described in our paper. For certain datasets (e.g., WORD) that require approval from the authors, you need to download them from the original links.
If you find this repo useful for your research, please consider citing the paper as follows:
@article{wu2024large,
title={Large-Scale 3D Medical Image Pre-training with Geometric Context Priors},
author={Wu, Linshan and Zhuang, Jiaxin and Chen, Hao},
journal={arXiv preprint arXiv:2410.09890},
year={2024}
}
@InProceedings{voco-v1,
author = {Wu, Linshan and Zhuang, Jiaxin and Chen, Hao},
title = {VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis},
booktitle = {CVPR},
month = {June},
year = {2024},
pages = {22873-22882}
}