
Large-Scale 3D Medical Image Pre-training


This work presents VoCo, a new method for Large-Scale 3D Medical Image Pre-training. We release a new benchmark, including 160K volumes (42M slices) for pre-training, 31M~1.2B params of pre-trained models, various pre-training recipes, and 50+ downstream tasks implementation.

Linshan Wu, Jiaxin Zhuang, and Hao Chen. "Large-Scale 3D Medical Image Pre-training with Geometric Context Priors". CVPR 2024 Extension.


Quick Start

Pre-trained Models

We provide various models for downstream tasks. For nnUNet, please refer to nnunet trainer.

Model            Params  Checkpoint
VoComni_nnunet   31M     Download
VoCo_B_SSL_head  53M     Download
VoCo_L_SSL_head  206M    Download
VoCo_H_SSL_head  818M    Download
VoComni_B        72M     Download
VoComni_L        290M    Download
VoComni_H        1.2B    Download
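
The B, L, and H variants correspond to SwinUNETR feature sizes of 48, 96, and 192, respectively (see the loading example below). A minimal sketch of this mapping, assuming the checkpoint names above:

# Sketch: checkpoint name -> the SwinUNETR feature_size it was trained with
# (48 / 96 / 192 for Base / Large / Huge; see the loading example below).
FEATURE_SIZE = {
    "VoCo_B_SSL_head": 48, "VoComni_B": 48,
    "VoCo_L_SSL_head": 96, "VoComni_L": 96,
    "VoCo_H_SSL_head": 192, "VoComni_H": 192,
}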

We downloaded the checkpoints of previous methods from SuPreM for comparison (thanks for their great efforts!).

Summary: We spent over 10,000 GPU hours evaluating 50+ downstream tasks. Among previous methods, SuPreM performs best. You can try these models in Downstream.

The pre-trained models should be organized as follows:

├── YOUR/DIRECTORY/OF/PRETRAINED/MODELS
    ├── VoComni_nnunet.pt
    ├── VoCo_B_SSL_head.pt
    ├── VoCo_L_SSL_head.pt
    ├── VoCo_H_SSL_head.pt
    ├── VoComni_B.pt
    ├── VoComni_L.pt
    ├── VoComni_H.pt
    ├── supervised_dodnet_unet_920.pth
    ├── supervised_clip_driven_universal_swin_unetr_2100.pth
    ├── self_supervised_unimiss_nnunet_small_5022.pth
    ├── self_supervised_nv_swin_unetr_5050.pt
    ├── self_supervised_models_genesis_unet_620.pt
    └── supervised_suprem_swinunetr_2100.pth
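
Before fine-tuning, you can quickly verify that the checkpoints are in place. A minimal sketch, assuming the directory layout above (the path and file list are placeholders; adjust to your setup):

import os

pretrained_dir = './pretrained'  # hypothetical path; use your own directory
for name in ['VoComni_nnunet.pt', 'VoCo_B_SSL_head.pt', 'VoComni_B.pt']:
    status = 'found' if os.path.isfile(os.path.join(pretrained_dir, name)) else 'MISSING'
    print(f'{name}: {status}')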

Load Pre-trained Models

import torch
import argparse
from monai.networks.nets import SwinUNETR
def load(model, model_dict):
    # Checkpoints may wrap the weights in a "state_dict" key.
    if "state_dict" in model_dict.keys():
        state_dict = model_dict["state_dict"]
    else:
        state_dict = model_dict
    current_model_dict = model.state_dict()
    # Print the keys that match in both name and shape and will be loaded.
    for k in current_model_dict.keys():
        if (k in state_dict.keys()) and (state_dict[k].size() == current_model_dict[k].size()):
            print(k)
    # Take matching weights from the checkpoint; keep the model's own
    # randomly initialized weights for any mismatched layers (e.g., the
    # first/last layers when in_channels or out_channels differ).
    new_state_dict = {
        k: state_dict[k] if (k in state_dict.keys()) and (state_dict[k].size() == current_model_dict[k].size()) else current_model_dict[k]
        for k in current_model_dict.keys()}
    model.load_state_dict(new_state_dict, strict=True)
    return model
parser = argparse.ArgumentParser(description="VoCo models")
parser.add_argument("--feature_size", default=48, type=int,
                    help="feature size: 48 Base (B), 96 Large (L), 192 Huge (H)")
parser.add_argument("--in_channels", default=1, type=int, help="number of input channels")
parser.add_argument("--out_channels", default=21, type=int, help="number of output channels")
parser.add_argument("--roi_x", default=96, type=int, help="roi size in x direction")
parser.add_argument("--roi_y", default=96, type=int, help="roi size in y direction")
parser.add_argument("--roi_z", default=96, type=int, help="roi size in z direction")
args = parser.parse_args()
model = SwinUNETR(img_size=(args.roi_x, args.roi_y, args.roi_z),
                  in_channels=args.in_channels,
                  out_channels=args.out_channels,
                  feature_size=args.feature_size,
                  use_v2=True)
# YOUR PATH OF PRETRAINED MODELS. MODIFY IT
pretrained_path = './pretrained/VoComni_B.pt'
model_dict = torch.load(pretrained_path, map_location=torch.device('cpu'))
model = load(model, model_dict)

NOTE: The "roi" size is flexible and can be set according to your own configuration. You need to adjust "in_channels" and "out_channels" for your specific dataset. If "in_channels != 1" or "out_channels != 21", only the first or last layer will not be loaded; as shown in the sketch below, it simply falls back to random initialization.
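
For example, a minimal sketch of adapting the model to a hypothetical dataset with 14 output classes; only the final layer keeps its random initialization, since its shape no longer matches the checkpoint:

# Hypothetical 14-class dataset; all other settings follow the example above.
model = SwinUNETR(img_size=(96, 96, 96), in_channels=1, out_channels=14,
                  feature_size=48, use_v2=True)
model_dict = torch.load('./pretrained/VoComni_B.pt', map_location='cpu')
model = load(model, model_dict)  # mismatched layers keep their random init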

Fine-tuning

Installation

git clone https://github.com/Luffy03/Large-Scale-Medical
cd Large-Scale-Medical
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
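
A quick sanity check that the CUDA build was installed correctly (a sketch; the expected version string follows the pip command above):

import torch
print(torch.__version__)          # expect 2.1.1+cu118
print(torch.cuda.is_available())  # should be True with a CUDA 11.8-capable driver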

Download Downstream Datasets

Please refer to the Acknowledgement section, then download our pre-processed datasets for the downstream tasks.

Implementations

Please refer to Downstream for the implementations of 50+ downstream tasks.
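
As a minimal sketch of running a fine-tuned segmentation model on a volume with MONAI's sliding-window inferer (the roi size matches the loading example above; the volume tensor is assumed to come from your own preprocessing):

import torch
from monai.inferers import sliding_window_inference

model.eval()
with torch.no_grad():
    # volume: a preprocessed CT tensor of shape (1, 1, D, H, W)
    logits = sliding_window_inference(volume, roi_size=(96, 96, 96),
                                      sw_batch_size=2, predictor=model,
                                      overlap=0.5)
    pred = torch.argmax(logits, dim=1)  # per-voxel class predictions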

We are uploading our fine-tuning checkpoints to BaiduYun to ensure fair comparisons.

Pre-training

Download Pre-training Dataset

Please refer to the Acknowledgement section, then download our PreCT-160K dataset for pre-training.

WARNING:

  • Storing the original datasets requires 22.6 TB. Pre-training requires an extra 30 TB to cache the data; otherwise, pre-training will be very slow. Please store the data on SSDs.
  • If you do not have enough space for PreCT-160K, you can try our VoComni dataset instead, which requires less than 10 TB.

Various Pre-training recipes

Please refer to the recipes below:

VoComni

To facilitate future research, we use VoCo to generate pseudo labels on 20K volumes, covering 20 organ and tumor classes (hence the default out_channels=21 in the loading example: 20 foreground classes plus background). Please refer to VoComni.

VoCovid

Please refer to VoCovid for semi-supervised COVID segmentation. The dataset can be downloaded from Hugging Face.

Acknowledgement

NOTE THAT we are not the authors of these datasets. Although all of these datasets are publicly available for academic research, you still need to cite the original works as listed in our paper. Certain datasets (e.g., WORD) require approval from the original authors, so you need to download them from the original links.

Citation

If you find this repo useful for your research, please consider citing the paper as follows:

@article{wu2024large,
  title={Large-Scale 3D Medical Image Pre-training with Geometric Context Priors},
  author={Wu, Linshan and Zhuang, Jiaxin and Chen, Hao},
  journal={arXiv preprint arXiv:2410.09890},
  year={2024}
}
@InProceedings{voco-v1,
    author    = {Wu, Linshan and Zhuang, Jiaxin and Chen, Hao},
    title     = {VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis},
    booktitle = {CVPR},
    month     = {June},
    year      = {2024},
    pages     = {22873-22882}
}
