version 2.0 - Jul 31 2018 - by Qihang Yu, Yuyin Zhou and Lingxi Xie
(1) slightly changed the network architecture (the score layers are removed and changed to a saliency layer), so that network training is more robust (especially on some tiny targets such as pancreatic cysts);
(2) carefully optimized the code so that the testing stage becomes much more efficient, especially when you use multiple processes to run different folds or datasets;
(3) re-implemented two functions, "post-processing" and "DSC_computation", in C, which makes them much faster.
Note that our pre-trained models are also updated.
2. If you are more familiar with PyTorch, take a look at this repository!
Yuyin Zhou implemented the original coarse-to-fine framework, Qihang Yu improved it to allow end-to-end training, and Lingxi Xie later wrapped up these codes for release.
Qihang Yu, Lingxi Xie, Yan Wang, Yuyin Zhou, Elliot K. Fishman, Alan L. Yuille, "Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation", in IEEE Conference on CVPR, Salt Lake City, Utah, USA, 2018.
https://arxiv.org/abs/1709.04518
Yuyin Zhou, Lingxi Xie, Wei Shen, Yan Wang, Elliot K. Fishman, Alan L. Yuille, "A Fixed-Point Model for Pancreas Segmentation in Abdominal CT Scans", in International Conference on MICCAI, Quebec City, Quebec, Canada, 2017.
https://arxiv.org/abs/1612.08230
All the materials released in this library can ONLY be used for RESEARCH purposes.
The authors and their institution (JHU/JHMI) reserve the copyright and all legal rights to these codes.
Before you start, please note that there is a LAZY MODE, which allows you to run the entire framework with ONE click. Check the contents before Section 4.3 for details.
OrganSegRSTN is a code package for our paper:
Qihang Yu, Lingxi Xie, Yan Wang, Yuyin Zhou, Elliot K. Fishman, Alan L. Yuille, "Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation", in IEEE Conference on CVPR, Salt Lake City, Utah, USA, 2018.
OrganSegRSTN is a segmentation framework designed for 3D volumes. It was originally designed for segmenting abdominal organs in CT scans, but we believe that it can also be used for other purposes, such as brain tissue segmentation in fMRI-scanned images.
OrganSegRSTN is based on state-of-the-art deep learning techniques. This code package is to be used with CAFFE, a deep learning library. We make use of the Python interface of CAFFE, named pyCAFFE.
It is highly recommended to use one or more modern GPUs for computation. Using CPUs will take at least 50x more time in computation.
We provide an easy implementation in which the training stage has only 1 fine-scaled iteration. If you hope to add more, please modify the prototxt files accordingly. As we said in the paper, our strategy of using 1 iteration in training and multiple iterations in testing works very well.
| Folder/File | Description |
| --- | --- |
| README.md | the README file |
| DATA2NPY/ | codes to transfer the NIH dataset into NPY format |
| dicom2npy.py | transferring image data (DICOM) into NPY format |
| nii2npy.py | transferring label data (NII) into NPY format |
| DiceLossLayer/ | CPU implementation of the Dice loss layer |
| dice_loss_layer.hpp | the header file |
| dice_loss_layer.cpp | the CPU implementation |
| OrganSegRSTN/ | primary codes of OrganSegRSTN |
| coarse2fine_testing.py | the coarse-to-fine testing process |
| coarse_fusion.py | the coarse-scaled fusion process |
| coarse_testing.py | the coarse-scaled testing process |
| Crop.py | the crop layer (cropping a region from the image) |
| Data.py | the data layer |
| indiv_training.py | training the coarse and fine stages individually |
| init.py | the initialization functions |
| joint_training.py | training the coarse and fine stages jointly |
| Uncrop.py | the uncrop layer (putting the regional output back) |
| oracle_fusion.py | the fusion process with oracle information |
| oracle_testing.py | the testing process with oracle information |
| run.sh | the main program to be called in bash shell |
| surgery.py | the surgery function |
| utils.py | the common functions |
| OrganSegRSTN/prototxts/ | prototxt files of OrganSegRSTN |
| deploy_C3.prototxt | the prototxt file for coarse-scaled testing |
| deploy_F3.prototxt | the prototxt file for fine-scaled testing |
| deploy_O3.prototxt | the prototxt file for oracle testing |
| training_I3x1.prototxt | the prototxt file for individual training (1xLR) |
| training_I3x10.prototxt | the prototxt file for individual training (10xLR) |
| training_J3x1.prototxt | the prototxt file for joint training (1xLR) |
| training_J3x10.prototxt | the prototxt file for joint training (10xLR) |
| training_S3x1.prototxt | the prototxt file for separate training (1xLR) |
| training_S3x10.prototxt | the prototxt file for separate training (10xLR) |
| logs/ | training log files on the NIH dataset |
The multiplier (1 or 10) applies to all the trainable layers in the fine stage of the framework.
Without GPUs, you will need 50x more time in both the training and testing stages.
3.2.1 Download a CAFFE library from http://caffe.berkeleyvision.org/ .
Suppose your CAFFE root directory is $CAFFE_PATH.
dice_loss_layer.hpp -> $CAFFE_PATH/include/caffe/layers/
dice_loss_layer.cpp -> $CAFFE_PATH/src/caffe/layers/
Please follow these steps to reproduce our results on the NIH pancreas segmentation dataset.
NOTE: Here we only provide basic steps to run our codes on the NIH dataset. For more detailed analysis and empirical guidelines for parameter setting (this is very important especially when you are using our codes on other datasets), please refer to our technical report (check our webpage for updates).
4.1.1 Download NIH data from https://wiki.cancerimagingarchive.net/display/Public/Pancreas-CT .
You should be able to download image and label data individually.
Suppose your data directory is $RAW_PATH:
The image data are organized as $RAW_PATH/DOI/PANCREAS_00XX/A_LONG_CODE/A_LONG_CODE/ .
The label data are organized as $RAW_PATH/TCIA_pancreas_labels-TIMESTAMP/label00XX.nii.gz .
Put dicom2npy.py under $RAW_PATH, and run: python dicom2npy.py .
The transferred data should be put under $RAW_PATH/images/
Put nii2npy.py under $RAW_PATH, and run: python nii2npy.py .
The transferred data should be put under $RAW_PATH/labels/
Put $CAFFE_PATH under $DATA_PATH/libs/
Put images/ under $DATA_PATH/
Put labels/ under $DATA_PATH/
Download the scratch model below and put it under $DATA_PATH/models/pretrained/
The scratch model - see the explanations in 4.2.3.
NOTE: If you use other path(s), please modify the variable(s) in run.sh accordingly.
Several folders will be created under $DATA_PATH:
$DATA_PATH/images_X|Y|Z/: the sliced image data (data are sliced for faster I/O).
$DATA_PATH/labels_X|Y|Z/: the sliced label data (data are sliced for faster I/O).
$DATA_PATH/lists/: used for storing training, testing and slice lists.
$DATA_PATH/logs/: used for storing log files during the training process.
$DATA_PATH/models/: used for storing models (snapshots) during the training process.
$DATA_PATH/prototxts/: used for storing prototxts (called by training and testing nets).
$DATA_PATH/results/: used for storing testing results (volumes and text results).
Depending on the I/O speed of your hard drive, the time cost may vary.
For a typical HDD, around 20 seconds are required for a 512x512x300 volume.
This process needs to be executed only once.
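For illustration, here is a minimal sketch of how a 3D volume might be sliced along one axis and saved as individual 2D files, in the spirit of the images_X|Y|Z folders described above; the function name, paths and file-naming convention are only assumptions, and the actual initialization code may differ.

```python
import os
import numpy as np

def slice_volume(volume_file, output_dir, axis=2):
    """Split a 3D volume (.npy) into 2D slices along the given axis (0=X, 1=Y, 2=Z)."""
    volume = np.load(volume_file)                      # e.g. shape (512, 512, 300)
    os.makedirs(output_dir, exist_ok=True)
    case_name = os.path.splitext(os.path.basename(volume_file))[0]
    for i in range(volume.shape[axis]):
        slice_2d = np.take(volume, i, axis=axis)       # extract one 2D slice
        np.save(os.path.join(output_dir, '%s_%04d.npy' % (case_name, i)), slice_2d)

# Hypothetical usage:
# slice_volume('images/0001.npy', 'images_Z/0001/', axis=2)
```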
NOTE: If you are using another dataset which contains multiple targets,
you can modify the variables "ORGAN_NUMBER" and "ORGAN_ID" in run.sh,
as well as the "is_organ" function in utils.py to define your mapping function flexibly.
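For reference, here is a minimal sketch of what such a mapping function could look like, assuming the label volumes store integer organ IDs and "ORGAN_ID" selects the target organ; the actual "is_organ" function in utils.py may differ.

```python
import numpy as np

def is_organ(label, organ_ID):
    """Binary mask: True where a voxel carries the target organ ID (single-organ case)."""
    return label == organ_ID

def is_organ_multi(label, organ_ID, mapping):
    """A more flexible mapping for multi-target datasets, e.g. mapping = {1: [1], 2: [2, 3]}."""
    return np.isin(label, mapping[organ_ID])
```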
You can run all the following modules with one execution!
- a) Enable everything (except initialization) in the beginning part.
- b) Set all the "PLANE" variables to "A" (4 in total) in the following part.
- c) Run this script!
You need to run X|Y|Z planes individually, so you can use 3 GPUs in parallel.
You can also set INDIV_TRAINING_PLANE=A, so that the three planes are trained in order on one GPU.
The following folders/files will be created:
Under $DATA_PATH/logs/, a log file named by training information.
Under $DATA_PATH/models/snapshots/, a folder named by training information.
Snapshots and solver-states will be stored in this folder.
The log file will also be copied into this folder after the entire training process.
On the axial view (the training image size is 512x512; smaller input images make training faster),
each 20 iterations cost ~10s on a Titan-X Pascal GPU, or ~8s on a Titan-Xp GPU.
As described in the code, we need ~40K iterations, which take less than 5 GPU-hours.
After the training process, the log file will be copied to the snapshot directory.
It is very important to provide a reasonable initialization for our model. In the previous step of data preparation, we provide a scratch model for the NIH dataset, in which both the coarse and fine stages are initialized using the weights of an FCN-8s model (please refer to the FCN project). This model was pre-trained on PASCAL VOC. We initialized all upsampling weights to be 0, as the number of channels does not align with that in PASCAL VOC.
The most important thing is to initialize three layers related to saliency transformation, which are named "score", "score-R" and "saliency" in our prototxts. In our solution, we use a Xavier filler to fill in the weights of the "score" and "score-R" layers, and an all-0 cube to fill in the weights of the "saliency" layer. For the bias term, we use an all-0 vector for "score" and "score-R", and an all-1 vector for "saliency". We also set a restart mechanism after the first 10K iterations in case of non-convergence. In more than 95% of cases, this mechanism leads to a successful convergence.
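For illustration only, here is a minimal pyCAFFE sketch of the saliency-layer initialization described above (all-0 weights, all-1 bias). In the released code this is handled by the prototxt fillers and surgery.py; the file names below are hypothetical.

```python
import caffe

# Hypothetical prototxt and model paths; replace with your own.
net = caffe.Net('prototxts/training_J3x10.prototxt',
                'models/pretrained/scratch.caffemodel', caffe.TRAIN)

if 'saliency' in net.params:
    net.params['saliency'][0].data[...] = 0.0   # all-0 cube for the weights
    net.params['saliency'][1].data[...] = 1.0   # all-1 vector for the bias

net.save('models/pretrained/initialized.caffemodel')
```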
The loss function value at the beginning of training is almost 1.0.
If the model converges, you should observe the loss function value decreasing gradually.
In order for the model to work well, you need to confirm that, over the last several epochs,
the average loss function value is sufficiently low (e.g., 0.15).
Here we attach the training logs for your reference; see the logs/
folder (detailed in Section 5).
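If you want to check convergence programmatically, a small helper like the following (not part of the released code) can average the loss values reported in a training log; it assumes the standard CAFFE solver log format "Iteration N ..., loss = X".

```python
import re

def average_recent_loss(log_file, last_n=50):
    """Average of the last `last_n` loss values reported in a CAFFE training log."""
    pattern = re.compile(r'Iteration \d+.*loss = ([0-9.eE+-]+)')
    losses = []
    with open(log_file) as f:
        for line in f:
            match = pattern.search(line)
            if match:
                losses.append(float(match.group(1)))
    recent = losses[-last_n:]
    return sum(recent) / len(recent) if recent else None

# e.g. average_recent_loss('logs/some_training_log.txt') < 0.15 suggests convergence.
```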
If you are experimenting on other CT datasets, we strongly recommend you to use a pre-trained model, such as the pre-trained models attached in the last part of this file. We also provide a mixed model (to be provided soon), which was tuned using all X|Y|Z images of the 82 training samples for pancreas segmentation on NIH. Of course, do not use it to evaluate any NIH data, as all cases have been used for training.
You need to run X|Y|Z planes individually, so you can use 3 GPUs in parallel.
You can also set JOINT_TRAINING_PLANE=A, so that the three planes are trained in order on one GPU.
The following folders/files will be created:
Under $DATA_PATH/logs/, a log file named by training information.
Under $DATA_PATH/models/snapshots/, a folder named by training information.
Snapshots and solver-states will be stored in this folder.
The log file will also be copied into this folder after the entire training process.
On the axial view (the training image size is 512x512; smaller input images make training faster),
each 20 iterations cost ~10s on a Titan-X Pascal GPU, or ~8s on a Titan-Xp GPU.
As described in the paper, we need ~40K iterations, which take less than 5 GPU-hours.
After the training process, the log file will be copied to the snapshot directory.
You need to run X|Y|Z planes individually, so you can use 3 GPUs in parallel.
You can also set COARSE_TESTING_PLANE=A, so that the three planes are tested in order on one GPU.
The following folder will be created:
Under $DATA_PATH/results/, a folder named by training information.
Testing each volume costs ~30 seconds on a Titan-X Pascal GPU, or ~25s on a Titan-Xp GPU.
The following folder will be created:
Under $DATA_PATH/results/, a folder named by fusion information.
The main cost in fusion includes I/O and post-processing (removing non-maximum components).
As noted in the version 2.0 changes above, post-processing has been re-implemented in C for acceleration.
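For reference, the post-processing step (keeping only the largest connected component of the binary prediction) can be sketched in Python as follows; the released v2.0 code implements it in C, so this is for illustration only.

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(mask):
    """Keep only the largest connected component of a binary volume."""
    labeled, num_components = ndimage.label(mask)
    if num_components == 0:
        return mask
    sizes = ndimage.sum(mask, labeled, range(1, num_components + 1))
    largest = np.argmax(sizes) + 1          # component labels start from 1
    return labeled == largest
```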
NOTE: Without this step, you can also run the coarse-to-fine testing process. This stage is still recommended, so that you can check the quality of the fine-scaled models.
You need to run X|Y|Z planes individually, so you can use 3 GPUs in parallel.
You can also set ORACLE_TESTING_PLANE=A, so that the three planes are tested in order on one GPU.
The following folder will be created:
Under $DATA_PATH/results/, a folder named by training information.
Testing each volume costs ~10 seconds on a Titan-X Pascal GPU, or ~8s on a Titan-Xp GPU.
NOTE: Without this step, you can also run the coarse-to-fine testing process. This stage is still recommended, so that you can check the quality of the fine-scaled models.
The following folder will be created:
Under $DATA_PATH/results/, a folder named by fusion information.
The main cost in fusion includes I/O and post-processing (removing non-maximum components).
As noted in the version 2.0 changes above, post-processing has been re-implemented in C for acceleration.
Fusion is performed on CPU and all X|Y|Z planes are combined.
Currently the X|Y|Z testing processes are executed on one GPU, but this is not time-consuming.
The following folder will be created:
Under $DATA_PATH/results/, a folder named by coarse-to-fine information (very long).
This function calls both fine-scaled testing and fusion codes, so both GPU and CPU are used.
As noted in the version 2.0 changes above, post-processing has been re-implemented in C for acceleration.
NOTE: Currently we set the maximal number of iteration rounds to 10 in order to observe convergence. Most often, it reaches an inter-DSC of >99% after 3-5 iterations. If you hope to save time, you can slightly modify the code in coarse2fine_testing.py. Testing each volume costs ~40 seconds on a Titan-X Pascal GPU, or ~32s on a Titan-Xp GPU. If you set the threshold to be 99%, this stage will be done within 2 minutes (on average).
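For reference, the iterative coarse-to-fine testing logic described above can be sketched as follows; fine_scaled_testing() is a placeholder for the actual routine in coarse2fine_testing.py, and the round limit and threshold mirror the numbers quoted above.

```python
import numpy as np

def DSC(A, B):
    """Dice-Sorensen coefficient between two binary volumes."""
    A, B = A > 0, B > 0
    denominator = A.sum() + B.sum()
    return 2.0 * np.logical_and(A, B).sum() / denominator if denominator > 0 else 1.0

def coarse2fine(coarse_mask, fine_scaled_testing, max_rounds=10, threshold=0.99):
    """Iterate the fine-scaled model until the inter-DSC exceeds the threshold."""
    current = coarse_mask
    for _ in range(max_rounds):
        updated = fine_scaled_testing(current)     # one fine-scaled testing round
        if DSC(current, updated) >= threshold:     # converged between consecutive rounds
            return updated
        current = updated
    return current
```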
Congratulations! You have finished the entire process. Check your results now!
NOTE: all these models were trained following our default settings.
The 82 cases in the NIH dataset are split into 4 folds:
- Fold #0: testing on Cases 01, 02, ..., 20;
- Fold #1: testing on Cases 21, 22, ..., 40;
- Fold #2: testing on Cases 41, 42, ..., 61;
- Fold #3: testing on Cases 62, 63, ..., 82.
We provide the individually-trained models on each plane of each fold, in total 12 files.
Each of these models is around 1.03GB, approximately the size of two (coarse+fine) FCN models.
- Fold #0: [X] [Y] [Z] (Accuracy: coarse-to-fine 84.44%)
- Fold #1: [X] [Y] [Z] (Accuracy: coarse-to-fine 84.35%)
- Fold #2: [X] [Y] [Z] (Accuracy: coarse-to-fine 84.12%)
- Fold #3: [X] [Y] [Z] (Accuracy: coarse-to-fine 85.43%)
- Average accuracy over 82 cases: 84.59%.
We ran our codes several times, and the average accuracy varies between 84.4% and 84.6%.
If you encounter any problems in downloading these files, please contact Lingxi Xie ([email protected]).
We also attach the log files for your reference here. Please refer to the logs/
folder.
The current version is v2.0.
If you encounter any problems in using these codes, please open an issue in this repository. You may also contact Qihang Yu ([email protected]) or Lingxi Xie ([email protected]).
Thanks for your interest! Have fun!