Bottleneck at "utils.py:interpolate_img" with larger datasets #74
Hi Fabian, thanks so much for the quick response. I took your advice and changed the interpolation order for data and seg to 0. Unfortunately, this step is still the major bottleneck. I recently heard of MONAI, a PyTorch library that appears to implement some of these augmentations on the GPU. That may be the way to go, since interpolation for augmentation with large datasets like these just isn't cheap. Best, Brett
Hi Brett, if you want, you can share a standalone dummy script of your data augmentation pipeline and I will have a look at it. Please make sure it is standalone (no funky dependencies) and can be run by itself.
Hi Fabian, I would definitely appreciate your input. We are actually adapting MIC-DKFZ's medicaldetectiontoolkit repo and using its data_loader.py on batches of 8 x 3 x 256 x 256 x 256. Each input 3 x 256 x 256 x 256 numpy array is float16 and about 97 MB. These are our system's CPU specs: Architecture: x86_64
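As a quick sanity check (an editor aside, not from the thread): the quoted ~97 MB per sample is consistent with that shape and dtype.

```python
import numpy as np

# One sample from the batch described above: 3 channels of 256^3 float16 voxels.
sample = np.zeros((3, 256, 256, 256), dtype=np.float16)
print(sample.nbytes / 2**20)      # 96.0 MiB, i.e. the ~97 MB quoted above
print(8 * sample.nbytes / 2**30)  # ~0.75 GiB for the full batch of 8 samples
```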
Hi Paul, thanks for your help as well. I'm using the same data_loader that you provide in MDTK for the LIDC dataset (numpy file input), except that our image data has three channels. The annotations are pixel-wise numpy binary label maps, uint8. As I mentioned in the original post, profiling revealed the slowdown to be at the scipy interpolation step of the batchgenerators spatial transform, not ConvertSegToBoundingBoxCoordinates. I profiled in SingleThreaded mode for simplicity; see that output below the dummy script. @FabianIsensee, maybe Paul can correct my standalone dummy script of the augmentation below... we actually implement it as a wrapper around an iterator over batches of training data, so the script doesn't entirely reproduce the same behavior. And sure enough, when I run this standalone script it's fast and does not reproduce the slowdown :( Maybe the problem lies with the image data I'm using? On visualization it looks unremarkable. Is there something other than the data type I could investigate?
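The dummy script and profile referred to here are not reproduced above. A minimal sketch of how such a profile could be captured, assuming dummy data in place of the real images (shapes reduced so it runs quickly; all values illustrative):

```python
import cProfile
import pstats

import numpy as np
from batchgenerators.transforms.spatial_transforms import SpatialTransform

# Dummy batch standing in for the real 3-channel image data (reduced size).
batch = {
    "data": np.random.random((2, 3, 64, 64, 64)).astype(np.float32),
    "seg": np.random.randint(0, 2, (2, 1, 64, 64, 64)).astype(np.uint8),
}

transform = SpatialTransform(patch_size=(64, 64, 64), do_elastic_deform=False,
                             do_rotation=True, do_scale=True, random_crop=False)

# Profile one transform call; scipy's interpolation should dominate the output.
cProfile.run("transform(**batch)", "spatial_transform.prof")
pstats.Stats("spatial_transform.prof").sort_stats("cumulative").print_stats(10)
```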
My apologies! The standalone script above is wrong. The standalone script below does replicate the issue on my machine with dummy data. I'd greatly appreciate it if you could let me know whether you also see the long augmentation runtime. Profiling shows it stems from the scipy map_coordinates function (at both order 0 and 3); see the profile below as well. @pfjaeger, is there a per-batch time benchmark you could share for MDTK's spatial augmentation on 3D or multichannel 3D data? It would be great to know whether this is an inherent limitation of batchgenerators/scipy before moving on to something like MONAI. Thank you both for your input!
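The standalone script referred to above is not reproduced here. A minimal sketch of such a reproduction, reconstructed from the batch shape and the SpatialTransform parameters quoted elsewhere in this thread (the dummy data and timing harness are assumptions):

```python
import time

import numpy as np
from batchgenerators.transforms.spatial_transforms import SpatialTransform

# Dummy data with the batch shape discussed above: 8 samples, 3 channels, 256^3 voxels.
batch = {
    "data": np.random.random((8, 3, 256, 256, 256)).astype(np.float32),
    "seg": np.random.randint(0, 2, (8, 1, 256, 256, 256)).astype(np.uint8),
}

# Parameters as given in the original post of this thread.
transform = SpatialTransform(
    patch_size=[256, 256, 256],
    patch_center_dist_from_border=(125.0, 125.0),
    do_elastic_deform=False, alpha=(0.0, 1500.0), sigma=(30.0, 50.0),
    do_rotation=True, angle_x=(0, 0.0), angle_y=(0, 0.0),
    angle_z=(0.0, 2 * np.pi),
    do_scale=True, scale=(0.8, 1.1),
    random_crop=False,
)

start = time.time()
transform(**batch)
print(f"one batch took {time.time() - start:.1f} s")
```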
Hi @bsmarine,
I finally found the time to look into this today. From my perspective, everything looks fine. I could replicate the long run time you reported (179s for me), but that is completely normal for this size of input. Unfortunately, speeding this up is beyond our control: we merely generate a coordinate grid that is scaled and rotated, and then let scipy do the interpolation. The interpolation is implemented in C code in their backend (built-in method scipy.ndimage._nd_image.geometric_transform), and I would presume that they know what they are doing coding-wise :-)
Here are a couple of things you can consider to speed up the calculations:
1. Reduce the order of interpolation. If you set it to 0, it does nearest neighbor, which is a lot faster; 1 is linear. Run times for your batch:
order 0: 28s
order 1: 49s
order 2: 179s
order 3: 253s
2. What is the patch size your model is actually trained with? If it is not 256x256x256, make sure to tell SpatialTransform the actual final patch size. If I replace 256x256x256 with 128x128x128 in the SpatialTransform, the run time is reduced to 96s (from 179s). Note that the output size is then of course 128, not 256.
3. SpatialTransform has the parameters p_rot_per_sample and p_scale_per_sample, which default to 1. This means it will apply these augmentations to all patches. I have confirmed experimentally (for segmentation) that this is not necessarily ideal: you want diversity on the one hand, but at the same time you don't want to mess with the data distribution too much. I would therefore recommend setting these to lower values; 0.3 works well for me. The result is that only 1 - (1 - 0.3) * (1 - 0.3) = 51% of the patches will be augmented, which would cut your CPU time in half. You can even go lower than that.
4. I presume you are doing this already, but multithreaded augmentation really goes a long way. Use as many CPUs for it as you can.
Best,
Fabian
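Points 1 to 3 plus multithreading combined might look like the sketch below (an editor illustration; the 128^3 patch size, worker count, and dummy loader are placeholders, not values from the thread):

```python
import numpy as np
from batchgenerators.transforms.spatial_transforms import SpatialTransform
from batchgenerators.dataloading.multi_threaded_augmenter import MultiThreadedAugmenter

transform = SpatialTransform(
    patch_size=(128, 128, 128),   # point 2: the size the model actually trains on
    do_elastic_deform=False,
    do_rotation=True, angle_z=(0.0, 2 * np.pi),
    do_scale=True, scale=(0.8, 1.1),
    order_data=1, order_seg=0,    # point 1: cheaper interpolation orders
    p_rot_per_sample=0.3,         # point 3: only 1 - 0.7 * 0.7 = 51% of patches
    p_scale_per_sample=0.3,       #          get rotated and/or scaled
    random_crop=False,
)

def dummy_loader():
    # Placeholder for a real data loader yielding batch dicts; a DataLoader
    # subclass is preferable to a bare generator for multiprocessing in practice.
    while True:
        yield {
            "data": np.random.random((4, 3, 160, 160, 160)).astype(np.float32),
            "seg": np.random.randint(0, 2, (4, 1, 160, 160, 160)).astype(np.uint8),
        }

# Point 4: spread the augmentation over as many CPU workers as possible.
augmenter = MultiThreadedAugmenter(dummy_loader(), transform, num_processes=8)
batch = next(augmenter)  # augmented batch with 128^3 patches
```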
Hi Fabian,
Thanks for testing this on your end! Your other suggestions on ways to decrease batch generation time are also very much appreciated.
The desired output is indeed 256. The entire patient scan is three phases, 768x512x512, but for now we are inputting just the relevant 256^3 block, as the full size would take way too long.
We are also testing smaller batches (8 -> 4) and only using MirrorTransform at the moment. Will see how this goes.
Btw, congrats on the MICCAI COVID challenge, I see your name near the top of the leaderboard. Best of luck!
Kind regards,
Brett
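As an aside on why mirroring is so much cheaper: MirrorTransform only flips axes with numpy indexing, so no coordinate grid is built and no scipy interpolation runs. A minimal sketch of the mirror-only setup mentioned above (batch shape assumed):

```python
import numpy as np
from batchgenerators.transforms.spatial_transforms import MirrorTransform

# Randomly flips each sample along the given spatial axes; no interpolation at all.
transform = MirrorTransform(axes=(0, 1, 2))

batch = {
    "data": np.random.random((4, 3, 256, 256, 256)).astype(np.float32),
    "seg": np.random.randint(0, 2, (4, 1, 256, 256, 256)).astype(np.uint8),
}
out = transform(**batch)  # orders of magnitude faster than spatial resampling
```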
Hi,
Thanks for sharing and maintaining batchgenerators; it has been very useful for our work!
We are trying to scale up training to larger datasets but are seeing a considerable bottleneck in interpolate_img in the spatial_transform: it is taking nearly 11 seconds per 3D image, per channel, per batch sample.
Input batch data shape = 8 x 3 x 512 x 512 x 768 (float32)
```python
SpatialTransform(patch_size=[256, 256, 256],
                 patch_center_dist_from_border=(125.0, 125.0),
                 do_elastic_deform=False, alpha=(0.0, 1500.0), sigma=(30.0, 50.0),
                 do_rotation=True, angle_x=(0, 0.0), angle_y=(0, 0.0),
                 angle_z=(0.0, 6.283185307179586),
                 do_scale=True, scale=(0.8, 1.1), random_crop=False)
```
When profiling, the time appears to stem from the scipy spline_filter1d function.
Have you or anyone else encountered this with larger datasets? Any suggestions on how to speed it up or work around it?
Thanks,
Brett
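To illustrate where the spline_filter1d time in the profile comes from (an editor sketch, not from the thread): for interpolation order >= 2, scipy.ndimage.map_coordinates first runs a spline prefilter over the whole volume before resampling, which is why higher orders are so much slower. Comparing orders on a single 256^3 channel:

```python
import time

import numpy as np
from scipy.ndimage import map_coordinates

volume = np.random.random((256, 256, 256)).astype(np.float32)

# Identity coordinate grid, standing in for the rotated/scaled grid that
# SpatialTransform builds before handing the work to scipy.
coords = np.mgrid[0:256, 0:256, 0:256].astype(np.float32)

for order in (0, 1, 3):
    start = time.time()
    map_coordinates(volume, coords, order=order)  # order >= 2 triggers prefiltering
    print(f"order {order}: {time.time() - start:.1f} s")
```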