Sigma clipping is painfully slow #624
About 4.5 minutes, but it feels longer…
I also noticed that it's very slow. Thanks for bringing this up (I always forget). Could you provide a random dataset for profiling? It could be interesting depending on the data types and the strides of the dataset (the actual data is probably not important unless it contains lots of unusual values). Also, because we're re-using astropy's sigma clipping, it would be handy to know which astropy version you were using.

One additional thing that could be a problem is RAM. Depending on the available RAM and the size of the arrays, there may be a lot of swapping going on. Sorry for asking for so much information, but without the specifics it's probably hard to profile it (correctly).
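For instance, a minimal sketch of the information that would help here (the file name is just a placeholder for whatever array you feed to `combine`):

```python
import numpy as np
import astropy

data = np.load('dark_stack.npy')  # placeholder: your actual dataset
print('astropy version:', astropy.__version__)
print('dtype:  ', data.dtype)
print('shape:  ', data.shape)
print('strides:', data.strides)
```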
This really brings up the issue that we need to add more benchmarks to `ccdproc`.
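For example, a benchmark along these lines could be a starting point (a sketch assuming the asv framework used elsewhere in the astropy ecosystem; the class and method names are illustrative, ccdproc has no such file here):

```python
import numpy as np
from astropy.stats import sigma_clip

class TimeSigmaClip:
    # Illustrative asv-style benchmark: time sigma clipping of a small stack.
    def setup(self):
        self.stack = np.random.normal(size=(25, 512, 512))

    def time_sigma_clip_stack(self):
        # Clip along the stack axis, as combine() effectively does.
        sigma_clip(self.stack, sigma=5, axis=0)
```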
Sure; I can also provide the data set I was working with if you want. It is just a bunch of dark frames. It is conceivable the bottleneck is in the combining, not the clipping, I suppose.
Probably related, @larrybradley is working on speeding up the astropy one: astropy/astropy#7478
Very related, thanks for mentioning that, @SaOgaz. And, of course, thanks to @larrybradley for working on it!
I haven't had time to do much profiling, but here is an example of a set of calls that takes several minutes to run (for the combine part). Adding in some outlying pixels seems to drive up the run time (about 30 seconds faster without them). There are a number of things here that could lead to a speedup, I think. Even without sigma clipping, it takes close to a minute to combine the images. Away from keyboard for a week, but will take another look when I'm back...

```python
import numpy as np
from astropy.io import fits
from astropy.stats import median_absolute_deviation
from ccdproc import combine, ImageFileCollection

size = [4096, 4109]
n_images = 25
base_name = 'test-combine-{num:03d}.fits'

# Generate the test images
for num in range(n_images):
    data = np.random.normal(size=size)
    # Now add some outlying pixels so there is something to clip
    n_bad = 50000
    bad_x = np.random.randint(0, high=size[0] - 1, size=n_bad)
    bad_y = np.random.randint(0, high=size[1] - 1, size=n_bad)
    data[bad_x, bad_y] = np.random.choice([-1, 1], size=n_bad) * (10 + np.random.rand(n_bad))
    hdu = fits.PrimaryHDU(data=data)
    hdu.header['for_prof'] = 'yes'
    hdu.header['bunit'] = 'adu'
    hdu.writeto(base_name.format(num=num), overwrite=True)

# Collect the generated files and combine them with sigma clipping
ic = ImageFileCollection('.', '*')
fils = ic.files_filtered(for_prof='yes')
combine(fils, output_file='foo12.fit', sigma_clip=True,
        sigma_clip_low_thresh=5, sigma_clip_high_thresh=5,
        sigma_clip_func=np.ma.median,
        sigma_clip_dev_func=median_absolute_deviation)
```

Edit 2018-06-23: moved misplaced parentheses in the call above.
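One way to check whether the time goes into the clipping or the combining is to profile the call directly; a minimal sketch using the standard library (the output file name is arbitrary, and the globals come from the snippet above):

```python
import cProfile
import pstats

# Profile the combine call from the snippet above and print the hot spots.
cProfile.run(
    "combine(fils, sigma_clip=True,"
    " sigma_clip_low_thresh=5, sigma_clip_high_thresh=5,"
    " sigma_clip_func=np.ma.median,"
    " sigma_clip_dev_func=median_absolute_deviation)",
    'combine.prof',
)
pstats.Stats('combine.prof').sort_stats('cumulative').print_stats(15)
```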
@mwcraig For your sigma-clipping …
One other note, i.e. instead of this:

…

you could do this:

…

where … If interested, I could put in a PR.
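The inline code in this comment was lost in extraction. As a hedged guess at the kind of substitution meant (based on the functions used in the snippet above, not on recovered text), one natural swap is astropy's `mad_std`, an axis-aware estimator that scales the MAD to a standard-deviation equivalent:

```python
import numpy as np
from astropy.stats import mad_std, median_absolute_deviation

stack = np.random.normal(size=(25, 256, 256))

# instead of this (raw MAD, not scaled to a standard-deviation equivalent):
dev = median_absolute_deviation(stack, axis=0)

# you could do this (MAD scaled by ~1.4826, usable directly as a sigma):
dev_std = mad_std(stack, axis=0)
```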
One thing that I noticed during my tests is that we actually do most operations on the stacked arrays along the first axis, the one along which the images are stacked. I'm not sure if that matters much because we stack and reduce; however, judging by the number of operations, the last axis would be better (memory-layout-wise) in cases when we do clipping.
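A quick way to see the memory-layout effect (a sketch; absolute numbers vary by machine):

```python
import timeit
import numpy as np

stack = np.random.normal(size=(25, 1000, 1000))  # C-contiguous: last axis is contiguous

# Reducing along axis 0 strides through memory; the last axis reads contiguously.
t_first = timeit.timeit(lambda: stack.mean(axis=0), number=10)
t_last = timeit.timeit(lambda: stack.mean(axis=-1), number=10)
print(f'axis=0: {t_first:.3f}s, axis=-1: {t_last:.3f}s')
```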
However, processors are typically very clever about what to pre-fetch, so it might not be too bad that we reduce along the "worst" (memory-layout-wise) axis.

Then I played a bit with possible optimizations. It's really tricky, though, because we have to "assume" things: for example, it's easily possible with a numba function (or a C extension) to speed up the clipping by a factor of 3-20 (depending on the shape of the stack), but that assumes the stack is 3D and that the clip functions are either (np.ma.mean and np.ma.std) or (np.ma.median and astropy.stats.median_absolute_deviation). And do we really want to include numba (a heavy dependency for anyone who doesn't already have it) or include code that needs compiling (particularly problematic for Windows users who don't use Anaconda) just for some particular combinations, even though they are probably the most common ones?
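A minimal sketch of the kind of numba kernel alluded to here, with exactly the assumptions the comment warns about baked in (3D stack, one fixed choice of center/deviation functions; requires numba to be installed):

```python
import numpy as np
from numba import njit

@njit
def clip_mask(stack, low, high):
    # Assumes a 3D float stack; clips each pixel column around its median
    # using the column standard deviation (one hard-coded function pair).
    n, ny, nx = stack.shape
    mask = np.zeros(stack.shape, dtype=np.bool_)
    for j in range(ny):
        for i in range(nx):
            col = stack[:, j, i].copy()
            center = np.median(col)
            std = col.std()
            for k in range(n):
                if col[k] < center - low * std or col[k] > center + high * std:
                    mask[k, j, i] = True
    return mask

mask = clip_mask(np.random.normal(size=(25, 64, 64)), 5.0, 5.0)
```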
I've had a chance to do a bit of profiling; the notebook I used to generate the profiles is in this gist. The notebook combines 25 images, each roughly 2k x 2k, combined by averaging and sigma clipped before combining. The upshot is that numpy's masked median is really, really slow. Two thirds of the time in the combination is spent in sigma clipping; each of the three medians it does takes roughly 7 seconds. Of that, a surprising amount of time is spent inside numpy MaskedArray's machinery. A screenshot from SnakeViz is below. Following @larrybradley's approach of using the …
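To illustrate the gap between the masked and NaN-based paths (a sketch assuming the optional bottleneck package; timings are machine-dependent):

```python
import timeit
import numpy as np
import bottleneck as bn

stack = np.random.normal(size=(25, 1000, 1000))
bad = stack > 2.0  # arbitrary mask standing in for clipped pixels

masked = np.ma.array(stack, mask=bad)
nan_stack = np.where(bad, np.nan, stack)  # same data, mask encoded as NaNs

print('np.ma.median:', timeit.timeit(lambda: np.ma.median(masked, axis=0), number=1))
print('bn.nanmedian:', timeit.timeit(lambda: bn.nanmedian(nan_stack, axis=0), number=1))
```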
I don't know what the underlying performance issue is, but wow it is slow (clipping a stack of 25 4k x 4k images takes minutes).