allow for significantly less memory usage in steps blending #435

mats-knmi · 2024-09-24T09:32:51Z

By delaying the decomposition of the model data into cascade levels and allowing for model data to be delivered in float32, it seems to be possible to run the blending with almost 10x less memory than before in some tests I did.

codecov · 2024-09-24T10:54:40Z

Codecov Report

Attention: Patch coverage is 97.82609% with 1 line in your changes missing coverage. Please review.

Project coverage is 83.89%. Comparing base (f211df2) to head (5c14399).
Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
pysteps/blending/steps.py	97.82%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #435   +/-   ##
=======================================
  Coverage   83.89%   83.89%           
=======================================
  Files         160      160           
  Lines       12900    12902    +2     
=======================================
+ Hits        10822    10824    +2     
  Misses       2078     2078

Flag	Coverage Δ
unit_tests	`83.89% <97.82%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mats-knmi · 2024-09-24T11:42:02Z

@sidekock I realized yesterday that, with these changes, we can run pysteps a lot cheaper at KNMI. I have tested these changes and it all still seems to run (and fast enough). Maybe you can take these changes along when you start on the refactoring.

RubenImhoff · 2024-09-24T14:28:46Z

Hi @mats-knmi, this looks great, many thanks for the great improvement! Do I understand it well that the code now prefers to work with the NWP rainfall data instead of the decomposed data (whereas this used to be the other way around)? With other words, do we still want to support decomposed NWP data as input? It would be a breaking change, so this needs some discussion, but it would make the code cleaner if it is not necessary anymore.

RubenImhoff · 2024-09-24T14:32:28Z

In addition, I think we should update the function arguments a bit (see below), where we then would switch around the array shape and description (favoring the four dimension without decomposition over the two dimensions with decomposition) and indicating that float32 is supported and even recommended for memory usage.

    precip_models: array-like
      Array of shape (n_models,timesteps+1) containing, per timestep (t=0 to
      lead time here) and per (NWP) model or model ensemble member, a
      dictionary with a list of cascades obtained by calling a method
      implemented in :py:mod:`pysteps.cascade.decomposition`. In case of one
      (deterministic) model as input, add an extra dimension to make sure
      precip_models is five dimensional prior to calling this function.
      It is also possible to supply the original (NWP) model forecasts containing
      rainfall fields as an array of shape (n_models,timestep+1,m,n), which will
      then be decomposed in this function. Note that for an operational application
      or for testing with multiple model runs, it is recommended to decompose
      the model forecasts outside beforehand, as this reduces calculation times.
      This is possible with :py:func:`pysteps.blending.utils.decompose_NWP`,
      :py:func:`pysteps.blending.utils.compute_store_nwp_motion`, and
      :py:func:`pysteps.blending.utils.load_NWP`.

mats-knmi · 2024-09-24T15:27:59Z

Yes indeed, personally I would prefer to work with rainfall data in stead of decomposed data. However this might not be the case as much if you have less NWP model members (like RMI has), since then the decomposed data is not as big as it is for the KNMI usecase and the small extra performance you get by pre-decomposing it might be worth it. So I would be in favor of leaving both options in there, but I will see if I can try to word this nicely in the docstring.

mats-knmi · 2024-09-24T15:45:01Z

@RubenImhoff I have updated the docstring, let me know if it is ok like this

RubenImhoff

Looks great so far! I've added a few comments; after that, I think we are good to go. :)

pysteps/blending/steps.py

Co-authored-by: Ruben Imhoff <[email protected]>

mats-knmi · 2024-09-30T08:38:09Z

@dnerini The unit tests github actions suddenly fail during the conda environment initialization, do you have any idea what is going on here? I didn't change anything that would impact this.

dnerini · 2024-09-30T19:31:58Z

@dnerini The unit tests github actions suddenly fail during the conda environment initialization, do you have any idea what is going on here? I didn't change anything that would impact this.

fixed it temporarily by pinning an older version of micromamba

pysteps/blending/steps.py

RubenImhoff · 2024-10-01T07:40:14Z

With the additions, we may have to add some tests for the coverage. But other than that, I think we're good to go. :) Nice work!

sidekock · 2024-10-03T08:33:24Z

@mats-knmi @RubenImhoff I am planning to look through everything this afternoon.

sidekock · 2024-10-03T11:11:21Z

@mats-knmi, I have some questions before I go more in-depth:

I assume both options (providing raw NWP and decomposed) are still fully supported?
Are the performance increases introduced now also applicable to the old method of decomposing first?
Has there been any analysis as to the impact on final nowcast of the 32bit vs 64bit data?
Is there a decrease in performance (time for a forecast)? I would expect this because now the computations are done during the run...

pysteps/blending/steps.py

mats-knmi · 2024-10-03T12:47:13Z

@sidekock

Yes both options should still work as intended. (The unit tests succeed for both options).
The performance increases are not related to speed, just reduction of memory usage. In general I expect the memory usage to be lower using both options, but specifically when running with non-decomposed model data, as that allows you to input a lot less data (7x less data with 7 cascade levels).
No I haven't done any analysis, but this should not impact a lot as all computations are still done with float64. It is just that you pass in the data as float32, meaning that it takes up less memory. Float32 has around 7 decimal digits of precision so it should be plenty accurate for storage.
Yes running the blending this way has a slight performance decrease (around 5%, or 10 seconds) is what we experienced.

All of this, for us, is more than worth it. We run the blending with 12 hours lead time 5 minute time step and 20 members. This used to mean that we had to read 100+ GB of decomposed model data into memory, before even starting the blending. This is now reduced to around 20 GB.

sidekock

@mats-knmi I did my best to go through all the changes. I added quite a few comments, most of them are just clarifications for me.

pysteps/blending/steps.py

sidekock · 2024-10-03T12:40:15Z

pysteps/blending/steps.py

+                            precip_models_cascade_temp
+                        )
+                        done = True
+                        break


Am I correct that you are looping over all members for each time until you find a member with rain? Are you just doing this change the infs with the minimal value in this field? Is this minimal value in the found field necessarily the minimal value of precipitation in any field?

This was the most difficult part to get right (see also the extensive discussion Ruben and I had on this in this Pull Request). We need a zero value and apparently np.nanmin(precip) or np.nanmin(precip_models) are not correct, since the decomposed precip values are different than the not decomposed values. Therefore we need to decompose the model. To prevent just decomposing the entire model here (which would defeat the purpose of not pre-decomposing the model), we need to find a timestep and member that is representative for the entire model, decompose that and use the nanmin of that for the zero value. Or at least that is the only way I could see that we could realistically do this. I don't know how the decomposition works exactly, but I would expect that an image that has precipitation has a representative decomposition of which we could take the nanmin. This is based on the assumption that at some point somewhere in the domain there is no rain, but this assumption is made quite a bit in other parts of the pysteps code as well.

I would like to add that this code is supposed to solve the issue when precip_cascade is completely filled with nan values, but it doesn't explicitly check if this is the case, which seems weird to me. I also don't know if it is possible to get a precip_cascade value that is completely filled with nans, if you give a sensible value in precip.

To me, this looks like an unfinished problem. I indeed see where this comes from, in an operational product, you should be ready for no radar data, but I feel like this is insufficient because, as far as I understand, this can give a lot of bias (a large precip field will give different min for the different cascades compared to a very intense convection event I would assume, please correct me if I am wrong here @RubenImhoff ). On the other hand, you do not expect the NWP to have huge variations in the 'weather type', so maybe it is fine. The only thing I worry about is that a very intense event with no other rain might give strange artefacts. I assume you used the same code as in "_fill_nans_infs_nwp_cascade" before?
I feel like we at least need to explore a full timestep before taking a min but that might be to computationally intensive (although we could also paralalize this)

No, this is actually the only part of the code that has some possible functional change. Since before at this point always the entire model field would have been decomposed, you could just take the nanmin over the entire decomposed forecast. But now this is not possible. Computationally it would indeed be less than ideal to decompose the entire forecast at this point in the code since the new changes were supposed to prevent this.

However, if this code is supposed to ONLY handle the case where the input (decomposed) precip field is completely filled with nans, I believe we should change the if statement here to check for that in stead of just checking for zero_precip_radar. If we do that, then I believe this case will not show up often (or at all) operationally, so it is not a big deal if it does take some extra time.

Also, this part of the code was never in _fill_nans_infs_nwp_cascade before. This was an additional check that was done separately to that method. this is the part of the code this part is trying to replicate.

I think it is good for now but maybe it is a good thing to raise this as a separate issue?
For now I see three possible solutions:

Leave as is, risking some possible issues due to early cutoff

Processing the whole NWP every run (very ram intensive and reduces computational efficiency

Pre-process NWP to get these no rain values once, and feed them into the model (can be used for multiple runs, depending on the frequency of the RUC).

I don't think we should fix this issue here but we should keep this noted somewhere that this can be optimised ...

Yes good point, do you want to open an issue or should I?

Maybe we merge first and then create the issue on the master branch? Ill let you do it :)

I cannot merge, but I can create an issue. Can you merge or should @dnerini do this?

I can not merge either

I can merge it

sidekock · 2024-10-03T12:42:42Z

pysteps/blending/steps.py

@@ -809,23 +825,80 @@ def forecast(
            if measure_time:
                starttime = time.time()

+            if precip_models_cascade is not None:


I dont think precip_models_cascade is ever initialized, al I wrong in this?

On line 576 it is set to None and then on line 579 it is set to the input precip_models if the ndim is not equal to 4.

So that means if dims is not 4 (the already decomposed case), it has been defined. If it is the other case (raw NWP), it is non, and thus, this decomposed. Is this correct?

Yes that is correct.

pysteps/blending/steps.py

sidekock · 2024-10-03T12:57:38Z

@sidekock

* Yes both options should still work as intended. (The unit tests succeed for both options).

* The performance increases are not related to speed, just reduction of memory usage. In general I expect the memory usage to be lower using both options, but specifically when running with non-decomposed model data, as that allows you to input a lot less data (7x less data with 7 cascade levels).

* No I haven't done any analysis, but this should not impact a lot as all computations are still done with float64. It is just that you pass in the data as float32, meaning that it takes up less memory. Float32 has around 7 decimal digits of precision so it should be plenty accurate for storage.

* Yes running the blending this way has a slight performance decrease (around 5%, or 10 seconds) is what we experienced.

All of this, for us, is more than worth it. We run the blending with 12 hours lead time 5 minute time step and 20 members. This used to mean that we had to read 100+ GB of decomposed model data into memory, before even starting the blending. This is now reduced to around 20 GB.

So you are doing this decomposition on the code now per time step and this is only impacting 10s for an entire forecast (for how long is this nowcast run? eg how many timesteps?) or 10s per step?

mats-knmi · 2024-10-07T06:15:52Z

@sidekock 10 seconds longer over the entire forecast of 12 hours, with a 5 min timestep, so 144 timesteps. I believe this part might actually take like 20 seconds total, but the original _compute_cascade_decomposition_nwp, would also take 10 seconds in the old code, so bottom line is we lose around 10 seconds total.

sidekock · 2024-10-07T07:42:11Z

@sidekock 10 seconds longer over the entire forecast of 12 hours, with a 5 min timestep, so 144 timesteps. I believe this part might actually take like 20 seconds total, but the original _compute_cascade_decomposition_nwp, would also take 10 seconds in the old code, so bottom line is we lose around 10 seconds total.

Great, thanks! Good to have a benchmark, always easier to convince people that way :)
Thanks for answering all my questions, ill put my stamp on approval for this merge :) Then I can begin with the refactoring whenever it ends up in the master.

allow for significantly less memory usage in steps blending

7b52ab5

mats-knmi requested a review from RubenImhoff September 24, 2024 09:32

mats-knmi added 4 commits September 24, 2024 11:38

apply black

34d90ba

use threadpool

dabd93a

only decompose if not already done

4e2758a

fix black

1a4ab4b

mats-knmi requested a review from sidekock September 24, 2024 15:29

mats-knmi added 2 commits September 24, 2024 17:40

update docstring

a468411

fix one more docstring and rename some variables

a5c9b22

RubenImhoff requested changes Sep 26, 2024

View reviewed changes

mats-knmi and others added 3 commits September 26, 2024 16:54

Apply suggestions from code review

6dc7efe

Co-authored-by: Ruben Imhoff <[email protected]>

rename precip_models_pm

af0cd15

move comment about memory reduction to velocity_models

17c00c7

Merge branch 'master' into memory-efficiency-blending

03bcfc6

mats-knmi added 2 commits October 1, 2024 08:10

fix zerovalue precip_cascade

3d1366b

use precip_models_cascade if present

c5718cb

RubenImhoff reviewed Oct 1, 2024

View reviewed changes

pysteps/blending/steps.py Show resolved Hide resolved

mats-knmi added 2 commits October 1, 2024 09:33

set done to true

f5e5697

fix black

e478bc0

mats-knmi added 2 commits October 2, 2024 17:29

revert to old logic, but now with precip_models_cascade using new loop

7fd7122

fix issue with precip_models_cascade

6d51fd4

fix tests

0198a54

RubenImhoff approved these changes Oct 3, 2024

View reviewed changes

sidekock reviewed Oct 3, 2024

View reviewed changes

pysteps/blending/steps.py Show resolved Hide resolved

sidekock reviewed Oct 3, 2024

View reviewed changes

Move comment up a little

5c14399

sidekock approved these changes Oct 7, 2024

View reviewed changes

dnerini merged commit 8521a53 into master Oct 7, 2024
9 checks passed

dnerini deleted the memory-efficiency-blending branch October 7, 2024 17:24

mats-knmi mentioned this pull request Oct 29, 2024

Make sure correct zerovalue is used for precip_models_cascade in blending code #438

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

allow for significantly less memory usage in steps blending #435

allow for significantly less memory usage in steps blending #435

mats-knmi commented Sep 24, 2024 •

edited

Loading

codecov bot commented Sep 24, 2024 •

edited

Loading

mats-knmi commented Sep 24, 2024

RubenImhoff commented Sep 24, 2024

RubenImhoff commented Sep 24, 2024

mats-knmi commented Sep 24, 2024

mats-knmi commented Sep 24, 2024

RubenImhoff left a comment

mats-knmi commented Sep 30, 2024

dnerini commented Sep 30, 2024

RubenImhoff commented Oct 1, 2024

sidekock commented Oct 3, 2024

sidekock commented Oct 3, 2024 •

edited

Loading

mats-knmi commented Oct 3, 2024

sidekock left a comment

sidekock Oct 3, 2024

mats-knmi Oct 3, 2024

sidekock Oct 3, 2024

mats-knmi Oct 7, 2024 •

edited

Loading

sidekock Oct 7, 2024 •

edited

Loading

mats-knmi Oct 7, 2024

sidekock Oct 7, 2024

mats-knmi Oct 7, 2024

sidekock Oct 7, 2024

dnerini Oct 7, 2024

sidekock Oct 3, 2024

mats-knmi Oct 3, 2024

sidekock Oct 3, 2024

mats-knmi Oct 7, 2024

sidekock commented Oct 3, 2024

mats-knmi commented Oct 7, 2024

sidekock commented Oct 7, 2024

allow for significantly less memory usage in steps blending #435

allow for significantly less memory usage in steps blending #435

Conversation

mats-knmi commented Sep 24, 2024 • edited Loading

codecov bot commented Sep 24, 2024 • edited Loading

Codecov Report

mats-knmi commented Sep 24, 2024

RubenImhoff commented Sep 24, 2024

RubenImhoff commented Sep 24, 2024

mats-knmi commented Sep 24, 2024

mats-knmi commented Sep 24, 2024

RubenImhoff left a comment

Choose a reason for hiding this comment

mats-knmi commented Sep 30, 2024

dnerini commented Sep 30, 2024

RubenImhoff commented Oct 1, 2024

sidekock commented Oct 3, 2024

sidekock commented Oct 3, 2024 • edited Loading

mats-knmi commented Oct 3, 2024

sidekock left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mats-knmi Oct 7, 2024 • edited Loading

Choose a reason for hiding this comment

sidekock Oct 7, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sidekock commented Oct 3, 2024

mats-knmi commented Oct 7, 2024

sidekock commented Oct 7, 2024

mats-knmi commented Sep 24, 2024 •

edited

Loading

codecov bot commented Sep 24, 2024 •

edited

Loading

sidekock commented Oct 3, 2024 •

edited

Loading

mats-knmi Oct 7, 2024 •

edited

Loading

sidekock Oct 7, 2024 •

edited

Loading