Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the blending/steps.py nowcasting code #395

Open
ladc opened this issue Jul 19, 2024 · 20 comments · May be fixed by #436
Open

Refactor the blending/steps.py nowcasting code #395

ladc opened this issue Jul 19, 2024 · 20 comments · May be fixed by #436
Assignees

Comments

@ladc
Copy link
Contributor

ladc commented Jul 19, 2024

Start with low-hanging fruit:

  • clearer variable names (compare to nowcasts/steps.py refactoring)
  • potentially make the monolithic function more modular.
@ladc ladc changed the title Refactor the steps.py nowcasting code Refactor the blending/steps.py nowcasting code Jul 19, 2024
@dnerini
Copy link
Member

dnerini commented Jul 19, 2024

Yes please!

@sidekock
Copy link
Contributor

@RubenImhoff I went trough the whole code to see what was unclear for me. I will already paste my findings here and maybe we can discuss later today. Ill first time some time to eat :p

Specific lines of needed name changes

Line 583: R_f should get a new name

Line 600: R_f_ empty_precip?

Line 616: outarr

Line 677: pp, generate_noise, noise_std_coeffs = _init_noise(

Line 716: randgen_prec, vps, generate_vel_noise = _init_random_generators(

Line 727: D, D_Yn, D_pb, R_f, R_m, mask_rim, struct, fft_objs = _prepare_forecast_loop(

Line 784: forecast_prev = precip_cascade
noise_prev = noise_cascade
t_prev = [0.0 for j in range(n_ens_members)]
t_total = [0.0 for j in range(n_ens_members)]

Line 790: This timestep function is extremly complicated to understand, maybe changing some names would make this more clear

Line 858 rho_extr would change to rho_extrap (just a bit more clear what this is)

Line 865 R_f_ should be renamed: R_f_temp

Line 922: cov -> covariance_radar_model?

Line 955: EPS should be changed, quite unclear what it is

Line 997: EPS_ should be changed

Line 1028: R_f_ip

Line 1033: Yn_ip

Line 1051: t_diff_prev

Line 1065: V_stack

Line 1073: V_model_

Line 1094: R_f_im_recomp

Line 1111: R_f_ep_recomp_

Line 1118: R_f_ep_recomp

Line 1119: temp_mask

Line 1122: R_f_ep

Line 1144: Yn_ip_recomp

Line 1158: Yn_ep_recomp

Line 1159: Yn_ep

Line 1173: R_f_ep_out, Yn_ep_out, R_pm_ep

Line 1285: R_f_out

Line 1339: cascades_stacked_

Line 1356: R_f_blended

Line 1361: R_f_blended_mod_only

Line 1384: R_f_new

Line 1391: R_f_new_mod_only

Line 1412: weights_pm, weights_pm_normalized, weights_pm_mod_only,

Line 1438 Better commenting?

Line 1439: R_pm_blended

Line 1447: R_pm_blended_mod_only

Line 1490: Can be removed

line 1517: R_cmin

Line 1525: MASK_prec -> MASK_precip or MASK_precip_cell?

Line 1550: mu_0, MASK, mu_fct

Line 1578: R_f_stacked

General:
All fc should be changed to forecast?

We should look at all these t_... variables, they are quite strange

Do we go for mu and sigma or means and sigmas? Should be uniform choice

R_f = precip_forecast? is this correct?

@sidekock
Copy link
Contributor

For all of these, I could not find the names in the nowcast steps but it is really hard to see the variables through the forest :) . I also tried to do this matching by using some generative AI tools but it does not preform well on this sort of tasks

@RubenImhoff
Copy link
Contributor

To give a brief answer (already):

General:
All fc should be changed to forecast? --> Yes!

We should look at all these t_... variables, they are quite strange --> Fair, if doable :)

Do we go for mu and sigma or means and sigmas? Should be uniform choice --> I would go for means and sigmas then :)

R_f = precip_forecast? is this correct? --> Correct

@dnerini
Copy link
Member

dnerini commented Jul 19, 2024

short comment: when there are so many variables given and returned by functions, it's usually a good indication that you need a class.

so why not refactoring the code into a class and so that the methods can simply access attributes instead of having to pass them around all the time? the main interface can stay the same, that is, forecast()

@sidekock
Copy link
Contributor

I agree this is the most elegant solution but maybe we should tackle that problem after name changes and changing the monolith into something more readable?

@sidekock
Copy link
Contributor

My work today so the refactoring can be done in the future; things like splitting up the code into more modular functions or making a class can be added after the current steps I list below:

STEP 1:
All "R_f" should be changed to "precip_forecast". Use CTRL+F to find all relevant changes in both code and comments
STEP 2:
Specific names need to be changed; due to step 1, some of them have been changed from the original. Below I list all changed needed and some proposed names (can definitely be changed). The check with CTRL F is added because refactoring in one place does not always guarantee the name is changed in every function.

Line number Old name Name after step 1 Proposed name Check with Ctrl F
600 R_f_ precip_forecast_ precip_forecast No
616 outarr precip_forecast_all_members_all_times No
677 pp generate_perturb Yes
716 randgen_prec randgen_precip Yes
716 vps velocity_perturbations Yes
727, 2321 R_m precip_forecast_non_perturbed
727, 2303 D previous_displacement No
727, 2304 D_Yn previous_displacement_noise_cascade No
727, 2305 D_pb previous_displacement_prob_matching No
761, 859 rho_extr_prev rho_extrap_cascade_prev No
762, 882, 892 rho_extr rho_extrap_cascade No
784 forecast_prev precip_forc_prev_subtimestep No
785 noise_prev noise_prev_subtimestep No
786 t_prev t_prev_timestep No
787 t_total t_leadtime_since_start_forecast No
922 cov covariance_nwp_models No
955, 2387 EPS epsilon_decomposed No
955, 964, 2387, 2391 epsilon_decomposed epsilon_ No
977, 2403 EPS_ epsilon_temp No
1026 t_diff_prev_int t_diff_prev_subtimestep_int No
1028 R_f_ip precip_forecast_ip precip_forecast_cascade_subtimestep No
1033 Yn_ip noise_cascade_subtimestep No
1051 t_diff_prev t_diff_prev_subtimestep No
1059 velocity_pert velocity_perturbations_extrapolation No
1065 V_stack velocity_stack_all No
1073 V_model_ velocity_models No
1094 R_f_ip_recomp precip_forecast_ip_recomp precip_forecast_recomp_subtimestep No
1111 R_f_ep_recomp_ precip_forecast_ep_recomp_ precip_forecast_extrapolated_recomp_subtimestep_temp No
1118 R_f_ep_recomp precip_forecast_ep_recomp precip_forecast_extrapolated_recomp_subtimestep No
1119 temp_mask ???
1122 R_f_ep precip_forecast_ep precip_forecast_extrapolated_decomp No
1144 Yn_ip_recomp noise_cascade_recomp No
1151 Yn_ep_recomp_ noise_extrapolated_recomp_temp No
1158 Yn_ep_recomp noise_extrapolated_recomp No
1159 Yn_ep noise_extrapolated_decomp No
1173 R_f_ep_out precip_forecast_ep_out precip_forecast_extrapolated_decomp_done No
1147 Yn_ep_out noise_extrapolated_decomp_done No
1192 R_ precip_forecast No
1194 R_pm_ep precip_forecast_extrapolated_probability_matching_temp No
1201 R_pm_ep_ precip_forecast_extrapolated_probability_matching No
1285 R_f_out precip_forecast_out final_blended_forecast No
1294 cascades_stacked cascade_stack_all_components No
1339 cascades_stacked_ cascade_stack_all_components_temp No
1347 covariance_nwp_models covariance_all_components No
1356 R_f_blended precip_forecast_blended precip_forecast_blended No
1361 R_f_blended_mod_only precip_forecast_blended_mod_only precip_forecast_blended_mod_only No
1384 R_f_new precip_forecast_new precip_forecast_recomposed No
1391 R_f_new_mod_only precip_forecast_new_mod_only precip_forecast_recomposed_mod_only No
1412 weights_pm weights_probability_matching No
1413 weights_pm_normalized weights_probability_matching_normalized No
1415 weights_pm_mod_only weights_probability_matching_mod_only No
1438 weights_pm_normalized_mod_only weights_probability_matching_normalized_mod_only No
1439 R_pm_blended precip_forecast_probability_matching_blended No
1447 R_pm_blended_mod_only precip_forecast_probability_matching_blended_mod_only No
1517 R_cmin precip_forecast_min_value No
1525 MASK_prec precip_field_mask No
1550 Mu_0 mean_probabiltity_matching_forecast No
1551 MASK no_rain_mask No
1552 mu_fct mean_precip_forecast No
1578 R_f_stacked precip_forecast_stacked precip_forecast_final No
1955 R_min precip_forecast_min No
1998 R_d precip_forecast_decomp No
2000 R_ precip_forecast No
2014 R_c precip_forecast_cascades No
2098 R_c precip_forecast_cascades No
2165 R_models_pm precip_forecast_probability_matching No
2291 R_c precip_forecast_cascades No
2321 R_m ???? ???
2333 R_c precip_forecast_cascades No

STEP 3:
Splitting up the code into separate functions (I might add some suggestions here tomorrow)

STEP 4:
Changing to a class will make keeping track of all these variables less necessary.

Please provide me with some feedback.

@sidekock
Copy link
Contributor

sidekock commented Jul 20, 2024

I just went trough the code to try and refactor it. A version This is just a proof of concept is now pushed to try and show my suggestions (however with the old names as I did not push my version with name changes yesterday...)
Below I will post the suggestions I applied to the code to have a clear visual what to do once the master branch can be refactored.

@sidekock
Copy link
Contributor

Refactoring Plan

Part in Code Suggested Name Extra Info/Recommendations
Everything from start till just before #1 preprocessing Check if all variables are still found in function: zerovalue
Everything from #2 to just before #2.3 compute_cascades Check if all variables are still found for the function: mu_extrapolation, sigma_extrapolation
After 2.3.1 in if zero_precip_radar and zero_model_fields: no_rain_radar_and_nwp
Bring #3 initialisze_noise_methods
Bring #6 into one function except for last line initiate_all_random_generators Check if all variables are still found: randgen_prec, vps, generate_vel_noise, mu_noise, sigma_noise, D, D_Yn, D_pb, R_f, R_m, mask_rim, struct, fft_objs
Everything in worker should be first extracted to a worker function - Not a nested definition but a separate function
Then worker should be split up further -
Bring 8.1.2 determine_skill_nwp_at_leadtime
Bring 8.2 calculate_weights_per_component
Bring 8.3.1 calculcate_epsilon_decomposed
8.3.2 iterate_ar_extrapolation_per_cascade
8.3.3 iterate_ar_noise_per_cascade
For loop in 8.4 extrapolate_per_timestep
Following for loop just before 8.5 advect_forecast_one_timestep_no_subtimesteps
In 8.5 if blend_nwp_members: else ... concatinate_cascade_mean_and_sigmas
if weights_method == "spn" weights_method_spn
Before 8.6 blend_all_cadcades
8.6 recompose_cascades_to_precip_field
8.7.1 calculate_probability_matching_member
8.7.1 calculate_forecast_probabilistic_member
8.7.2 apply_probability_matching

@sidekock
Copy link
Contributor

@RubenImhoff @dnerini would now be a good time to refactor the code in steps blending and nowcasting? I have talked to @mats-knmi about how this would be integrated in the x-array version. This would be done after the changes are done in the current versions and once all tests are passed.

@dnerini
Copy link
Member

dnerini commented Sep 18, 2024

sounds very good!

@sidekock
Copy link
Contributor

@dnerini just to confirm if this would be okay. I was planning to refactor this code to a class due to all the variables. Does this conform to the pysteps design philosophy or should I just keep all the variables as local variables and remove the class based approach?

@dnerini
Copy link
Member

dnerini commented Sep 20, 2024

@sidekock Go ahead with the refactoring. Yes, true, at the beginning we explicitly decided to avoid oop, but i believe the limits and shortcomings of that decision are quite clear.. so let's give it a go!

@RubenImhoff
Copy link
Contributor

@sidekock, sorry I was on vacation, so therefore my slow response. Great if you want to make a start with this! :)

@mats-knmi
Copy link
Contributor

@sidekock, since I am planning on starting with the blending conversion to xarray as well soon, I would appreciate it if you could create a draft pull request from the branch you are currently working on and add me as (one of) the reviewers. This way I can easily keep track of your changes and take this into account when working on the xarray stuff.

@sidekock
Copy link
Contributor

@mats-knmi Will do! I will probably only start somewhere later this week. The teaching season is once again upon me so first need to make sure I am ready on that front :)

@sidekock
Copy link
Contributor

sidekock commented Oct 8, 2024

@mats-knmi I will start this afternoon with the re-factoring. Will start with the nowcast steps to have a template and then look at the blending steps. Ill add you as a reviewer as soon as the nowcast steps is refactored @RubenImhoff can I add you too?

@sidekock
Copy link
Contributor

sidekock commented Oct 8, 2024

So I started today but I have a general question regarding the _check_inputs function. This is used to check some of the inputs put others are then checked in the forecast loop itself. Would it not be best to add all these checks and preparatory assignments (eg None -> dict()) in this _check_inputs?

@RubenImhoff
Copy link
Contributor

@mats-knmi I will start this afternoon with the re-factoring. Will start with the nowcast steps to have a template and then look at the blending steps. Ill add you as a reviewer as soon as the nowcast steps is refactored @RubenImhoff can I add you too?

Of course! :)

@RubenImhoff
Copy link
Contributor

So I started today but I have a general question regarding the _check_inputs function. This is used to check some of the inputs put others are then checked in the forecast loop itself. Would it not be best to add all these checks and preparatory assignments (eg None -> dict()) in this _check_inputs?

That may be a better way indeed.

@sidekock sidekock linked a pull request Oct 9, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In progress
Development

Successfully merging a pull request may close this issue.

5 participants