Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC] Fixes, Enhancements and Improvements for CatFIM #1182

Open
EmilyDeardorff opened this issue May 28, 2024 · 5 comments
Open

[EPIC] Fixes, Enhancements and Improvements for CatFIM #1182

EmilyDeardorff opened this issue May 28, 2024 · 5 comments
Assignees
Labels
CatFIM NWS Flood Categorical HAND FIM enhancement New feature or request Epic Agile epic. Subtasks are itemized and listed with check boxes, and individual cards/stories are made

Comments

@EmilyDeardorff
Copy link
Contributor

EmilyDeardorff commented May 28, 2024

This is an EPIC card. As items are taken off this list, they will get moved into their own cards, referenced back to this as a total list of all known items, mostly are brand new so prioritization is required

Currently, the CatFIM process takes a very long time to run (~25 hrs for a full BED run) and lacks efficiency both for full runs and for partial/testing runs. Update the CatFIM scripts to incorporate more efficient multiprocessing and segmented run capabilities (i.e. running only on a list of HUCs and/or a flag that allows CatFIM to re-run without downloading all of the metadata again).

These are above and beyond finishing the AK integration

Obviously, we won't get all in, but this is a list of all possible fixes. We will extract out some for the next CatFIM release when the AK integration is done. However, some in this list are very important.

Potential CatFIM Fixes and Upgrades:

  1. (Lower priority, should be fixed by AWS shift) Improve efficiency for calling the WRDS API. In our "get_thresholds" in generate_categorical_fim_flows.py, it calls WRDS to get some nws_lid but only passes in lid data and not huc data. BUT... our process_generate_flow function, it calls that get_threshhold for every huc and every ahps site. This is a massive amount of duplication. We can call WRDS for each ahps or even all ahps via the specific WRDS api call, save it to our disk and use it. We can likely even load it in certain parts of the code and pass that dataframe around to other functions including into a Multi-Proc. This would be a huge performance gain.

  2. (Lower priority, should be fixed by AWS shift) Facilitate a pre-download option for the threshold files (similar to what we've done with the metadata files). We now have the ability to save metadata file from WRDS in generate_categorical_fim_flows.py but can we make a similar system for the threshold files so we can run catfim tools in a non OWP server enviro? Assuming that the fix has not yet been made to let AWS all the WRDS api. Can we just get a new DB or something we can put on a file server instead of making 5,000+ Web API calls which is a huge amount of overhead?

  3. Review and maybe upgrade the HUC_messages system. The system creates a file per HUC / lid which says if if the lid was loaded or if it failed and why. It later reloads all of the files in the "huc_messages" folder, maps them to each record for the output csv with some saying why they failed. There are huge opportunities for improvement here, but it does work so it may not be worth the effort to fix. The current setup is "multi-proc" safe. Attached is one example of one ahps site that was not fully able to load catfim and the message why. The image is in two parts, limitations of screen caps.

Examples of how huc_messages turn into status and end up in the UI.
Catfim_Stage_example_failed_load_in_UI_part_1
Catfim_Stage_example_failed_load_in_UI_part_2

  1. (1286) Add in an argument to remove intermediate extent files.*** In the mapping/{huc}/{lid} are a bump of inundated images changed extent files. It is done to each branch, then merged to a single "extent" file for that {huc}/{lid}/{lid}_{magnitude}_extent.tif. We obviously need to keep the final extent files per magnitude and not all of the branch intermediary files. Note: that system is now in but the code commented for version or two for tracing reasons. Maybe uncomment in the next HV 2.1.8 release (in a few months) to clean up all of those intermediate tif files as there is A LOT. Update: We have commented out code in there to delete them, just make it driven by an arg to delete or not (default delete). Can helps with debugging during dev. Add an argument in for deleting or not, default to delete. Nice to have when code/debugging and most of the code is there, easy to finish.

    • Update: Did not make this 2.1 version. Held back for debugging and tracing purposes. (aka.. the delete cleanup of the tif files.
  2. Fix duplication of processing paths for stage and flow based as it calls functions in the code.

  3. (Should be fixed by AWS shift) Adjust and review the multi-proc job number arguments. We have three job numbers, but we have some parts that are MP, inside MP, inside MP. ie process_generate_categorical_fim -> (MP) iterate_through_huc_stage_based -> (MP) produce_stage_based_catfim_tifs -> (MP) produce_inundation_map_with_stage_and_feature_ids. This may not be the most efficient use of jobs or even MP but needs to be reviewed. There are others such as post_process_cat_fim_for_viz -> (MP) post_process_huc_level -> (MP) reformat_inundation_maps

  4. Test and re-eval the "past_major_interval_cap" system. (argument of -mc / --past_major_interval_caps). Does it still work? Do we want to keep it and maintain that code? -> Yes! (as of 9/11/24) 1283

  5. 1283 Test and eval if we want the search system (argument of -s / --search)). Does it still work? Do we want to keep it and maintain that code? -> Yes!

  6. 1287 Test and fix OR remove the lid choice system (arguments of -l / --lid_to_run). It has been broken since winter 2024. Could be handy but might not be worth the effort to fix it. -> 09/11/24 - Decided to remove this feature in lieu of finishing adding the new list of HUCs system, # 10 below.

  7. (Priority) (no card yet) Fix Emily's list of HUCs to run argument (argument of -lh / --lst_hucs). This is very handy, new to CatFIM 2.0 but we temp disabled it, so it should be quick to re-enable. Lower priority for v2.1, but nice to get in there. Prioritized over [8pt] Setup regression testing git hook for feature branch #9 as of 9/11/24

    • For Emily later. Issue 1311
    • update. Now fixed in new dev-catfim-v2-1 branch. -
  8. Finish the new "step" system, which allows for us to skip flows if all of the files are already in place to go straight to inundation. Maybe consider adding more "step" points?

    • update. Now added in new dev-catfim-v2-1 branch.
  9. Review test / fix using the generate_categorical_fim_mapping.py and generate_catfim_flow.py scripts to see if they work from command line. I rebuilt the generate_categorical_fim_mapping.py part way through the rebuild but did not come back to finish it. The generate_catfim_flow.py script is definitely broken. More of a rewrite task, lower priority as of 9/11/24.

  10. (Priority) (no card needed, ongoing) Clean up doc string notes below most functions. Most are very, very out of date. For now, either finish a doc string section to be correct or remove if not. Important! But doesn't need to be its own issue. Good to be fixing as we go. More to be done but we can call this done for CatFIM 2.1.

  11. (Priority) 1272 Review and cleanup the lid sites rule such as special cases for lids based on crs, datum_adjust, etc. It is now it's own issue Big deal! Out-of-date hard coding could be causing a lot of the issues in the subsequent bullet points. A note from Carson: Would be good to add "and" statements to make sure that the known issue is true before applying a hard-coded fix. Pseudocode example: if LID == fhjs5 AND datum == NULL, set datum = ‘NAD83’

    • Done in CatFIM 2.1

Note from Derek:
It would be good to add the AHPS restricted sites list to the catfim folder of our online inundation-mapping repo so that the field can check and provide feedback on whether certain sites can be removed from the list. Would be good to sort the LIDs by RFC and then alphabetical order.
new service / feature someday? xls list? other?

  • Done. It now exists in the repo at /tools/catfim/stage_based_ahps_restricted_sites.csv. For future releases, it can be updated by others and updated in the repo.
  1. (Priority) 1273.
    Review Errors from the 4.5.2.11 flow based and stage based full runs.

    • Update: Oct 21, 2024: Done. Ran a set of all stage based dropped sites with newest code and reviewed it. A good handful 40-50% ?? are now fixed. They were dropped due to the bug where one missing stage failed the record. Now fixed.
  2. (Priority) 1274 Review / Fix logic for Stage Based Sea Level sites and applicable ahps site list. Also covers moving the HV applicable ahps list over to CatFIM code .

    • Fixed and in 2.1 branch. Updated to a csv in code that has the list of stage_based_aphs_restricted_sites.csv that the code can use to delimited sites and use it's reason back to our outputs. This also allows RFC's to review and update it for future releases easily.
  3. (Priority) (no card but done) Ensure that the stage_based_catfim_sites.csv file has the 'nws_lid' column renamed to 'ahps_lid'.

    • update. Now fixed in new dev-catfim-v2-1 branch.
  4. (Priority) 1275 ***FIM 4.5.2.11 - CatFIM stage site issues - jrsu1 and okfi2 (and other misc sites) ***

    • For Emily later.
  5. (Added Sep 4, 2024) 1277 CatFIM - issues with new docker image.
    - update. Now fixed in new dev-catfim-v2-1 branch.

  6. (Added Sep 4, 2024) Review sites that were dropped since 4.4.0.0 to 4.5.2.11. They might be related # 1274, or 1277above but might not. Derek wants us to compile and review all. Rob will put a card in for it. Related: HydroVIS has to parts of there system which can drop ahps sites as well, so we will reconcile that with the HV team at the same time. In HV, they compare our sites coming in against a DB of their own named something like "ahps_restricted_sites" and drops more. We need to reconcile that as part of this task, plus 1274 and 1275.

    • update: One of the five places we drop (or change mapped = no) for sites is the HV test against the "aphs_restricted_sites". Derek said it is fine for now but will double check later to see how up to date it is. While we won't worry about the status of the data in the aphs restricted sites db, we will pull that data and logic over to FIM catFIM processing code. 1288. All part of site reconciliation.
    • Update: Sep 23: For the "sites" list, it was discovered that if the t does appear that the sites that don't have all 4 thresholds were dropped in this new version. This is likely incorrect. But watch for 4.4.0.0 records where their might be missing threshold stages and all categories failed to map. We will need to confirm rules of when a record shoudl be dropped in relation to partial threshholds. This logic appears to have not changed sine 4.4.0.0, but may be wrong in the first place.
    • Update: Sep 23: Some HUC8s appear to have changed on some records. Research required.
    • Update: Sep 23: review actual status msg's. ie)
      • 4.4.0.0 was: "large discrepancy in elevation estimates from gage and HAND and all categories failed to map". 4.5.2.11 now shows "large discrepancy in elevation estimates from gage and HAND"
      • 4.4.0.0 was "no stage values available and all categories failed to map", now "no stage values available"
      • 4.5.2.11: There are a bunch saying status of "undermined". This was a backup system and any record showing that value, means their is a code logic error.
      • 4.4.0.0 was: "likely unacceptable gage datum error or accuracy code(s); please see acceptance criteria and all categories failed to map". 4.5.2.11 is "usgs_elev_table missing, likely unacceptable gage datum error -- more details to come in future release"
    • Update Sep 23: How can there be any miss matches on AHPS id's. both 4.4.0.0 and 4.5.2.1, have some that do not matching ahps_id record.
    • Update Sep 28: Fixed in dev-catfim-v2-1, other than some outlier issues mentioned above.
  7. (Added Sep 6, 2024): For the all four .csv's that we create, drop the geometry column and create four more files with the name "_metadata" in them. This will make for smaller versions for purely analyzing non geo columns, like status, is mapped flag, and other values in it. It is purely a debugging tool but only takes 2 line of code to be added and would help quite a bit. Might no longer be needed depending on split library outputs (see # 25 below)

  • update Oct 31: Skipped for now.
  1. (Added 9/11/24): 1282 Make a CatFIM folder to store the CatFIM scripts!

    • update. Now fixed in new dev-catfim-v2-1 branch.
  2. (Added 9/9/24): 1279 Hawaii points disappeared from stage-based CatFIM in release 4.5.2.11 . Keep an eye on other tasks relating to lid reconciliation as they might/might not be related.

    • Rob to check it, might be fixed in the status fixes from num 20 above.
    • Update Oct 18: Some what put on hold. There is some weirdness in the FIM code creating the usgs_elev_tables.csv for HUCS. A separate card is created for that problem. However.. we now estimate that just two Hawaii hucs might make it through which is KMSH1 and WRSH1. Testing in progress. Update: Those failed for other reasons. None remained from 20060000. Will try other Hawaii hucs later.
    • Update: Oct 21: This is not a FIM bug, but a WRDS issue and is impacting the usgs_guages.gpkg that FIM uses to create the usgs_elev_tables. Topic resolved.
  3. (Added 09/23/24): Ensure we compare mapped = Yes from out site list, and ensure there is at least one rec in library file.

    • Done
  4. (Added 10/07/24): Priority (card coming) Split library csv output files to new folder with each HUC having it's own sites. ie) split to HUC level instead of one giant CONUS+ library csv.

    • Update. 10/28/24: (1331}[https://github.com/[8pt] Split CatFIM output library file to group by HUC2's #1331]: Proved we must split the outputs library files. Stage based will be a fair bit bigger than previous CatFIM runs, for various reasons. Merge all library files to be HUC2 grouped.
    • Update 10/31/24: Found spliting creates even more problems especially that the catfim compare tool won't work. That tool can compare one set of inputs from previoius versions to this version.
@EmilyDeardorff EmilyDeardorff added enhancement New feature or request CatFIM NWS Flood Categorical HAND FIM labels May 28, 2024
@RobHanna-NOAA RobHanna-NOAA changed the title [13pt] Improve CatFIM performance [21pt] Improve CatFIM performance and fixes Jun 20, 2024
@RobHanna-NOAA RobHanna-NOAA self-assigned this Aug 29, 2024
@RobHanna-NOAA
Copy link
Contributor

Some of these tidbits were fixed in PR 1165 which is the rebuild of CatFIM. However, this Issue will remain open as there are more things to cover, each a different importance levels.

@RobHanna-NOAA RobHanna-NOAA added the Epic Agile epic. Subtasks are itemized and listed with check boxes, and individual cards/stories are made label Sep 2, 2024
@RobHanna-NOAA RobHanna-NOAA changed the title [21pt] Improve CatFIM performance and fixes [EPIC] Improve CatFIM performance and fixes Sep 2, 2024
@RobHanna-NOAA RobHanna-NOAA changed the title [EPIC] Improve CatFIM performance and fixes [EPIC] Fixes, Enhancements and Improvements for CatFIM Sep 4, 2024
@EmilyDeardorff
Copy link
Contributor Author

Notes from 9/11/24 meeting

1 & 2: These issues will probably go away when we move CatFIM to AWS.

4: Should be quick.

5: A simple reorganization task.

6: A larger reorganization task, but will probably not be needed when we move CatFIM to AWS.

7&8: Need too make an issue, we have decided to maintain these functions. Would be good to clarify their function in docs and add comments in the code.

9&10: Choose one because they’re very similar. I think we will choose #10 because it’s closer to finished.

11: Do it!

12: More of a rewrite task, lower priority.

13: Important! But doesn’t need to be its own issue. Rather, this documentation is something that we can and should be fixing as we go.

14: Big! Out-of-date hard coding could be causing a lot of the issues noted in the subsequent bullet points. A note from Carson: Would be good to add “and” statements to make sure that the known issue is true before applying a hard-coded fix. Pseudocode example: if LID == fhjs5 AND datum == NULL, set datum = ‘NAD83’

It would be good to add the AHPS restricted sites list to the catfim folder of our online inundation-mapping repo so that the field can check and provide feedback on whether certain sites can be removed from the list. Would be good to sort the LIDs by RFC and then alphabetical order.

15: New issue: Make a CatFIM folder to store the CatFIM

@RobHanna-NOAA
Copy link
Contributor

For Test and fix OR remove the lid choice system (arguments of -l / --lid_to_run). It has been broken since winter 2024. Could be handy but might not be worth the effort to fix it. -> Removed 9/11/24. I think we decided to drop that system in favor of finishing the filter by huc system (# 10?). I think it is not wise to leave in a feature that we don't know if it works especially if we are replacing it with # 10 hucs. I will put in a card to drop that feature.

@CarsonPruitt-NOAA
Copy link
Collaborator

This might be something we tackle after addressing all of this technical debt, but I wanted to track it here since it's a CatFIM task...

It's been suggested that we map Stage-Based CatFIM for sites that are not forecast sites. There are some AHPS points that are observation-only but still have flood thresholds that we could map. Even more sites for Stage-based (yay!).

@RobHanna-NOAA
Copy link
Contributor

Added separate card for future enhancement: 1308 Add lake and levee protected area masking

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CatFIM NWS Flood Categorical HAND FIM enhancement New feature or request Epic Agile epic. Subtasks are itemized and listed with check boxes, and individual cards/stories are made
Projects
None yet
Development

No branches or pull requests

3 participants