[1pt] PR: Replace fiona with pyogrio #1077

Merged
merged 10 commits into dev from dev-pyogrio
Feb 16, 2024

mluck
Contributor

@mluck mluck commented Feb 13, 2024

Replace fiona with pyogrio to improve I/O speed. geopandas will use pyogrio by default starting with version 1.0. An environment variable was also added so that pyogrio uses pyarrow, which further speeds up I/O. As a result of the changes in this PR, fim_pipeline.sh runs approximately 10% faster. Closes #1072.

Changes

  • Pipfile: Upgraded geopandas from v0.12.2 to v0.14.3, added pyogrio, and fixed the version of pyflwdir.
  • src/bash_variables.env: Added an environment variable so that pyogrio uses pyarrow.
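
As a minimal sketch of the environment-variable switch described above: the PR text does not name the variable, so `PYOGRIO_USE_ARROW` is an assumption here, taken from what pyogrio documents for enabling Arrow-based reads. The helper name `enable_pyogrio_arrow` is hypothetical and is not part of this PR.

```python
import os

# Assumption: this is the variable name pyogrio documents for turning on
# Arrow-based reads; the PR description does not spell it out.
ARROW_ENV_VAR = "PYOGRIO_USE_ARROW"


def enable_pyogrio_arrow(environ=os.environ):
    """Set the variable to "1" unless a value is already present.

    Returns the value that is in effect afterwards, so an explicit
    setting made by the caller (for example "0") is left untouched.
    """
    return environ.setdefault(ARROW_ENV_VAR, "1")
```

In a shell environment file, the equivalent would be a single `export PYOGRIO_USE_ARROW=1` line, again assuming that variable name.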

The remaining files were modified to add pyogrio:

  • data/
    • bathymetry/preprocess_bathymetry.py, ble/ble_benchmark/create_flow_forecast_file.py, esri.py, nld/levee_download.py, usgs/acquire_and_preprocess_3dep_dems.py, wbd/clip_vectors_to_wbd.py, wbd/preprocess_wbd.py, write_parquet_from_calib_pts.py: Added pyogrio and pyarrow
  • src/
    • add_crosswalk.py, associate_levelpaths_with_levees.py, bathy_rc_adjust.py, bathymetric_adjustment.py, buffer_stream_branches.py, build_stream_traversal.py, crosswalk_nwm_demDerived.py, derive_headwaters.py, derive_level_paths.py, edit_points.py, filter_catchments_and_add_attributes.py, finalize_srcs.py, make_stages_and_catchlist.py, mask_dem.py, reachID_grid_to_vector_points.py, split_flows.py, src_adjust_spatial_obs.py, stream_branches.py, subset_catch_list_by_branch_id.py, usgs_gage_crosswalk.py, usgs_gage_unit_setup.py, utils/shared_functions.py
  • tools/
    • adjust_rc_with_feedback.py, check_deep_flooding.py, create_flow_forecast_file.py, eval_plots.py, evaluate_continuity.py, evaluate_crosswalk.py, fimr_to_benchmark.py, find_max_catchment_breadth.py, generate_categorical_fim.py, generate_categorical_fim_flows.py, generate_categorical_fim_mapping.py, generate_nws_lid.py, hash_compare.py, inundate_events.py, inundation.py, make_boxes_from_bounds.py, mosaic_inundation.py, overlapping_inundation.py, rating_curve_comparison.py, rating_curve_get_usgs_curves.py, test_case_by_hydro_id.py, tools_shared_functions.py
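
To illustrate what "adding pyogrio" to a script can look like, here is a hedged sketch that routes reads through geopandas with `engine="pyogrio"`. The helper names `pyogrio_read_kwargs` and `read_vector` are hypothetical and are not taken from the PR's diff; the only library features assumed are the `engine` keyword of `geopandas.read_file` and the `use_arrow` keyword that pyogrio's reader accepts.

```python
def pyogrio_read_kwargs(use_arrow=None):
    """Build keyword arguments for geopandas.read_file.

    use_arrow=None leaves the decision to pyogrio's own default
    (which an environment variable may influence).
    """
    kwargs = {"engine": "pyogrio"}
    if use_arrow is not None:
        kwargs["use_arrow"] = use_arrow
    return kwargs


def read_vector(path, use_arrow=None):
    """Read a vector file into a GeoDataFrame using the pyogrio engine."""
    # Imported lazily so this module can be loaded where geopandas
    # is not installed.
    import geopandas as gpd

    return gpd.read_file(path, **pyogrio_read_kwargs(use_arrow))
```

An alternative to passing `engine` on every call is setting `geopandas.options.io_engine = "pyogrio"` once near the top of a script. Which of the two this PR uses is not stated in the description.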

Removals

Testing

fim_pipeline.sh ran approximately 10% faster on two HUCs (12040101 and 12090301). Inundation metrics (CSI and MCC) were identical to dev (4.4.10.0).

| HUC8 | 4.4.10.0 | pyogrio | pyogrio + pyarrow |
|----------|----------|---------|-------------------|
| 12040101 | 34:02 | 31:05 | 30:36 |
| 12090301 | 54:05 | 48:20 | 47:26 |
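
The "approximately 10% faster" figure can be checked against the timings above with a short calculation. This sketch assumes the values are minutes:seconds, but the percentages come out the same if they are hours:minutes. The function names are illustrative only.

```python
def to_seconds(clock: str) -> int:
    """Convert a 'MM:SS' string to a number of seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_faster(baseline: str, candidate: str) -> float:
    """Runtime reduction of candidate relative to baseline, in percent."""
    base = to_seconds(baseline)
    return 100.0 * (base - to_seconds(candidate)) / base


# Timings copied from the table: (dev 4.4.10.0, pyogrio, pyogrio + pyarrow).
TIMINGS = {
    "12040101": ("34:02", "31:05", "30:36"),
    "12090301": ("54:05", "48:20", "47:26"),
}

for huc, (dev, pyogrio_only, with_arrow) in TIMINGS.items():
    print(
        huc,
        round(percent_faster(dev, pyogrio_only), 1),
        round(percent_faster(dev, with_arrow), 1),
    )
```

For 12040101 this gives reductions of 8.7% (pyogrio) and 10.1% (pyogrio + pyarrow). For 12090301 it gives 10.6% and 12.3%.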

Issuer Checklist (For developer use)

You may update this checklist before and/or after creating the PR. If you're unsure about any of them, please ask, we're here to help! These items are what we are going to look for before merging your code.

  • Informative and human-readable title, using the format: [_pt] PR: <description>
  • Links are provided if this PR resolves an issue, or depends on another PR
  • If submitting a PR to the dev branch (the default branch), you have a descriptive Feature Branch name using the format: dev-<description-of-change> (e.g. dev-revise-levee-masking)
  • Changes are limited to a single goal (no scope creep)
  • The feature branch you're submitting as a PR is up to date (merged) with the latest dev branch
  • pre-commit hooks were run locally
  • Any change in functionality is tested
  • Passes all unit tests locally (inside interactive Docker container, at /foss_fim/, run: pytest unit_tests/)
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • CHANGELOG updated with template version number, e.g. 4.x.x.x
  • Reviewers requested
  • Add yourself as an assignee in the PR as well as the FIM Technical Lead

Merge Checklist (For Technical Lead use only)

  • Update CHANGELOG with latest version number and merge date
  • Update the Citation.cff file to reflect the latest version number in the CHANGELOG
  • If applicable, update README with major alterations

@mluck mluck linked an issue Feb 13, 2024 that may be closed by this pull request
RobHanna-NOAA
RobHanna-NOAA previously approved these changes Feb 15, 2024
Contributor

@RobHanna-NOAA RobHanna-NOAA left a comment

Ran against the full unit test set of HUCs with the new Docker container from here. Everything ran cleanly, and all of the eval plots were identical to recent previous runs.

@CarsonPruitt-NOAA CarsonPruitt-NOAA merged commit 073e4a3 into dev Feb 16, 2024
1 check passed
@CarsonPruitt-NOAA CarsonPruitt-NOAA deleted the dev-pyogrio branch February 16, 2024 17:24
Development

Successfully merging this pull request may close these issues.

[5pt] Replace Fiona with pyogrio