-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Why are coordinates being chunked by default with StoreToZarr
?
#554
Comments
Thanks for raising this, @abarciauskas-bgse. Based on the notebook it looks like you are using a development version of pangeo-forge-recipes that predates the 0.10.0 release? I don't have any specific reason to think this is fixed in 0.10.0, but could you confirm that the issue is present there as well? |
Thanks for looking @cisaacstern - I just installed pangeo-forge-recipes 0.10.0 and verified the coordinates are also chunked when using that version of the library (code linked in notebook has been updated to reflect this library version). |
Thanks for checking that, @abarciauskas-bgse. AFAICT, this feature did exist in pangeo-forge-recipes/pangeo_forge_recipes/recipes/xarray_zarr.py Lines 665 to 692 in 3b3c13c
And re-implementing this as part of the Beam rewrite was simply overlooked. Adding this would probably mean defining a new transform class ConsolidateDimensionCoordinates(beam.PTransform):
... and slotting it in (optionally, perhaps, but as the default?) at the end of target_store = schema | PrepareZarrTarget(
target=self.get_full_target(), target_chunks=self.target_chunks
)
- return rechunked_datasets | StoreDatasetFragments(target_store=target_store)
+ stored_fragments = rechunked_datasets | StoreDatasetFragments(target_store=target_store)
+ return (
+ stored_fragments
+ if not self.consolidate_coords
+ else stored_fragments | ConsolidateDimensionCoordinates(target_store=target_store)
+ ) I could potentially take a look at this on the few weeksish time horizon from now, but would be thrilled if you or someone else wanted to take this on, and would be happy to do async and/or video walkthrough to get you (or whoever wanted to work on this) up to speed. Thanks for catching this! |
Thank you @cisaacstern for looking into this! |
👋🏽 I'm using pangeo-forge to create some Zarr stores to test with rio_tiler's XarrayReader. I found that when generating image tiles, performance was significantly impact by the number of S3 calls when opening the dataset and this was because the coordinates were chunked. The code to reproduce this issue is here: https://nbviewer.org/gist/abarciauskas-bgse/c826a49a966f3c157626f45ea816330b
The text was updated successfully, but these errors were encountered: