-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add ConsolidateDimensionCoordinates + ConsolidateZarrMetadata transforms #556
Changes from all commits
d39c6ef
83aa4fb
cb91a89
eddcfb7
2f5555d
e31ddfb
bcf7190
8db820b
e6f8c03
9b2d903
bcf23f5
e014aae
1a0b147
6d14d20
713394a
abdbaf8
4c935b2
c827037
fb0ba62
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,6 +8,7 @@ | |
import numpy as np | ||
import pytest | ||
import xarray as xr | ||
import zarr | ||
from apache_beam.options.pipeline_options import PipelineOptions | ||
from apache_beam.testing.test_pipeline import TestPipeline | ||
from fsspec.implementations.reference import ReferenceFileSystem | ||
|
@@ -58,11 +59,13 @@ def test_xarray_zarr( | |
xr.testing.assert_equal(ds.load(), daily_xarray_dataset) | ||
|
||
|
||
def test_xarray_zarr_subpath( | ||
@pytest.mark.parametrize("consolidate_coords", [True, False]) | ||
def test_xarray_zarr_consolidate_coords( | ||
daily_xarray_dataset, | ||
netcdf_local_file_pattern_sequential, | ||
pipeline, | ||
tmp_target_url, | ||
consolidate_coords, | ||
): | ||
pattern = netcdf_local_file_pattern_sequential | ||
with pipeline as p: | ||
|
@@ -74,11 +77,16 @@ def test_xarray_zarr_subpath( | |
target_root=tmp_target_url, | ||
store_name="subpath", | ||
combine_dims=pattern.combine_dim_keys, | ||
consolidate_coords=consolidate_coords, | ||
) | ||
) | ||
# TODO: This test needs to check if the consolidate_coords transform | ||
# within StoreToZarr is consolidating the chunks of the coordinates | ||
|
||
ds = xr.open_dataset(os.path.join(tmp_target_url, "subpath"), engine="zarr") | ||
xr.testing.assert_equal(ds.load(), daily_xarray_dataset) | ||
store = zarr.open(os.path.join(tmp_target_url, "subpath")) | ||
|
||
# fails | ||
assert netcdf_local_file_pattern_sequential.dims["time"] == store.time.chunks[0] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @norlandrhagen what is the rationale behind this
Additionally, in your view is the fact that this is currently failing evidence that the current implementation is wrong? Or that this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @cisaacstern I think it's the the assert isn't checking the right thing. |
||
|
||
|
||
@pytest.mark.parametrize("output_file_name", ["reference.json", "reference.parquet"]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'd prefer composition over configuration here. Is there a chance we can just expose the
ConsolidateMetadata
transform from the module.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting @sbquinlan. Is there an example that could illustrate the advantage of exposing these stages? Just looking at this I would think this looks good since in almost all my cases I would want to have those two steps happening by default, but I might be missing something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbusecke My point is basically I prefer:
To:
Hopefully that makes sense. @norlandrhagen already landed changes to make the former possible so that was very much appreciated and unblocks me. So I don't really have much of a strong opinion here anymore. If we want to provide "one-size-fits-all" transforms with
StoreToZarr
/WriteCombinedReferences
that are configurable with boolean flags for people less familiar with Beam or Zarr internals, that's fine, as long as power users still have access to the individual transforms to customize as they see fit. Like if you need to tweak a step for CMIP6.Again, no blocking feedback from me on this PR. Just need to figure out the tests and we're good to go. Might be work breaking up metadata and coords just to cut down on PR complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think putting it that way makes a lot of sense @sbquinlan.