Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Pass on on-disk chunk sizes as preferred chunk sizes to the xarray backend #678

Merged
merged 10 commits into from
Aug 14, 2023

Conversation

mraspaud
Copy link
Contributor

This PR proposes a removes the (unused) chunks and cache argument from the open_dataset plugin function and sends values to open_rasterio to get a result inline with what xarray expects. Moreover, the preferred_chunks encoding attribute is populated for xarray to be able to do optimized automated chunking.

The fact that chunks and cache are ignored by xarray is documented in the example here: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html#backendentrypoint-subclassing
And the recommended usage of preferred_chunks is documented here: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html#preferred-chunk-sizes

Finally, I wasn't really sure how/where to put the tests, what do you recommend?

  • Closes #xxxx
  • Tests added
  • Fully documented, including docs/history.rst for all changes and docs/rioxarray.rst for new API

@snowman2
Copy link
Member

chunks is not used by xarray, however it is used by rioxarray. See:

rioxarray/rioxarray/_io.py

Lines 896 to 929 in 33c0059

def _prepare_dask(
result: DataArray,
riods: RasterioReader,
filename: Union[str, os.PathLike],
chunks: Union[int, tuple, dict],
) -> DataArray:
"""
Prepare the data for dask computations
"""
# pylint: disable=import-outside-toplevel
from dask.base import tokenize
# augment the token with the file modification time
try:
mtime = os.path.getmtime(filename)
except (TypeError, OSError):
# the filename is probably an s3 bucket rather than a regular file
mtime = None
if chunks in (True, "auto"):
from dask.array.core import normalize_chunks
if not _DASK_GTE_018:
raise NotImplementedError("Automatic chunking requires dask >= 0.18.0")
block_shape = (1,) + riods.block_shapes[0]
chunks = normalize_chunks(
chunks=(1, "auto", "auto"),
shape=(riods.count, riods.height, riods.width),
dtype=_rasterio_to_numpy_dtype(riods.dtypes),
previous_chunks=tuple((c,) for c in block_shape),
)
token = tokenize(filename, mtime, chunks)
name_prefix = f"open_rasterio-{token}"
return result.chunk(chunks, name_prefix=name_prefix, token=token)

rioxarray/_io.py Outdated Show resolved Hide resolved
rioxarray/_io.py Outdated Show resolved Hide resolved
rioxarray/_io.py Outdated Show resolved Hide resolved
.pre-commit-config.yaml Outdated Show resolved Hide resolved
@mraspaud
Copy link
Contributor Author

chunks is not used by xarray, however it is used by rioxarray. See:

Indeed! my objective here is to return the LazilyIndexedOuterArrays to xarray directly when the data is requested through the xarray backend to let xarray deal with the chunking and create dask arrays itself.

@mraspaud
Copy link
Contributor Author

mraspaud commented Jul 4, 2023

Anything else I can do to get this PR ready to merge?

@snowman2
Copy link
Member

snowman2 commented Jul 4, 2023

Mind adding a note in docs/history.rst?

@mraspaud
Copy link
Contributor Author

mraspaud commented Jul 4, 2023

Mind adding a note in docs/history.rst?

Done in 1808857

@snowman2
Copy link
Member

@mraspaud, the only thing left to do in this PR is to remove the changes in the .pre-commit-config.yaml file and resolve the conflicts. The optimal chunk size can be updated separately.

@mraspaud
Copy link
Contributor Author

Done. Sorry for the delay, I was on holidays...

@snowman2 snowman2 changed the title Fix xarray plugin for automatic chunk size ENH: Pass on on-disk chunk sizes as preferred chunk sizes to the xarray backend Aug 11, 2023
@snowman2 snowman2 added the enhancement New feature or request label Aug 11, 2023
@snowman2 snowman2 added this to the 0.14.1 milestone Aug 11, 2023
@snowman2
Copy link
Member

This looks good from my perspective, Any other changes you would like to add before merging?

@mraspaud
Copy link
Contributor Author

Nothing more from my side, thanks for all the support!

@snowman2
Copy link
Member

Thanks @mraspaud 👍

@snowman2 snowman2 merged commit 2428b0e into corteva:master Aug 14, 2023
18 of 19 checks passed
@mraspaud mraspaud deleted the fix-xarray-plugin-chunks branch August 14, 2023 13:32
@snowman2 snowman2 modified the milestones: 0.14.1, 0.15.0 Aug 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants