-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aggregate spatial #194
Aggregate spatial #194
Conversation
…o-processes-dask into aggregate_spatial
@masawdah in the example you mention a |
updated with geometries and bbox. |
) -> VectorCube: | ||
raise NotImplementedError | ||
t_dim = data.openeo.temporal_dims | ||
t_dim_name = t_dim[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Line 196, you check the length of t_dim, so it might make sense to also check for the length here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make sense to also add
t_dim_name = None if len(t_dim) == 0 else t_dim[0]
otherwise this line fails when the temporal dimension is not available. This would also need to be considered in Line 231 and 243.
LGTM, thank you for working on this! |
Code restructure
If you want to run the pre commit manually, you could |
|
||
## Create VectorCube | ||
vec_cube = xr.Dataset( | ||
data_vars=data_vars, coords=dict(geometry=df.geometry, time=times) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of naming the temporal dimension "time" per default, it makes sense to name it after the temporal dimension in the input data, what do you think?
I think, this could be achieved by something like
coords= {"geometry": df.geometry, t_dim: times}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it makes sense to keep the name as it is. Unfortunately, I currently don't have access to my VM due to internal maintenance. However, you can go ahead and implement the new modifications.
The name of time dimension has already been extracted in the beginning with t_dim_name = t_dim[0]
, so we can reuse it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was looking into it, but you were faster with your updates, I now also added a test for a dataset without time.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #194 +/- ##
==========================================
- Coverage 80.20% 79.42% -0.78%
==========================================
Files 30 30
Lines 1384 1468 +84
==========================================
+ Hits 1110 1166 +56
- Misses 274 302 +28 ☔ View full report in Codecov by Sentry. |
Thanks for working on this! I will now release a new openeo-processes-dask version and continue testing on larger scales! |
mask = rasterio.features.geometry_mask( | ||
geometry_array, out_shape=(y_dim_size, x_dim_size), transform=transform | ||
) | ||
|
||
if t_dim is not None: | ||
mask = np.expand_dims(mask, axis=data_dims.index(t_dim)) | ||
if b_dim is not None: | ||
mask = np.expand_dims(mask, axis=data_dims.index(b_dim)) | ||
|
||
masked_data = data * mask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I got here from xarray-contrib/xvec#52 where @masawdah opened a PR with a variation of this.
The same code as is here was there as well but I believe it is wrong. mask
is a boolean array with False where the geometry_array
has been burned in, the rest is True. Line 159 results in an array where all the values are preserved except those where the geometry is, where the values are now 0 (not missing!).
Any reduction of this masked array it therefore likely wrong.
Even if you passed invert=True
to rasterio.features.geometry_mask
to get True indicating the location of geometry and False the area that should be taken away, the line 159 still keeps 0 where the geometry is not, therefore mean, median etc will be wrong. You would need to create a mask resulting in missing values here I believe.
I may be wrong as this is clearly reviewed PR but I don't think that is the case. The tests here also do not verify any correctness of values and this code indeed produces correctly formed cube. Just with completely wrong results.
I have refactored the logic in that xvec
PR and verified the result against other implementation so you can check how it is done there. Or we may be able to get the built-in functionality of xvec to the state where you can use it directly, if you dare to depend on version 0.2.0 of it :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martinfleis Thanks for pointing this out. Actually, this work was done simultaneously with xarray-contrib/xvec#52 . The intention behind it was to replace this effort with a built-in functionality of xvec once it is ready.
I saw your refactoring for the implementation, and it looks great. Let's double-check once again to confirm the best :)
@masawdah @ValentinaHutter, I also noticed something that needs to be adjusted.
|
I'm not sure if I missed something about the output format but it mentioned here #124 (comment) xarray.dataset. but it would be easy to switch. |
Please be aware that now the conversion from xarray.Dataset to xarray.DataArray is using the |
@clausmichele @ValentinaHutter I have tested the new implementation of the built-in functionality of xvec for spatial aggregation, and it performed well. It successfully handled the points mentioned by Michele. We can replace the existing functionality with this new implementation upon its release. Thanks @martinfleis |
@masawdah there's a new release of xvec including your contribution https://github.com/xarray-contrib/xvec/releases/tag/v0.2.0 |
Hi team,
We have completed the implementation of the aggregate_spatial process. You can use the code snippet below to test it. The current output format is an x-vector cube as an xarray dataset, but it can also be configured as a DatArray, depending on the requirements of processes utilizing the output, such as fit_regr_random_forest.
Here if the output is xr.Dataset
And this is as DataArray