Support geometries in load_stac #121

Open
LukeWeidenwalker opened this issue Jun 19, 2023 · 4 comments

LukeWeidenwalker (Contributor) commented Jun 19, 2023

To cut out the training sites for UC8, we need the ability to restrict the load_stac process to loading only data within certain geometries. This should be possible with:

  1. the intersects keyword on pystac_client.ItemSearch, to fetch only the relevant STAC items (see the sketch below),
  2. followed by loading into pydata/sparse arrays and concatenating them to construct a sparse data cube to keep memory usage low.
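
A minimal sketch of step 1, assuming pystac-client against a public STAC API (the endpoint, collection, AOI, and date range below are placeholders):

```python
import pystac_client
from shapely.geometry import box, mapping

# Hypothetical area of interest (e.g. a training-site footprint) in EPSG:4326.
aoi = box(11.1, 46.0, 11.3, 46.2)

# Example public STAC endpoint; the real endpoint/collection would come from the load_stac arguments.
catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects=mapping(aoi),  # only items whose footprint overlaps the geometry
    datetime="2023-01-01/2023-06-30",
)
items = search.item_collection()
print(f"{len(items)} STAC items intersect the geometry")
```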

A bonus objective would be for load_stac to figure out dynamically when it is better to load sparse vs. dense.
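
As a hypothetical starting point for that heuristic (not part of the prototype below), one could compare the combined footprint area of the requested geometries with the area of their bounding box and only go sparse when the fill fraction is low:

```python
from shapely.geometry import box
from shapely.ops import unary_union


def should_load_sparse(geoms, threshold=0.1):
    """Return True when the geometries cover only a small fraction of their bounding box."""
    footprint = unary_union(geoms)  # assumes shapely geometries in a common CRS
    bbox_area = box(*footprint.bounds).area
    fill_fraction = footprint.area / bbox_area if bbox_area else 1.0
    return fill_fraction < threshold  # the 10% threshold is an arbitrary placeholder
```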

I did a prototype for this (loading with opendatacube rather than STAC) that looked like this; it should be helpful as inspiration here:

```python
# Prototype: load each geometry as its own sub-cube with opendatacube, mask out
# pixels outside the geometry, combine the sub-cubes, and store the result sparsely.
import dask
import geopandas as gpd
import numpy as np
import rasterio.features
import rioxarray  # noqa: F401  (registers the .rio accessor)
import sparse
import xarray as xr
from datacube.utils import geometry


def geometry_mask(geoms, geobox, all_touched=False, invert=False):
    """Rasterise the geometries onto the geobox grid and return a boolean mask."""
    return rasterio.features.geometry_mask(
        [geom.to_crs(geobox.crs) for geom in geoms],
        out_shape=geobox.shape,
        transform=geobox.affine,
        all_touched=all_touched,
        invert=invert,
    )


def load_collection_with_geometries(dc, geometries: list, query_dict):
    # Full (lazy) cube over the whole query extent, used as the reindexing target.
    full_cube = dc.load("boa_sentinel_2", dask_chunks={}, **query_dict).to_array()
    full_cube = full_cube.drop_vars("spatial_ref")

    geometries_df = gpd.GeoDataFrame({"geometry": gpd.GeoSeries(geometries)}, crs="EPSG:4326")

    def load_subcube(geom, query_dict):
        # Load only the footprint of one geometry and mask out pixels outside it.
        sub_query = query_dict.copy()
        sub_query["geopolygon"] = geom
        sub_cube = dc.load("boa_sentinel_2", dask_chunks={}, **sub_query).to_array()
        sub_cube = sub_cube.drop_vars("spatial_ref")

        mask = geometry_mask([geom], sub_cube.geobox, invert=True)

        sub_cube_data = sub_cube.where(mask != 0).data
        sub_cube_xr = xr.DataArray(coords=sub_cube.coords, data=sub_cube_data, name="sub_cube")
        return sub_cube_xr

    # Combine all sub-cubes on their shared coordinates.
    combined_xr = None
    sub_cubes = []
    for sub_geom in geometries_df.geometry:
        sub_cube = load_subcube(geometry.Geometry(geom=sub_geom, crs=geometries_df.crs), query_dict)
        sub_cubes.append(sub_cube)
        if combined_xr is None:
            combined_xr = sub_cube
        else:
            combined_xr = xr.combine_by_coords([combined_xr, sub_cube])
    combined_xr = combined_xr.to_array("subcube").squeeze()

    # Reindex onto the full extent and store the (mostly empty) result as sparse chunks.
    with dask.config.set(**{"array.slicing.split_large_chunks": True}):
        reindexed_combined = combined_xr.reindex_like(full_cube)
        sparse_combined = xr.DataArray(
            data=reindexed_combined.data.map_blocks(sparse.COO, fill_value=np.nan),
            coords=reindexed_combined.coords,
        )
    sparse_combined.rio.write_crs(full_cube.rio.crs, inplace=True)
    return sparse_combined
```
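
For reference, a hypothetical call of the prototype above; the ODC instance, product name, polygons, and query values are placeholders:

```python
import datacube
from shapely.geometry import Polygon

dc = datacube.Datacube()

# Placeholder training-site polygons in EPSG:4326.
training_sites = [
    Polygon([(11.10, 46.00), (11.15, 46.00), (11.15, 46.05), (11.10, 46.05)]),
]
query = {
    "time": ("2023-01-01", "2023-06-30"),
    "output_crs": "EPSG:32632",
    "resolution": (-10, 10),
}
sparse_cube = load_collection_with_geometries(dc, training_sites, query)
```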

LukeWeidenwalker (Author) commented:

@clausmichele do you have bandwidth to help out with this at the moment?

clausmichele (Member) commented:

Hi @LukeWeidenwalker, unfortunately I don't have time to work on this right now. I will try it when I have some spare time!

clausmichele (Member) commented:

@ValentinaHutter, is this feature implemented?

ValentinaHutter (Collaborator) commented:

Sorry, I just realized that I misread this and thought of it as a requirement for UC8 only. For UC8 we have since implemented something else and still load bounding boxes.

But in general, we should of course keep this issue.
