load_stac: stac_items #501

jdries · 2024-04-16T17:19:55Z

Proposed Process ID: load_stac
Proposed Parameter Name: stac_items
Optional: yes, default: None

Context

load_stac is very popular for loading user-defined data, but it requires the STAC JSON to be available via an HTTP URL.
In many cases such a URL is not available, so the user needs to rely on a third-party service (e.g. GitHub) to upload the STAC JSON.
I also see a use case for systems that require signed URLs for data access, where the user first needs to sign the URLs using a secret key.

Description

stac_items, if provided, is an array of valid STAC Item objects. The backend will load all assets in the provided items.

Data Type

array of objects

Additional changes

The other parameters would no longer need to be present if stac_items is provided directly.
Alternatively, we could of course turn this into a separate 'load_stac_items' process with a single parameter?
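
To make the proposal concrete, a process graph node using the new parameter might look like the sketch below. Note that stac_items is only the parameter proposed in this issue, not part of the current load_stac definition, and the Item contents and the example.com URL are hypothetical placeholders.

```python
import json

# Hypothetical minimal STAC Item; the href is a placeholder, not a real file.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "ndvi-example",
    "geometry": None,
    "properties": {"datetime": "2024-04-16T00:00:00Z"},
    "links": [],
    "assets": {
        "ndvi": {
            "href": "https://example.com/data/ndvi.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        }
    },
}

# Process graph node using the PROPOSED (not yet specified) stac_items parameter.
process_graph = {
    "load1": {
        "process_id": "load_stac",
        "arguments": {"stac_items": [item]},
        "result": True,
    }
}

print(json.dumps(process_graph, indent=2))
```

The graph is self-contained: no URL to a hosted STAC document is needed, only absolute asset hrefs inside the Item.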

clausmichele · 2024-05-08T09:20:46Z

@jdries so, if I understand correctly, you would like to pass the STAC Items directly as JSON/text in the process graph instead of a URL? It could be a good idea!

From what I understand, if we integrate it into load_stac, a user can provide:

A STAC Collection or Item URL: the data will be loaded as usual using also the provided query filters.
An array of objects (STAC Items), that can be generated client side, so that there's more control on what we want to load.

m-mohr · 2024-05-16T21:48:07Z

This can quickly become problematic. Many STAC Items don't have absolute URLs, and then you can't load the data if the self URL isn't set. Usually you can fall back to the Item URL if no self URL is given, but that URL is not available here. Also, the JSON size can explode quickly if people start to pass thousands of Items.
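
The relative-URL problem can be illustrated with a small sketch. It is not how any backend actually does it; resolve_asset_hrefs is a hypothetical helper that resolves asset hrefs against the Item's self link and fails when an inline Item has neither absolute hrefs nor a self link.

```python
from urllib.parse import urljoin, urlparse


def resolve_asset_hrefs(item: dict) -> dict:
    """Return {asset_key: absolute_href} for a STAC Item dict.

    Relative hrefs are resolved against the Item's "self" link. If there is
    no self link, a relative href cannot be resolved and ValueError is raised.
    """
    self_href = next(
        (link["href"] for link in item.get("links", []) if link.get("rel") == "self"),
        None,
    )
    resolved = {}
    for key, asset in item.get("assets", {}).items():
        href = asset["href"]
        if urlparse(href).scheme:  # already absolute (https://, s3://, ...)
            resolved[key] = href
        elif self_href is not None:
            resolved[key] = urljoin(self_href, href)
        else:
            raise ValueError(
                f"asset {key!r} has a relative href and the Item has no self link"
            )
    return resolved
```

When an Item is fetched from a URL, that URL can stand in for a missing self link; for an Item passed inline there is nothing to fall back on, which is the failure case in the else branch.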

Generally, I think I'd prefer a separate process if at all.

jdries · 2024-05-17T06:00:35Z

It's indeed limited to cases where you use absolute URLs and don't send thousands of items.
The use case is really a user who wants to point to a small number of files that are online somewhere but don't have a corresponding STAC Item online.
In general, not all of our users have a STAC API or HTTP service at hand where they can quickly upload some items.
The process graph also becomes more self-contained if it just includes the STAC metadata.

In fact, our new load_stac sample somewhat illustrates it:
https://github.com/Open-EO/openeo-community-examples/blob/main/python/LoadStac/load-stac-item-example.ipynb
At one point it says 'make sure you upload your item'; that step is the tricky part.

m-mohr · 2024-05-17T10:09:42Z

The place that would allow users to do that is the openEO /files endpoints. The original intention was that users could upload any related files there, such as GeoJSON, STAC, etc. Due to the lack of implementations we didn't push this through the processes either, but maybe we should, to encourage it.

The other thing with the STAC example you linked to: Creating a STAC Item for this purpose seems "overkill".
You could easily just capture all information you need in a simpler format, I believe, i.e. just a list of assets:

{
    "ndvi": {
        "href": tiff_url,
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "eo:bands": [ # REQUIRED: define the bands in the eo extension for openEO to be able to load it
            {
                "name": "NDVI-band",
            }
        ],
        "proj:epsg": src.crs.to_epsg(),
        "proj:shape": src.shape, # Caveat: this is [height, width] and not [width, height] if you want to set them yourself
        "proj:bbox": proj_bounds,
    }
}

I assume you don't need the geometry and the projected bbox is enough, but not sure.
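
As a rough sketch of how such an asset list could be assembled without a full STAC Item, the snippet below uses a hypothetical make_asset helper. The URL, EPSG code, shape, and bounds are made-up placeholder values; in practice they would come from the raster file itself.

```python
def make_asset(href, band_names, epsg, shape, bbox):
    """Build one asset entry carrying only the fields from the list above."""
    return {
        "href": href,
        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        "eo:bands": [{"name": name} for name in band_names],
        "proj:epsg": epsg,
        "proj:shape": list(shape),  # [height, width], not [width, height]
        "proj:bbox": list(bbox),    # bounds in the projected CRS
    }


# All values below are hypothetical placeholders.
assets = {
    "ndvi": make_asset(
        href="https://example.com/data/ndvi.tif",
        band_names=["NDVI-band"],
        epsg=32631,
        shape=(1024, 2048),
        bbox=(500000.0, 5600000.0, 520480.0, 5610240.0),
    )
}
```

Whether these fields are sufficient for every backend is exactly the open question; the sketch only mirrors the fields listed above.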

Do we have a consensus across providers on what the STAC Items need to contain to be read (and maybe which optional fields allow more efficiency)?

And then I'm wondering, why not just: load_url(tiff_url, "GTiff", {bands: ["NDVI-band"], ...})?

jdries added minor enhancement labels Apr 16, 2024

m-mohr added new process and removed minor enhancement labels May 16, 2024
