Roadmap towards UC8 #124
Comments
@LukeWeidenwalker my colleagues are working on the aggregate functionality: ks905383/xagg#48
Some open points from the coordination!
Concerning larger scale datasets:
@masawdah implemented filter_spatial in this PR: #170
I now did some tests with […]
Hi @ValentinaHutter, aggregate_spatial is implemented in this PR: #194. As discussed with @clausmichele, if you plan to proceed with the implementation of […]
Thanks a lot for the quick reply! Will have a look at the PR and test it on our side asap!
There is now a general implementation of aggregate_spatial in openeo-processes-dask. However, we found that the spatio-temporal extent for UC8 needs to be treated differently, because UC8 starts with load_collection on bbox = [3, 43, 18, 51], temporal_extent = ["2018-05-01", "2018-09-01"]. This fails for the datasets provided by EODC in the EODC backend. The error is raised as soon as dask tries to create a lazy array, either because there is too much data in one chunk or because the task graph is too large. The kernel crashes even before aggregate_spatial starts. Let me know if anything is unclear or if you need a more detailed description of the issue.
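To get a feel for why a dense load over this extent is a problem, here is a back-of-the-envelope estimate. The bbox and temporal extent are the ones quoted above; everything else is an assumption made only for illustration and is not stated in this thread: a 10 m pixel size, one acquisition every 5 days, a single band, and 4-byte values. `dense_cube_size` is a hypothetical helper, not part of openeo-processes-dask.

```python
import math
from datetime import date


def dense_cube_size(bbox, temporal_extent, resolution_m=10.0,
                    revisit_days=5, bands=1, bytes_per_value=4):
    """Rough size of a dense (t, y, x) cube; all defaults are assumptions."""
    west, south, east, north = bbox
    metres_per_degree = 111_320.0  # approximate length of one degree of latitude
    mid_lat = math.radians((south + north) / 2)
    # Longitude degrees shrink with the cosine of the latitude.
    width_m = (east - west) * metres_per_degree * math.cos(mid_lat)
    height_m = (north - south) * metres_per_degree
    nx = math.ceil(width_m / resolution_m)
    ny = math.ceil(height_m / resolution_m)
    start, end = (date.fromisoformat(t) for t in temporal_extent)
    nt = max(1, (end - start).days // revisit_days)
    return {"x": nx, "y": ny, "t": nt,
            "bytes": nx * ny * nt * bands * bytes_per_value}


size = dense_cube_size([3, 43, 18, 51], ["2018-05-01", "2018-09-01"])
print(f"{size['x']} x {size['y']} pixels, {size['t']} time steps, "
      f"about {size['bytes'] / 1e12:.2f} TB if stored densely")
```

Under these assumptions the dense cube is on the order of a terabyte, so whichever way it is chunked, dask ends up with either very large chunks or a very large number of tasks.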
@ValentinaHutter in my opinion it would also be important to share the re-implementation/adaptation, so that we understand the internal logic. Additionally, if someone faces the same issue, it would be important to be able to track a possible solution to it.
I will try to sum it up in the next days - I think this might be a nice thing to discuss at our first openeo-processes-dask meeting next Monday :)
@clausmichele @aljacob I wanted to follow up on our call last Friday with a roadmap of concrete steps we'll need to finally close UC8.
Last year, because of the way the architecture was set up and because there was no local execution environment, we could only test the whole workflow within the entire running system, so it was difficult to collaborate on this outside of EODC. With the rewrites to the parser and the processes, this should now be much easier, because we can essentially prototype the entire workflow without needing to have it exposed in a fully-fledged backend. As discussed on Friday, I'm summing up here what's still missing, so that @clausmichele can start supporting us again!
Subissues:
- Support the `geometries` parameter in `load_collection` (or `load_stac`) by loading sparse arrays (https://github.com/pydata/sparse) (Implement load_stac #120, Support `geometries` in `load_stac` #121)
  - […] `aggregate_spatial`. I'm attaching my proof of concept notebook for how to get around this by leveraging pydata/sparse arrays (super crude - @clausmichele please skim this quickly and let me know if you can work with this already, or if I need to tidy it up and add some more comments!). This has to be done at the point where data is loaded initially, because otherwise any operations between load_collection and aggregate_spatial will run over an extent much larger than necessary.
  - […] `load_stac`, the rest of the pipeline should be unlocked to be worked on, even if this isn't live at EODC yet!
- Port `aggregate_spatial` to work with sparse arrays and produce vector cubes with https://github.com/xarray-contrib/xvec (Port aggregate_spatial to work with sparse arrays and produce vector cubes with xvec #119)
  - […] `if isinstance(data, gpd.GeoDataFrame):` statements all over the place) - which would be annoying and make this unnecessarily hard to maintain!
- Adapt `fit_regr_random_forest` to work on xvec vector cubes (Adapt `fit_regr_random_forest` to work on xvec vector cubes #122)
- Test `pred_regr_random_forest` to work with the new setup (Test `pred_regr_random_forest` to work this new setup #123)

Let me know if you have any questions on this general plan! As I explained on Friday, we don't have bandwidth to tackle these issues ourselves for at least another 2-week increment, so any progress you're able to make on this will help us out a lot and will be much appreciated. I'll try to make myself available to support and review any pull requests in a timely manner!
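As a rough, dependency-free illustration of the sparse-loading idea in the first subissue: only pixels that fall inside the requested geometries are read and stored, keyed by their coordinates, which is the coordinate-list layout that sparse array formats build on. This is not the pydata/sparse or xvec API and not the attached notebook. `load_sparse`, `mean_per_footprint`, `read_pixel`, and the rectangular pixel boxes standing in for real polygons are all hypothetical names and simplifications.

```python
def load_sparse(shape, footprints, read_pixel):
    """Read only pixels inside the footprints; return {(row, col): value}."""
    rows, cols = shape
    cells = {}
    # Each footprint is a half-open pixel box (r0, r1, c0, c1), a stand-in
    # for a polygon; nothing outside the boxes is ever read or stored.
    for r0, r1, c0, c1 in footprints:
        for r in range(max(r0, 0), min(r1, rows)):
            for c in range(max(c0, 0), min(c1, cols)):
                cells[(r, c)] = read_pixel(r, c)
    return cells


def mean_per_footprint(cells, footprints):
    """Toy aggregation: mean of the stored pixels inside each footprint."""
    means = []
    for r0, r1, c0, c1 in footprints:
        values = [v for (r, c), v in cells.items()
                  if r0 <= r < r1 and c0 <= c < c1]
        means.append(sum(values) / len(values) if values else float("nan"))
    return means


shape = (100_000, 100_000)  # hypothetical raster, too large to hold densely
footprints = [(10, 12, 20, 23), (50_000, 50_002, 70_000, 70_002)]
cells = load_sparse(shape, footprints, read_pixel=lambda r, c: float(r + c))

print(len(cells), "stored values instead of", shape[0] * shape[1])
print(mean_per_footprint(cells, footprints))
```

Because the selection happens at load time, every step between loading and aggregation only ever touches the stored pixels, which is the reason given above for doing this where the data is loaded initially.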
cc @SerRichard @ValentinaHutter @christophreimer @bschumac
test_sparse_multipolygons.zip