-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial PoC implementation of UDPJobFactory #644
Conversation
31d0c76
to
ff8b553
Compare
seems like pandas is handy to build foot guns maybe related to #641
0f7a9bc
to
130db87
Compare
Ok I think this PR is ready for final review and/or merging. the diff is quite large, but that is mainly because of testing and testing utility improvements. The core of this PR is the new openeo-python-client/openeo/extra/job_management.py Lines 941 to 1126 in 2667733
user-facing usage (as documented): from openeo.extra.job_management import (
MultiBackendJobManager,
create_job_db,
UDPJobFactory,
)
# Job creator, based on a parameterized openEO process definition
job_starter = UDPJobFactory(
namespace="https://example.com/my_process.json",
)
# Initialize job database from dataframe, with parameters to use.
df = pd.DataFrame(...)
job_db = create_job_db("jobs.csv").initialize_from_df(df)
# Create and run job manager
job_manager = MultiBackendJobManager(...)
job_manager.run_jobs(job_db=job_db, start_job=job_starter) |
Also note that this is inspired by PR ESA-APEx/esa-apex-toolbox-python#1 as suggested in #604 That PR ESA-APEx/esa-apex-toolbox-python#1 was however outdated due to recent job manager changes and not really mergeable as-is. |
Just found out there was actually another PR about this: interesting idea from that PR to add here:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code wise this looks quite fine, especially for an MVP.
For API testing, the opinions of users could be valuable, or maybe data engineering.
@HansVRP Feel free to provide feedback.
Otherwise, I suggest to merge it, so that we can work it into the apex demo, and then also gain some experience ourselves.
Will review this sprint. Any deadline I need to be aware of? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some initial comments; still going through the remainder
openeo/extra/job_management.py
Outdated
|
||
class UDPJobFactory: | ||
""" | ||
Batch job factory based on a parameterized process definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether UDP Job Factory is the clearest name. Without having read the function definition, I am not really sure what to expect. The definition does not necessarily clarify it either.
Does factory here mean that we create jobs based on UDPs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I'm also not very happy with the current name, but haven't found something significant better yet
some ideas
UDPJobCreator
(but I'd actually prefer to avoid "UDP")ParameterizedProcessBasedJobCreator
is quite long,ProcessBasedJobCreator
is doableProcessParameterFiller
,ProcessParameterInjector
,ProcessTemplate
(are quite cryptic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already decided to pick ProcessBasedJobCreator, which should already improve the situation.
but still open for better options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that option sounds good to me.
Process in itself is also still a bit vague, but preferable over openeoprocessjobcreator
Thanks a lot for the review notes! I worked some more on the documentation part to clarify some things. Regardless, I'd propose to already merge this, because it is a large PR in terms of lines of code, so it's easy to get conflicts with other PRs. Functionality-wise it just adds something new, which I made sure to flag clearly as experimental, to give us some wiggle room to clean up later |
2030fac
to
7bf73de
Compare
for issue #604
self-review PR