Feature Request / Improvement
The current implementation of maintenance operation scheduling provides several trigger options:
commit count
data file count
data file size
positional delete file count
positional delete record count
equality delete file count
equality delete record count
elapsed interval between runs
It would be great to have more powerful scheduling options, for example a cron-like scheduler where I can specify that a particular maintenance operation should run at the 15th minute of every hour, or something similar.
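To make the "15th minute of every hour" example concrete, here is a minimal Java sketch of the timing calculation such a scheduler would need. `MinuteOfHourTrigger`, `nextFireTime`, and `delayUntilNext` are hypothetical names used only for illustration; they are not part of Iceberg or Flink, and a real cron-like option would presumably support full cron expressions rather than a single minute field.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not an Iceberg or Flink API.
class MinuteOfHourTrigger {

    // Returns the first time strictly after `now` whose minute-of-hour equals `minuteOfHour`.
    static LocalDateTime nextFireTime(LocalDateTime now, int minuteOfHour) {
        if (minuteOfHour < 0 || minuteOfHour > 59) {
            throw new IllegalArgumentException("minuteOfHour must be in [0, 59]");
        }
        // Candidate in the current hour, e.g. 10:15:00 when `now` is 10:xx.
        LocalDateTime candidate = now.truncatedTo(ChronoUnit.HOURS).plusMinutes(minuteOfHour);
        // If that moment has already passed (or is exactly now), move to the next hour.
        return candidate.isAfter(now) ? candidate : candidate.plusHours(1);
    }

    // How long to wait from `now` until the next fire time.
    static Duration delayUntilNext(LocalDateTime now, int minuteOfHour) {
        return Duration.between(now, nextFireTime(now, minuteOfHour));
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        System.out.println("Next run at " + nextFireTime(now, 15)
                + " (in " + delayUntilNext(now, 15) + ")");
    }
}
```

This only computes when a trigger would be due. Whether the operation actually starts at that moment still depends on how the maintenance framework handles overlapping runs.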
Note that external scheduling does not really work for us, because we don't have a simple way to schedule batch Flink applications (and wouldn't want to, as it would require a whole separate application, which is precisely what we want to avoid).
In general, it would be great to have a hook into the scheduling system to provide a custom source of trigger events. This would also help with testing the maintenance-based features of our application; for example, with a custom scheduler, I could use an external trigger (e.g. an HTTP call or a blocking queue notification) to fully control when maintenance operations are executed.
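As a rough illustration of the kind of hook meant here, the sketch below shows a trigger source backed by a blocking queue, which a test could fire by hand. Everything in it (`TriggerSource`, `QueueTriggerSource`, `MaintenanceLoop`) is a hypothetical name invented for this example and does not exist in Iceberg; it only shows the shape of the idea, not how it would be wired into the Flink maintenance job.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical hook: a pluggable source of "run maintenance now" events.
interface TriggerSource {
    // Waits up to the given timeout; returns true if a trigger event arrived.
    boolean awaitTrigger(long timeout, TimeUnit unit);
}

// Test-friendly implementation: events are pushed explicitly via fire().
final class QueueTriggerSource implements TriggerSource {
    private final BlockingQueue<Long> events = new LinkedBlockingQueue<>();

    // Called by the external controller (e.g. a test, or an HTTP handler).
    void fire() {
        events.add(System.currentTimeMillis());
    }

    @Override
    public boolean awaitTrigger(long timeout, TimeUnit unit) {
        try {
            return events.poll(timeout, unit) != null;
        } catch (InterruptedException e) {
            // Preserve the interrupt and report "no trigger" so the caller can stop.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

// Stand-in for the scheduler: runs the task once per pending trigger event.
final class MaintenanceLoop {
    static int runPending(TriggerSource source, Runnable task) {
        int runs = 0;
        while (source.awaitTrigger(10, TimeUnit.MILLISECONDS)) {
            task.run();
            runs++;
        }
        return runs;
    }
}
```

In a test, calling `fire()` twice and then `MaintenanceLoop.runPending(source, task)` would execute the task twice and return once the queue stays empty for the timeout, which gives the test full control over when maintenance runs.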
Query engine
Flink
Willingness to contribute
I can contribute this improvement/feature independently
I would be willing to contribute this improvement/feature with guidance from the Iceberg community
I cannot contribute this improvement/feature at this time
@netvl: The scheduling currently doesn't guarantee the exact time of execution. If there is a concurrent maintenance run, the new run will wait until the concurrent run has finished.
Would this restriction be acceptable for your use case?