Flink: Maintenance - Add support for more kinds of scheduling #11246

Open
3 tasks
netvl opened this issue Oct 1, 2024 · 2 comments
Labels
improvement PR that improves existing functionality

Comments

netvl commented Oct 1, 2024

Feature Request / Improvement

The current implementation of maintenance operation scheduling provides several trigger options:

  • commit count
  • data file count
  • data file size
  • positional delete file count
  • positional delete record count
  • equality delete file count
  • equality delete record count
  • elapsed interval between runs

It would be great to have more powerful scheduling options, for example a cron-like scheduler where I can specify that a particular maintenance operation should run at the 15th minute of every hour, or something similar.
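To make the "15th minute of every hour" request concrete, here is a minimal sketch of the fire-time computation such a cron-like trigger would need. The function name `next_fire_time` and its parameters are hypothetical and are not part of any existing Iceberg or Flink API. It only covers the minute-of-hour case, not full cron syntax.

```python
from datetime import datetime, timedelta


def next_fire_time(now: datetime, minute: int = 15) -> datetime:
    """Return the first instant strictly after `now` whose minute-of-hour is `minute`.

    Hypothetical helper for illustration only; it handles a single
    "minute of every hour" rule rather than a full cron expression.
    """
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in the range 0..59")
    # Candidate within the current hour, truncated to whole minutes.
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    # If that instant is not in the future, move to the next hour.
    if candidate <= now:
        candidate += timedelta(hours=1)
    return candidate
```

A scheduler built on this would sleep until the returned instant, emit a trigger event, and then call the function again with the new current time.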

Note that external scheduling does not really work for us, because we don't have a simple way to schedule batch Flink applications (and wouldn't want to, as it would require a whole separate application, which is precisely what we want to avoid).

In general, it would be great to have a hook into the scheduling system to provide a custom source of trigger events. This would also help with testing the maintenance-based features of our application; for example, with a custom scheduler, I could use an external trigger (e.g. an HTTP call, or a blocking queue notification) to fully control when maintenance operations are executed.
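As a rough illustration of the blocking-queue variant of such a hook, the sketch below shows a trigger source that an external caller (for example, a test) feeds explicitly. Everything here is an assumption made for illustration: `QueueTriggerSource`, `fire`, `poll`, `drain`, and the task names are hypothetical and do not correspond to an existing Iceberg interface.

```python
import queue
from typing import Callable, Optional


class QueueTriggerSource:
    """Hypothetical trigger source driven by an external caller."""

    def __init__(self) -> None:
        self._events: "queue.Queue[str]" = queue.Queue()

    def fire(self, task_name: str) -> None:
        """Request one run of the named maintenance task."""
        self._events.put(task_name)

    def poll(self, timeout_seconds: float = 0.0) -> Optional[str]:
        """Return the next requested task name, or None if nothing is pending."""
        try:
            if timeout_seconds > 0:
                return self._events.get(timeout=timeout_seconds)
            return self._events.get_nowait()
        except queue.Empty:
            return None


def drain(source: QueueTriggerSource, run_task: Callable[[str], None]) -> int:
    """Run every currently pending trigger in order; return how many ran."""
    count = 0
    task = source.poll()
    while task is not None:
        run_task(task)
        count += 1
        task = source.poll()
    return count
```

In a test, the caller decides exactly when maintenance runs: nothing executes until `fire` is called, and `drain` runs the requested tasks in the order they were requested.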

Query engine

Flink

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time

netvl added the improvement label on Oct 1, 2024

netvl (Author) commented Oct 1, 2024

cc @pvary

pvary (Contributor) commented Oct 2, 2024

Cc: @stevenzwu, @rodmeneses

@netvl: Scheduling currently doesn't guarantee the exact time of execution. If a maintenance run is already in progress, the newly triggered run waits until that run has finished.

Would this restriction be acceptable for your use case?
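The effect of this restriction on a cron-like trigger can be sketched with a toy model. This is a simplified illustration, not Iceberg's actual mechanism: the class `DeferringScheduler`, its methods, and the integer timestamps are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class DeferringScheduler:
    """Toy model: at most one run at a time; later triggers wait in order."""

    def __init__(self) -> None:
        self._running = False
        self._pending: Deque[str] = deque()
        # (task name, time at which the task actually started)
        self.started: List[Tuple[str, int]] = []

    def trigger(self, task: str, now: int) -> None:
        """Start the task immediately, or queue it if another run is active."""
        if self._running:
            self._pending.append(task)
        else:
            self._start(task, now)

    def finish(self, now: int) -> None:
        """Mark the active run as done and start the oldest waiting task, if any."""
        self._running = False
        if self._pending:
            self._start(self._pending.popleft(), now)

    def _start(self, task: str, now: int) -> None:
        self._running = True
        self.started.append((task, now))
```

If task `a` starts at minute 0 and task `b` is triggered at minute 15 while `a` is still running, `b` only starts when `a` finishes (say, at minute 22). In this model a cron expression sets the earliest start time, not the exact one.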
