Ensure exactly-once on connector task (w/ coordinator) rebalancing, as in the Apache version #280

Open: 1 commit to merge into base: main

@okayhooni commented Jul 28, 2024

Re-opening PR #279 with additional defensive logic authored by @bryanck.

Context

I observed duplicate records in a CDC sink that uses this Iceberg sink connector after moving to spot nodes and enabling Karpenter's node consolidation feature. The duplicates are rare, but when they do occur they tend to occur consecutively. In a related issue inquiry, @bryanck informed me that the Apache Iceberg version of the connector has added safeguard logic to ensure that no more than one coordinator task runs at a time during connector rebalancing.

Commit Contents

  • Cherry-picks the safeguard logic from the Apache version into the Tabular version.
  • The safeguard prevents more than one coordinator task from running at the same time during connector task rebalancing, for example when spot instances are terminated.
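
For readers unfamiliar with the idea, here is a minimal sketch of a "single coordinator" guard. It is not the code from this commit or from the Apache connector: the class `CoordinatorGuard`, its methods, and the rule that the task owning one designated partition acts as coordinator are all assumptions made for illustration. It also assumes that revocations reach the old task before the new task receives its assignment.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical sketch: none of these names come from the connector's code.
class CoordinatorGuard {
    // Assumption: the task that owns this partition acts as the coordinator.
    private final int leaderPartition;
    private boolean coordinatorRunning = false;

    CoordinatorGuard(int leaderPartition) {
        this.leaderPartition = leaderPartition;
    }

    // Called when a rebalance takes partitions away from this task.
    void onPartitionsRevoked(Collection<Integer> revoked) {
        if (coordinatorRunning && revoked.contains(leaderPartition)) {
            // Stop coordinating before another task can take over.
            coordinatorRunning = false;
        }
    }

    // Called when a rebalance hands partitions to this task.
    void onPartitionsAssigned(Collection<Integer> assigned) {
        if (!coordinatorRunning && assigned.contains(leaderPartition)) {
            coordinatorRunning = true;
        }
    }

    boolean isCoordinatorRunning() {
        return coordinatorRunning;
    }

    public static void main(String[] args) {
        CoordinatorGuard taskA = new CoordinatorGuard(0);
        CoordinatorGuard taskB = new CoordinatorGuard(0);
        taskA.onPartitionsAssigned(List.of(0, 1));
        taskB.onPartitionsAssigned(List.of(2, 3));

        // Rebalance: partition 0 moves from task A to task B.
        taskA.onPartitionsRevoked(List.of(0));
        taskB.onPartitionsAssigned(List.of(0));

        System.out.println("task A coordinating: " + taskA.isCoordinatorRunning());
        System.out.println("task B coordinating: " + taskB.isCoordinatorRunning());
    }
}
```

In this simplified model, task A gives up the coordinator role as soon as it loses the designated partition, so at no point do both tasks coordinate. A real connector would also need to cover a task that disappears without a clean revocation, which is the spot-termination case described above.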

Related Links

cc/ @fqtab

@okayhooni (Author)

@fqaiser94 (@fqtab)

Could you review this commit and the related discussion in the #kafka-connect channel of the apache-iceberg Slack community?
