materialize-{snowflake,databricks}: make deletions idempotent #2015

Merged
merged 1 commit into from
Oct 4, 2024

Conversation

williamhbaker
Member

@williamhbaker williamhbaker commented Oct 3, 2024

Description:

The Snowflake and Databricks materializations rely on idempotent merge queries. Previously, a repeated merge query whose root document field held the "delete" sentinel value would re-add that row to the table after it had first been deleted. This happened in rare cases where the merge query ran a second time before the staged files were deleted and before the runtime acknowledgement was sent.

This fixes that scenario by not inserting rows if the root document field is "delete" in merge queries.
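
To illustrate the failure mode and the fix, here is a minimal in-memory sketch of the merge semantics in Python. It is not the connectors' actual SQL or code: the `merge` function, the `skip_deleted_inserts` flag, and the dict-based table are all hypothetical stand-ins, and the sketch assumes at most one staged row per key.

```python
# Hypothetical stand-in for the "delete" sentinel in the root document field.
DELETE_SENTINEL = "delete"


def merge(table, staged, skip_deleted_inserts=True):
    """Apply staged (key, document) rows to a table, like a merge query.

    `table` is a dict mapping key -> document. Matching is evaluated
    against the table as it was before the merge, and the input is not
    mutated. Assumes each key appears at most once in `staged`.
    """
    result = dict(table)
    for key, doc in staged:
        if key in table:
            # WHEN MATCHED: delete on sentinel, otherwise update.
            if doc == DELETE_SENTINEL:
                del result[key]
            else:
                result[key] = doc
        elif doc != DELETE_SENTINEL or not skip_deleted_inserts:
            # WHEN NOT MATCHED: insert, but with the fix enabled a
            # sentinel row is skipped instead of being inserted.
            result[key] = doc
    return result
```

With `skip_deleted_inserts=False` (modelling the old behavior), replaying a staged deletion against a table where the row is already gone inserts a row whose document is the literal string "delete". With the default, replaying the same staged rows any number of times leaves the table unchanged after the first application.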

I tested this manually by hacking up a version of both connectors that always fails on the first non-recovery commit it tries to run and never cleans up any files. With both, I reproduced the insertion of a row with the "delete" document when the merge query containing the deletion event was retried. With this new code, the "delete" row is not inserted in those situations. These conditions are rare but possible, and they result in either inconsistent data in the destination or, worse, a completely broken materialization, since loading a document column that contains the string "delete" for an update will not work.


@williamhbaker williamhbaker merged commit 49874ce into main Oct 4, 2024
49 of 52 checks passed
@williamhbaker williamhbaker deleted the wb/idempotent-deletions branch October 4, 2024 12:50