Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DynamoDB: Use ON CONFLICT DO NOTHING clause on INSERT CDC operations #77

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

amotl
Copy link
Member

@amotl amotl commented Oct 24, 2024

Reason

... to mitigate errors when events are relayed redundantly from retries after partially failed batches on the Lambda processor.

References

/cc @kneth, @surister, @hlcianfagna

@cla-bot cla-bot bot added the cla-signed label Oct 24, 2024
@amotl amotl marked this pull request as ready for review October 24, 2024 07:58
@amotl
Copy link
Member Author

amotl commented Oct 24, 2024

Discussion Summary

Q&A

  1. If you think this change will apply the correct solution to the corresponding problem, or if it would cause other flaws, if that option should be made configurable, or if it is sane to use as a default particularly on the CDC INSERT operations.
  2. While already at it, do you think that clause should also be applied to other spots in the relevant subsystems of CTK? a) MongoDB CDC? b) Optionally also on full-load operations, both DynamoDB and MongoDB?

Do you think it is applicable then to use the upsert on all three occasions, which are
a) INSERT statements of full load operations,
b) INSERT statements of cdc operations,
c) UPDATE statements of cdc operations.

Do nothing on DuplicateKeyExceptions

@wierdvanderhaar already responded:

I think the upsert could work for all three. Only the Update needs to be transformed into an INSERT INTO .. ON CONFLICT DO UPDATE SET.

@hammerhead added:

Regarding c), there is still one semantical difference between an UPDATE and an INSERT ON CONFLICT: If there is no row matching the UPDATE, it will have no effect. If that statement gets re-written as an INSERT ON CONFLICT, the record gets inserted. Not sure if that is a problem or not, but at least it is a difference in behaviour. Personally, I would recommend to stick to using UPDATE.

Overwrite Option

@amotl asked:

I can imagine a user toggling a flag to also make it apply on a) could enable a convenience case where you can just slap a full load operation on a table that is already populated, with thinking much about needing to drop it first. That would be a cost vs. convenience tradeoff.

@hammerhead responded:

Some kind of “override if exists” flag for a) and b) could be interesting, we had a similar question the other day.

... to mitigate errors when events are relayed redundantly from retries
after partially failed batches on the Lambda processor.
@amotl amotl force-pushed the dynamodb-cdc-insert-conflict-noop branch from 928a99a to 671b47d Compare October 24, 2024 09:07
Copy link
Member

@hammerhead hammerhead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense I believe, thanks!

@kneth
Copy link

kneth commented Oct 24, 2024

@amotl Is it possible to log the conflicts? I could imagine that the user would like to know how often it happens and maybe see if there is a pattern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants