
DynamoDB: DuplicateKeyException after resuming CDC #301

Open
amotl opened this issue Oct 24, 2024 · 3 comments


amotl commented Oct 24, 2024

About

We are observing a problem with the sync lambda.

Problem

Every once in a while, a task times out, which causes the sync lambda to retry. These retries then fail: even though the original task timed out, the data for all items in the timed-out batch had already been stored correctly in CrateDB, so the retried inserts hit a DuplicateKeyException. Do you have any idea what could be causing this behavior?

/cc @dfeokti


amotl commented Oct 24, 2024

a) We will look into the resume logic to see if we can spot any bugs.
b) Another strategy to compensate for such situations is to use upserts / ON CONFLICT ... DO NOTHING clauses on the INSERT statements, so that re-delivered records do not fail. We might just employ this strategy here as well (see the sketch below).
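For illustration, here is a minimal sketch of what strategy b) could look like with the CrateDB Python client. The connection URL, table, and column names (doc.demo_table, pk, payload) are hypothetical placeholders for the actual sink table:

from crate import client

# Hypothetical connection; adjust host and credentials to the actual setup.
connection = client.connect("https://cratedb.example.net:4200")
cursor = connection.cursor()

# A re-delivered record hits the same primary key again; ON CONFLICT ... DO NOTHING
# turns the duplicate insert into a no-op instead of raising a DuplicateKeyException.
cursor.execute(
    "INSERT INTO doc.demo_table (pk, payload) VALUES (?, ?) "
    "ON CONFLICT (pk) DO NOTHING",
    ("record-1", '{"value": 42}'),
)

If a re-delivered record should overwrite the existing row instead, ON CONFLICT (pk) DO UPDATE SET payload = excluded.payload would be the upsert variant.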

/cc @wierdvanderhaar, @hammerhead


amotl commented Oct 24, 2024

a) Relating to that comment,

"""
Implement partial batch response for Lambda functions that receive events from
a Kinesis stream. The function reports the batch item failures in the response,
signaling to Lambda to retry those messages later.
"""

and to how error handling currently takes place:

except Exception as ex:
    error_message = f"An error occurred processing event: {event_id}"
    logger.exception(error_message)
    if USE_BATCH_PROCESSING:
        # Return failed record's sequence number.
        return {"batchItemFailures": [{"itemIdentifier": cur_record_sequence_number}]}
    if ON_ERROR == "exit":
        # Signal "Input/output error" when error happens while processing data.
        sys.exit(5)
    elif ON_ERROR == "ignore":
        pass
    elif ON_ERROR == "raise":
        raise ex

I guess the regular modus operandi for a Lambda that receives events from a Kinesis stream is that, if the Lambda fails for whatever reason, recent events will be re-delivered. With batches of multiple records, it is probably normal that some of them are redundant, because they have already been relayed to CrateDB successfully.
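As a side note, for the batchItemFailures response shown above to take effect, the Kinesis event source mapping has to opt into partial batch responses. A minimal sketch using boto3, with a placeholder mapping UUID:

import boto3

lambda_client = boto3.client("lambda")

# Enable partial batch responses on the (hypothetical) event source mapping, so that
# returning {"batchItemFailures": [...]} makes Lambda retry only the failed records
# instead of re-delivering the whole batch.
lambda_client.update_event_source_mapping(
    UUID="00000000-0000-0000-0000-000000000000",
    FunctionResponseTypes=["ReportBatchItemFailures"],
)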

b) I guess using ON CONFLICT DO NOTHING / DO UPDATE instead will be the right choice.


amotl commented Oct 24, 2024
