Initiate S3 Multi-part upload on receiving first event #318
…, and closes the file on flush. Signed-off-by: Aindriu Lavelle <[email protected]>
…allowing changelog records to initiate multipart upload. Signed-off-by: Aindriu Lavelle <[email protected]>
b6cebcc to 4776d6d
assertThat(expectedBlobs).allMatch(blobName -> testBucketAccessor.doesObjectExist(blobName));

assertThat(testBucketAccessor.readLines("prefix-topic0-0-00000000000000000012", compression))
As an FYI, the S3MockApi does not create the file names correctly for key, value
Wow, the S3OutputStream has had multipart upload for a long time: Aiven-Open/s3-connector-for-apache-kafka#73
But we were still buffering data as records, rather than offloading them early? Crazy. Thanks for the improvement.
* This determines if the file is key based, and possible to change a single file multiple times per flush or if
* it's a roll over file which at each flush is reset.
Can you explain more about this? What is key based grouping, and why does it mutate the file?
This update initiates the multipart upload as soon as a record begins, and closes the file on flush.
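For readers unfamiliar with the pattern, here is a minimal sketch of "initiate on first record, close on flush". It is not the code in this PR: `MultipartTarget`, `LazyMultipartWriter`, and `RecordingTarget` are hypothetical names invented for illustration, the target interface is a stand-in rather than the AWS SDK or the connector's `S3OutputStream`, and the 4-byte part size is only there to keep the demo small (real S3 requires every part except the last to be at least 5 MiB).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an S3 client; NOT the AWS SDK or connector API. */
interface MultipartTarget {
    String initiate(String key);
    void uploadPart(String uploadId, int partNumber, byte[] data);
    void complete(String uploadId, int partCount);
}

/** Starts the multipart upload on the first record and completes it on flush. */
class LazyMultipartWriter {
    private final MultipartTarget target;
    private final String key;
    private final int partSize;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private String uploadId; // null until the first record arrives
    private int partCount;

    LazyMultipartWriter(MultipartTarget target, String key, int partSize) {
        this.target = target;
        this.key = key;
        this.partSize = partSize;
    }

    void write(byte[] record) {
        if (uploadId == null) {
            uploadId = target.initiate(key); // first record opens the upload
        }
        buffer.write(record, 0, record.length);
        if (buffer.size() >= partSize) {
            sendPart(); // offload early instead of holding records until flush
        }
    }

    void flush() {
        if (uploadId == null) {
            return; // no records since the last flush, nothing to close
        }
        if (buffer.size() > 0) {
            sendPart(); // the final part may be smaller than partSize
        }
        target.complete(uploadId, partCount);
        uploadId = null; // the next record starts a fresh upload
        partCount = 0;
    }

    private void sendPart() {
        partCount++;
        target.uploadPart(uploadId, partCount, buffer.toByteArray());
        buffer.reset();
    }
}

/** In-memory target that records the calls it receives, for the demo only. */
class RecordingTarget implements MultipartTarget {
    final List<String> calls = new ArrayList<>();

    public String initiate(String key) {
        calls.add("initiate " + key);
        return "upload-1";
    }

    public void uploadPart(String uploadId, int partNumber, byte[] data) {
        calls.add("part " + partNumber + " (" + data.length + " bytes)");
    }

    public void complete(String uploadId, int partCount) {
        calls.add("complete " + partCount + " parts");
    }
}

class LazyMultipartDemo {
    public static void main(String[] args) {
        RecordingTarget target = new RecordingTarget();
        LazyMultipartWriter writer = new LazyMultipartWriter(target, "example-key", 4);
        for (String record : List.of("ab", "cd", "e")) {
            writer.write(record.getBytes(StandardCharsets.UTF_8));
        }
        writer.flush();
        target.calls.forEach(System.out::println);
    }
}
```

In this sketch a flush with no preceding records does nothing, so no empty upload is opened, and data leaves memory as soon as a full part has accumulated rather than waiting for the flush.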
This PR does
This PR does not