
S3-SQS source does not populate partition columns in the dataframe #2

Open
DipeshV opened this issue Jun 18, 2020 · 6 comments
DipeshV commented Jun 18, 2020

Hi,
I am using this "s3-sqs" connector with Spark Structured Streaming and Delta Lake to process incoming data in partitioned S3 buckets.
The problem I am facing with the "s3-sqs" source is that files are read directly and returned as a dataframe/dataset without the partition columns.
Hence, when we merge the source and target dataframes, all the partition columns come out as HIVE_DEFAULT_PARTITION.

Do you have any solution/workaround to add the partition columns as part of the dataframe?

Thanks and regards,
Dipesh Vora

@abhishekd0907
Collaborator

abhishekd0907 commented Jun 23, 2020

@DipeshV This seems like a bug.
Thanks for pointing it out. I will work on a fix.

@DipeshV
Author

DipeshV commented Jun 30, 2020

Hi Abhishek,

I am currently adding the partition columns manually, which makes my code a bit messy and means it cannot be reused as-is when adding new integrations.
Is there a fix for this yet?

Thanks,
Dipesh

@abhishekd0907
Collaborator

@DipeshV Yeah, I'll raise a PR with the fix today.

@abhishekd0907
Collaborator

@DipeshV I've created a pull request. Can you build a jar from the new branch and try it out?

@abhishekd0907
Collaborator

@DipeshV Did you get a chance to try out the new code? Does it solve your use case?

@DipeshV
Author

DipeshV commented Jul 27, 2020

@abhishekd0907 - I haven't checked the new code yet, since I had already added the partitions manually from input_file_name().
But I will test the new code as well.
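For context, the manual workaround described above can be sketched roughly as follows. This is not the connector's code, just an illustration (with a hypothetical helper name) of extracting Hive-style `key=value` partition segments from the S3 object key that `input_file_name()` returns; in Spark itself the same extraction would typically be expressed with `regexp_extract` over the `input_file_name()` column for each partition key.

```python
import re

def partition_values(path: str) -> dict:
    """Extract Hive-style key=value partition segments from an S3 object key.

    e.g. "s3://bucket/events/dt=2020-06-18/region=us/part-0001.json"
         -> {"dt": "2020-06-18", "region": "us"}
    """
    # Match "/key=value" segments that are followed by another "/",
    # i.e. directory components, not the file name itself.
    return dict(re.findall(r"/([^/=]+)=([^/]+)(?=/)", path))

print(partition_values("s3://bucket/events/dt=2020-06-18/region=us/part-0001.json"))
# -> {'dt': '2020-06-18', 'region': 'us'}
```

Applying this per-row (e.g. via `regexp_extract` and `withColumn` for each expected partition key) restores the partition columns before the Delta Lake merge, so the target table no longer sees them as HIVE_DEFAULT_PARTITION.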
