
S3-SQS source does not populate partition columns in the dataframe #2

Open
DipeshV opened this issue Jun 18, 2020 · 6 comments
DipeshV commented Jun 18, 2020

Hi,
I am using this "s3-sqs" connector with Spark Structured Streaming and Delta Lake to process incoming data in partitioned S3 buckets.
The problem I am facing with the "s3-sqs" source is that files are read directly and returned as a dataframe/dataset without the partition columns.
Hence, when we merge the source and target dataframes, all the partition columns come out as HIVE_DEFAULT_PARTITION.

Do you have any solution/workaround to add the partition columns as part of the dataframe?

Thanks and regards,
Dipesh Vora

@abhishekd0907
Collaborator

abhishekd0907 commented Jun 23, 2020

@DipeshV This seems like a bug.
Thanks for pointing it out. I will work on a fix.

@DipeshV
Author

DipeshV commented Jun 30, 2020

Hi Abhishek,

I am currently adding the partition columns manually, which makes my code a bit messy and means it cannot be reused as-is when adding new integrations.
Is there a fix for this yet?

Thanks,
Dipesh

@abhishekd0907
Collaborator

@DipeshV Yeah, I'll raise a PR with the fix today.

@abhishekd0907
Collaborator

@DipeshV I've created a pull request. Can you build a jar from the new branch and try it out?

@abhishekd0907
Collaborator

@DipeshV Did you get a chance to try out the new code? Does it solve your use case?

@DipeshV
Author

DipeshV commented Jul 27, 2020

@abhishekd0907 - I haven't checked the new code yet, since I had already added the partitions manually from input_file_name().
But I will test the new code as well.
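For context, the manual workaround described above can be sketched roughly as follows. This is not the connector's code, just an illustration (with a hypothetical helper name) of extracting Hive-style `key=value` partition segments from the S3 object key that `input_file_name()` returns; in Spark itself the same extraction would typically be expressed with `regexp_extract` over the `input_file_name()` column for each partition key.

```python
import re

def partition_values(path: str) -> dict:
    """Extract Hive-style key=value partition segments from an S3 object key.

    e.g. "s3://bucket/events/dt=2020-06-18/region=us/part-0001.json"
         -> {"dt": "2020-06-18", "region": "us"}
    """
    # Match "/key=value" segments that are followed by another "/",
    # i.e. directory components, not the file name itself.
    return dict(re.findall(r"/([^/=]+)=([^/]+)(?=/)", path))

print(partition_values("s3://bucket/events/dt=2020-06-18/region=us/part-0001.json"))
# -> {'dt': '2020-06-18', 'region': 'us'}
```

Applying this per-row (e.g. via `regexp_extract` and `withColumn` for each expected partition key) restores the partition columns before the Delta Lake merge, so the target table no longer sees them as HIVE_DEFAULT_PARTITION.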
