Issue with schema converter when encountered with specific type of schema #645

Open
SimplytheVinay opened this issue Aug 1, 2024 · 0 comments

SimplytheVinay commented Aug 1, 2024

Bug
SchemaConverter fails when it encounters a specific type of union schema.

Steps to reproduce the behavior:

  1. Run the local test code (see Screenshots) that I wrote to replicate the behaviour we saw in Dev. It passes an Avro file to the schema converter method.
  2. convertPulsarAvroSchemaToNonNullSchema in SchemaConverter.java gets called with the schema.
  3. convertPulsarAvroSchemaToNonNullSchema loops through the fields and calls convertOneField with each field's name and schema.
  4. convertOneField in SchemaConverter.java takes each field and converts it.
  5. While going through the Avro file, when it encounters a specific type of union, it enters the code block linked below and throws the error from the else block:
    https://github.com/streamnative/pulsar-io-lakehouse/blob/master/src/main/java/org/apache/pulsar/ecosystem/io/lakehouse/common/SchemaConverter.java#L153
  6. Here is the schema that triggers the error:
    ["null",{"type":"map","values":["null","int","long","string","boolean","float","double"]},"int","long","string","boolean","float","double"]
  7. See error
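
For reference, here is a minimal Python sketch (not the connector's Java code) that parses the union from step 6 with the standard json module and lists its branches. It assumes, without confirming it from SchemaConverter.java, that the converter's non-error path only covers optional-style unions of the form ["null", T]. The helper names branch_name and describe_union are hypothetical and exist only for this illustration.

```python
import json

# The union schema from step 6, copied verbatim.
SCHEMA = (
    '["null",{"type":"map","values":["null","int","long","string",'
    '"boolean","float","double"]},"int","long","string","boolean",'
    '"float","double"]'
)


def branch_name(branch):
    # Primitive branches are plain strings; complex branches are
    # JSON objects that carry their kind in the "type" key.
    return branch if isinstance(branch, str) else branch["type"]


def describe_union(schema_json):
    branches = json.loads(schema_json)
    if not isinstance(branches, list):
        raise ValueError("an Avro union is written as a JSON array")
    names = [branch_name(b) for b in branches]
    return {
        "branches": names,
        "nullable": "null" in names,
        "non_null": [n for n in names if n != "null"],
        # Assumption: only unions shaped like ["null", T] are "simple".
        "is_simple_optional": len(names) == 2 and "null" in names,
    }


info = describe_union(SCHEMA)
print(info["branches"])
print(len(info["non_null"]), info["is_simple_optional"])
```

The union has eight branches, seven of them non-null (a map plus six primitives), so it cannot be reduced to a single nullable type. If the else block is reached for any union that is not ["null", T], this schema would land there.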

Expected behavior
The converter should handle this kind of schema as well. When I ran the same Avro file through another online utility, I got a proper JSON file with the data:
https://dataconverter.io/convert/avro-to-json

Screenshots
Test code written to debug the failure locally
Screenshot 2024-08-01 at 12 16 57 PM

Code section which gives error
Screenshot 2024-08-01 at 12 16 22 PM

Environment

  • OS: Mac and Ubuntu
  • Pulsar version: 3.3.0.5
  • Deployment: standalone, and a development environment with Pulsar and Hudi connected

Additional context
This conversion is used to source data from a Pulsar topic and load it into Hudi. Interestingly, the Pulsar storage sink connector is able to load the same data into BigQuery without any issues.

SimplytheVinay changed the title from "Issue with schema writer when encountered with specific type of schema" to "Issue with schema converter when encountered with specific type of schema" on Aug 1, 2024