
how to pass schema version as kafka property #15

Open
KrupalVanukuri opened this issue May 21, 2021 · 7 comments

@KrupalVanukuri

If a schema has multiple versions, how does a producer/consumer pass or refer to a particular version through Kafka properties or Kafka configuration? Can I use something like the below?

consumer:

properties.put(KafkaAvroDeserializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "https://myeventhubNamespace:443/$schemagroups/mygroupName/schemas/mySchemaName/versions/2");

producer:

properties.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "https://myeventhubNamespace:443/$schemagroups/mygroupName/schemas/mySchemaName/versions/2");

full method:

public KafkaTemplate<Object, Object> getKafkaTemplate() {
    final Map<String, Object> properties = new HashMap<String, Object>();
    properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrapServer);
    properties.put("security.protocol", securityProtocol);
    properties.put("sasl.mechanism", saslMechanism);
    properties.put("sasl.jaas.config", saslJaasConfig);
    properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
            com.microsoft.azure.schemaregistry.kafka.avro.KafkaAvroSerializer.class);

    properties.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG,
            "https://myeventhubNamespace:443/$schemagroups/mygroupName/schemas/mySchemaName/versions/2");
    properties.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_CREDENTIAL_CONFIG, credential);
    properties.put(KafkaAvroSerializerConfig.SCHEMA_GROUP_CONFIG, "flightEventAvroSchemaGroup");

    final KafkaTemplate<Object, Object> template =
            new KafkaTemplate<Object, Object>(new DefaultKafkaProducerFactory<>(properties));
    return template;
}

Is this recommended? And will port 443 change if the Event Hubs namespace is deleted and recreated with the same name?

@KrupalVanukuri
Author

Any suggestions, please?

@KrupalVanukuri
Author

Any suggestions on how schema evolution works?

I created a schema group with Forward compatibility and uploaded a schema as initial version 1 in the Azure portal. Both producer and consumer applications are working fine.
Now I want to add a new field to the schema, so I edited the schema in the portal and added a field with a default. The version incremented to 2 with a new GUID.
I then updated the producer application with the version 2 schema and sent a message to Event Hubs. The message was stamped with the version 2 GUID. When the consumer application received this new message, it failed, because the consumer uses version 1 and the received message carries the version 2 GUID. But per forward compatibility, the consumer should process this new message successfully by ignoring the defaulted field. This feature is very important for our applications to adopt the schema registry. Can you please provide any help on this? And any documentation or samples to refer to?

Version 2:

{
  "type": "record",
  "namespace": "com.test",
  "name": "Employee",
  "fields": [
    {
      "name": "firstName",
      "type": "string"
    },
    {
      "name": "middleName",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "age",
      "type": "int"
    }
  ]
}

Version 1:

{
  "type": "record",
  "namespace": "com.test",
  "name": "Employee",
  "fields": [
    {
      "name": "firstName",
      "type": "string"
    },
    {
      "name": "age",
      "type": "int"
    }
  ]
}
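
For reference, the expected forward-compatibility behavior can be checked with plain Avro, independent of Kafka and the Azure serializer. This is a minimal sketch assuming the two schemas above are saved as EmployeeV2.avsc (writer) and EmployeeV1.avsc (reader); file names and values are illustrative only. When the reader is given both schemas, Avro's schema resolution silently drops the added middleName field:

import java.io.ByteArrayOutputStream;
import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class ForwardCompatCheck {
    public static void main(String[] args) throws Exception {
        // Writer = version 2 (has middleName), reader = version 1 (does not).
        Schema writerSchema = new Schema.Parser().parse(new File("EmployeeV2.avsc"));
        Schema readerSchema = new Schema.Parser().parse(new File("EmployeeV1.avsc"));

        // Encode a record with the new (writer) schema.
        GenericRecord v2Record = new GenericData.Record(writerSchema);
        v2Record.put("firstName", "Alice");
        v2Record.put("middleName", "Q");
        v2Record.put("age", 30);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(v2Record, encoder);
        encoder.flush();

        // Decode with BOTH schemas: Avro resolves the differences, so a
        // version-1 consumer simply never sees middleName.
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(writerSchema, readerSchema);
        GenericRecord asSeenByV1 = reader.read(null, decoder);
        System.out.println(asSeenByV1); // {"firstName": "Alice", "age": 30}
    }
}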

@hmlam
Member

hmlam commented Jun 9, 2021

The TL;DR answer to your question is that it is typically not recommended to pass the schema version as a Kafka property. Rather, you should follow the typical Avro model: generate a versioned Java class from your Avro schema and let the internal mechanics of the schema registry register the schema for you.

=========================
Since you are in the Kafka space, the integration with schemas is typically done through code generation rather than through the portal experience (the portal gives you a good UI tool to look at a schema after it is written). I would suggest you run through our sample at https://github.com/Azure/azure-schema-registry-for-kafka/tree/master/java/avro/samples to get a better understanding of how the integration is typically done.

In particular, in Kafka, producing/fetching with a schema is more or less about creating Avro classes for the given schema, rather than pointing a config at a schema endpoint. Your typical flow should be something like the following (a minimal sketch follows the list):

  1. Define your schema (say, AvroUser.avsc).
  2. Generate the Java classes from the schema (most people use avro-maven-plugin to generate the Java classes in their project).
  3. After step 2 you should have a Java class that encapsulates your schema (e.g. AvroUser.java), which you can use in both producer and consumer code to write your data using the schema - again, refer to our sample for details.
  4. Note: specific to Event Hubs Schema Registry, you should always pre-create the schema group that you specify in the schema.group config in the portal before running your code/sample.
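
As a rough sketch of steps 1-3 (not a drop-in replacement for the linked sample), here is a producer that uses the Employee class generated from the Employee.avsc in this thread. The bootstrap server, endpoint, credential, topic, and schema group are placeholders; the config constant names are taken from the snippet earlier in this issue, and the builder methods simply follow the field names in the generated class:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import com.azure.identity.DefaultAzureCredentialBuilder;
import com.microsoft.azure.schemaregistry.kafka.avro.KafkaAvroSerializer;
import com.microsoft.azure.schemaregistry.kafka.avro.KafkaAvroSerializerConfig;
import com.test.Employee; // generated by avro-maven-plugin from Employee.avsc

public class EmployeeProducerSketch {
    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "mynamespace.servicebus.windows.net:9093");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);

        // The schema registry URL is just the namespace endpoint; no schema name or
        // version appears in the config - the serializer registers/looks up the schema.
        props.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "https://mynamespace.servicebus.windows.net");
        props.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_CREDENTIAL_CONFIG,
                new DefaultAzureCredentialBuilder().build());
        props.put(KafkaAvroSerializerConfig.SCHEMA_GROUP_CONFIG, "mygroupName");

        // Employee is the class generated from the schema in this thread.
        Employee employee = Employee.newBuilder()
                .setFirstName("Alice")
                .setAge(30)
                .build();

        try (KafkaProducer<String, Employee> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key-1", employee));
        }
    }
}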

As for schema evolution, most people typically go through this kind of life cycle:

  1. Say you start with AvroUser.avsc. Producer and consumer use that schema (the generated class) to produce and consume records - since you configured them to use our schema registry, the producer and consumer will register that schema in the registry under the namespace and name you specified in your .avsc.
  2. At some point you want to create version 2 of AvroUser.avsc - you modify the schema but keep the same namespace and name; say you now have a file called AvroUserV2.avsc.
  3. When you compile your code, avro-maven-plugin will generate AvroUserV2.java, which you now use in your producer/consumer code. When you run it, just like in step 1, the library will automatically register the new schema with the registry - because the namespace and name are the same, it knows this is a new version.

Note that every record you produce or fetch is embedded with its schema ID, so data produced/fetched in step (1) will have a different schema ID than data produced/fetched in step (3) - this is how the data can still be de/serialized even though the schema has evolved. Also note that your generated Java files (AvroUser.java and AvroUserV2.java) are basically the code representation of your schema evolution.

@KrupalVanukuri
Author

KrupalVanukuri commented Jun 9, 2021 via email

@KrupalVanukuri
Author

When I look at the link below, I see the DatumReader being created from the writer schema only. I believe that to deserialize the payload/message, the DatumReader should also take the reader schema into account.
https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/schemaregistry/azure-data-schemaregistry-avro/src/main/java/com/azure/data/schemaregistry/avro/AvroSchemaRegistryUtils.java

Method name:
getDatumReader(Schema writerSchema)
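
For context, in plain Avro the reader schema is supplied through the two-argument DatumReader constructor; whether and where the Azure deserializer wires that in is exactly what this question is about. A small sketch of the standard Avro API only (file names are illustrative, reusing the two schema versions from above):

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class DatumReaderConstructors {
    public static void main(String[] args) throws Exception {
        Schema writerSchema = new Schema.Parser().parse(new File("EmployeeV2.avsc"));
        Schema readerSchema = new Schema.Parser().parse(new File("EmployeeV1.avsc"));

        // Writer schema only: the consumer reads exactly what was written,
        // with no schema resolution.
        GenericDatumReader<GenericRecord> writerOnly =
                new GenericDatumReader<>(writerSchema);

        // Writer + reader schema: Avro resolves the differences (e.g. ignores
        // the added middleName field), which is what forward compatibility relies on.
        GenericDatumReader<GenericRecord> resolving =
                new GenericDatumReader<>(writerSchema, readerSchema);
    }
}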

@hmlam
Member

hmlam commented Jun 14, 2021

Your repro looks correct to me as a forward-compatibility sample. I would suggest filing a GitHub issue over at the azure-sdk-for-java repo with your forward-compatibility repro steps, so that the SDK team can track this and fix it as needed.

@KrupalVanukuri
Author

@hmlam any updates on Azure Schema Registry for Kafka with forward compatibility?
