We have a case where we encode over 1,000 messages at a time, and I noticed a bottleneck: AvroTurf takes over 60% of the CPU time. I profiled the code and found that the schema is parsed again for every message, as shown in the code below. Is this intentional? Is there any way to improve it?

Note: even in `fetch_schema_by_id`, the `@schemas_by_id` cache is never populated, because the `fetch` block returns the parsed schema without storing it.

Appreciate any help, thank you :)
```ruby
# Providing subject and version to determine the schema,
# which skips the auto registration of schema on the schema registry.
# Fetch the schema from registry with the provided subject name and version.
def fetch_schema(subject:, version: 'latest')
  schema_data = @registry.subject_version(subject, version)
  schema_id = schema_data.fetch('id')
  schema = Avro::Schema.parse(schema_data.fetch('schema'))
  [schema, schema_id]
end

# Fetch the schema from registry with the provided schema_id.
def fetch_schema_by_id(schema_id)
  schema = @schemas_by_id.fetch(schema_id) do
    schema_json = @registry.fetch(schema_id)
    Avro::Schema.parse(schema_json)
  end
  [schema, schema_id]
end
```
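The note about `@schemas_by_id` comes down to how Ruby's `Hash#fetch` works: when the key is missing, the block supplies a return value but nothing is written to the hash, so every call with the same id runs the block (and the parse) again. `Hash#[]` combined with `||=` does write the result. A minimal, self-contained sketch of the difference, using a hypothetical `expensive_parse` lambda in place of `Avro::Schema.parse`:

```ruby
# Counts how many times the (hypothetical) expensive parse step runs.
parse_calls = 0
expensive_parse = lambda do |json|
  parse_calls += 1
  json.upcase # stand-in for Avro::Schema.parse(json)
end

# Hash#fetch with a block: the block's value is returned but NOT stored.
fetch_cache = {}
3.times { fetch_cache.fetch(42) { expensive_parse.call('{"type":"string"}') } }
fetch_calls = parse_calls
puts "fetch with block: #{fetch_calls} parses, cache size #{fetch_cache.size}"

# Hash#[] with ||=: the first result is stored and reused on later calls.
parse_calls = 0
memo_cache = {}
3.times { memo_cache[42] ||= expensive_parse.call('{"type":"string"}') }
memo_calls = parse_calls
puts "[] with ||=: #{memo_calls} parses, cache size #{memo_cache.size}"
```

One caveat of `||=`: it re-runs the right-hand side whenever the cached value is `nil` or `false`, which is not a concern for parsed schemas.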
Diyaa1 changed the title from "Any way to improve encoding performance for messages batches when using AvroTurf::Messaging" to "a way to improve encoding performance for messages batches when using AvroTurf::Messaging" on Feb 18, 2024.
Thanks @dasch, I went ahead and patched avro_turf in production with the code below.
I can open a PR if these changes are acceptable; I'll do it when I get some free time.
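```ruby
# Load AvroTurf::Messaging first so the methods below override the originals
# instead of being overwritten when the library is loaded later.
require 'avro_turf/messaging'

# Monkey patch: memoize parsed schemas by schema id so each schema is
# parsed once instead of once per message.
class AvroTurf
  class Messaging
    # Still asks the registry for the subject/version on every call, but only
    # parses the schema JSON the first time a given schema id is seen.
    def fetch_schema(subject:, version: 'latest')
      schema_data = @registry.subject_version(subject, version)
      schema_id = schema_data.fetch('id')
      schema = @schemas_by_id[schema_id] ||= begin
        Avro::Schema.parse(schema_data.fetch('schema'))
      end
      [schema, schema_id]
    end

    # Writes the parsed schema into @schemas_by_id (Hash#[] + ||=) instead of
    # using Hash#fetch with a block, which never stores its result.
    def fetch_schema_by_id(schema_id)
      schema = @schemas_by_id[schema_id] ||= begin
        schema_json = @registry.fetch(schema_id)
        Avro::Schema.parse(schema_json)
      end
      [schema, schema_id]
    end
  end
end
```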
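To check the effect of the patch without a running schema registry or the `avro` gem, the sketch below copies the shape of the patched `fetch_schema_by_id` into a hypothetical `SchemaCache` class backed by a hypothetical `FakeRegistry` that counts lookups. A counting `parse` method stands in for `Avro::Schema.parse`; none of these names are part of AvroTurf. With a batch of 1,000 messages spread across two schema ids, each schema should be fetched and parsed only once.

```ruby
# Hypothetical stand-in for the schema registry client; counts lookups.
class FakeRegistry
  attr_reader :lookups

  def initialize(schemas)
    @schemas = schemas # { schema_id => schema_json }
    @lookups = 0
  end

  def fetch(schema_id)
    @lookups += 1
    @schemas.fetch(schema_id)
  end
end

# Hypothetical class mirroring the patched fetch_schema_by_id.
class SchemaCache
  attr_reader :parses

  def initialize(registry)
    @registry = registry
    @schemas_by_id = {}
    @parses = 0
  end

  def fetch_schema_by_id(schema_id)
    schema = @schemas_by_id[schema_id] ||= begin
      schema_json = @registry.fetch(schema_id)
      parse(schema_json)
    end
    [schema, schema_id]
  end

  private

  # Stand-in for Avro::Schema.parse: counts calls, returns a frozen copy.
  def parse(json)
    @parses += 1
    json.dup.freeze
  end
end

schemas = { 1 => '{"type":"string"}', 2 => '{"type":"long"}' }
registry = FakeRegistry.new(schemas)
cache = SchemaCache.new(registry)

# Simulate a batch of 1,000 messages alternating between two schema ids.
results = (1..1000).map { |i| cache.fetch_schema_by_id(i.even? ? 2 : 1) }

puts "parses: #{cache.parses}, registry lookups: #{registry.lookups}"
```

Note that `||=` on a plain Hash is not atomic: if several threads share one `AvroTurf::Messaging` instance, two threads that miss on the same id at the same time may both parse that schema. The parsed results are equivalent, so the cost is some duplicate work on first use; a Mutex around the cache would be needed to guarantee a single parse per id.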