a way to improve encoding performance for messages batches when using AvroTurf::Messaging #206

Diyaa1 · 2024-02-18T10:22:06Z

Hello there,

We have this case where we are encoding over 1000 messages at a time, but I noticed a bottleneck where AvroTurf takes over 60% of the CPU time, so I profiled the code and found out that the schema is being parsed for every message as noticed in below code, is this intentional? is there anyway to improve it ?

note: even in fetch_schema_by_id the method @schemas_by_id is not being set anywhere.

appreciate any help thank you :)

    # Providing subject and version to determine the schema,
    # which skips the auto registeration of schema on the schema registry.
    # Fetch the schema from registry with the provided subject name and version.
    def fetch_schema(subject:, version: 'latest')
      schema_data = @registry.subject_version(subject, version)
      schema_id = schema_data.fetch('id')
      schema = Avro::Schema.parse(schema_data.fetch('schema'))
      [schema, schema_id]
    end

    # Fetch the schema from registry with the provided schema_id.
    def fetch_schema_by_id(schema_id)
      schema = @schemas_by_id.fetch(schema_id) do
        schema_json = @registry.fetch(schema_id)
        Avro::Schema.parse(schema_json)
      end
      [schema, schema_id]
    end

The text was updated successfully, but these errors were encountered:

dasch · 2024-02-19T12:10:57Z

Ah, I think the bug is caused by the incorrect assumption that fetch actually sets the value to the result of the block; it does not.

Can you do a PR?

I think the solution is to do this instead:

      schema = @schemas_by_id[schema_id] ||= begin
        schema_json = @registry.fetch(schema_id)
        Avro::Schema.parse(schema_json)
      end

This will actually assign the computed value to the hash while retaining the laziness.

Diyaa1 · 2024-02-20T08:40:24Z

Thanks @dasch, I moved with patching avro_turf using below in production.

I can do a PR, are below changes acceptable ? I'll do it when I get free time.

class AvroTurf
  class Messaging
    def fetch_schema(subject:, version: 'latest')
      schema_data = @registry.subject_version(subject, version)
      schema_id = schema_data.fetch('id')
      schema = @schemas_by_id[schema_id] ||= begin
        Avro::Schema.parse(schema_data.fetch('schema'))
      end
      [schema, schema_id]
    end

    def fetch_schema_by_id(schema_id)
      schema = @schemas_by_id[schema_id] ||= begin
        schema_json = @registry.fetch(schema_id)
        Avro::Schema.parse(schema_json)
      end
      [schema, schema_id]
    end
  end
end

dasch · 2024-02-26T14:47:58Z

Yeah, that looks good 👍

Diyaa1 changed the title ~~Any way to improve encoding performance for messages batches when using AvroTurf::Messaging~~ a way to improve encoding performance for messages batches when using AvroTurf::Messaging Feb 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

a way to improve encoding performance for messages batches when using AvroTurf::Messaging #206

a way to improve encoding performance for messages batches when using AvroTurf::Messaging #206

Diyaa1 commented Feb 18, 2024 •

edited

Loading

dasch commented Feb 19, 2024

Diyaa1 commented Feb 20, 2024

dasch commented Feb 26, 2024

a way to improve encoding performance for messages batches when using AvroTurf::Messaging #206

a way to improve encoding performance for messages batches when using AvroTurf::Messaging #206

Comments

Diyaa1 commented Feb 18, 2024 • edited Loading

dasch commented Feb 19, 2024

Diyaa1 commented Feb 20, 2024

dasch commented Feb 26, 2024

Diyaa1 commented Feb 18, 2024 •

edited

Loading