Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a way to improve encoding performance for messages batches when using AvroTurf::Messaging #206

Open
Diyaa1 opened this issue Feb 18, 2024 · 3 comments

Comments

@Diyaa1
Copy link

Diyaa1 commented Feb 18, 2024

Hello there,

We have this case where we are encoding over 1000 messages at a time, but I noticed a bottleneck where AvroTurf takes over 60% of the CPU time, so I profiled the code and found out that the schema is being parsed for every message as noticed in below code, is this intentional? is there anyway to improve it ?

note: even in fetch_schema_by_id the method @schemas_by_id is not being set anywhere.

appreciate any help thank you :)

    # Providing subject and version to determine the schema,
    # which skips the auto registeration of schema on the schema registry.
    # Fetch the schema from registry with the provided subject name and version.
    def fetch_schema(subject:, version: 'latest')
      schema_data = @registry.subject_version(subject, version)
      schema_id = schema_data.fetch('id')
      schema = Avro::Schema.parse(schema_data.fetch('schema'))
      [schema, schema_id]
    end
    # Fetch the schema from registry with the provided schema_id.
    def fetch_schema_by_id(schema_id)
      schema = @schemas_by_id.fetch(schema_id) do
        schema_json = @registry.fetch(schema_id)
        Avro::Schema.parse(schema_json)
      end
      [schema, schema_id]
    end
@Diyaa1 Diyaa1 changed the title Any way to improve encoding performance for messages batches when using AvroTurf::Messaging a way to improve encoding performance for messages batches when using AvroTurf::Messaging Feb 18, 2024
@dasch
Copy link
Owner

dasch commented Feb 19, 2024

Ah, I think the bug is caused by the incorrect assumption that fetch actually sets the value to the result of the block; it does not.

Can you do a PR?

I think the solution is to do this instead:

      schema = @schemas_by_id[schema_id] ||= begin
        schema_json = @registry.fetch(schema_id)
        Avro::Schema.parse(schema_json)
      end

This will actually assign the computed value to the hash while retaining the laziness.

@Diyaa1
Copy link
Author

Diyaa1 commented Feb 20, 2024

Thanks @dasch, I moved with patching avro_turf using below in production.

I can do a PR, are below changes acceptable ? I'll do it when I get free time.

class AvroTurf
  class Messaging
    def fetch_schema(subject:, version: 'latest')
      schema_data = @registry.subject_version(subject, version)
      schema_id = schema_data.fetch('id')
      schema = @schemas_by_id[schema_id] ||= begin
        Avro::Schema.parse(schema_data.fetch('schema'))
      end
      [schema, schema_id]
    end

    def fetch_schema_by_id(schema_id)
      schema = @schemas_by_id[schema_id] ||= begin
        schema_json = @registry.fetch(schema_id)
        Avro::Schema.parse(schema_json)
      end
      [schema, schema_id]
    end
  end
end

@dasch
Copy link
Owner

dasch commented Feb 26, 2024

Yeah, that looks good 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants