
k/record_batcher: Move to kafka/data #23832

Draft
wants to merge 7 commits into base: dev

Commits on Oct 17, 2024

  1. record_batcher: Move from transform/logging to kafka/client

    Useful in audit_log_manager as well as transform logging
    
    Signed-off-by: Oren Leiman <[email protected]>
    oleiman committed Oct 17, 2024 (commit 208717d)
  2. k/record_batcher: Make k/v interfaces optional<iobuf>

    Also adds make_batch_of_one (sketched below).
    
    Signed-off-by: Oren Leiman <[email protected]>
    oleiman committed Oct 17, 2024 (commit f7e30f2)
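
    A minimal sketch of what the revised interface might look like,
    assuming a batcher shaped roughly like the one below. The names
    are illustrative, and std::string stands in for iobuf to keep the
    example self-contained:

      // Hypothetical sketch, not the actual kafka record_batcher API.
      #include <optional>
      #include <string>
      #include <utility>
      #include <vector>

      using iobuf = std::string; // stand-in for the real iobuf type

      struct record {
          std::optional<iobuf> key;   // k/v may now be absent
          std::optional<iobuf> value;
      };

      using record_batch = std::vector<record>; // stand-in batch type

      class record_batcher {
      public:
          // Append a record whose key and/or value may be empty.
          void append(std::optional<iobuf> k, std::optional<iobuf> v) {
              _pending.push_back({std::move(k), std::move(v)});
          }

          // Added by this commit: wrap a single k/v pair in a batch.
          static record_batch
          make_batch_of_one(std::optional<iobuf> k, std::optional<iobuf> v) {
              return {{std::move(k), std::move(v)}};
          }

      private:
          std::vector<record> _pending;
      };
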
  3. k/record_batcher: Optionally inject a ss::logger

    Signed-off-by: Oren Leiman <[email protected]>
    oleiman committed Oct 17, 2024 (commit d6c52a7)
  4. audit: Plumb metadata_cache and node_id into audit_log_manager

    We need these for sorting out which partitions are locally led
    (see the sketch below).
    
    Signed-off-by: Oren Leiman <[email protected]>
    oleiman committed Oct 17, 2024 (commit 957f49e)
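
    A self-contained sketch of the leadership check this enables,
    with plain types standing in for the real metadata_cache and
    node_id (illustrative only, not the actual cluster API):

      // Hypothetical sketch; the real code would consult the
      // metadata_cache plumbed in by this commit.
      #include <cstdint>
      #include <optional>
      #include <unordered_map>
      #include <vector>

      using node_id = int32_t;
      using partition_id = int32_t;

      struct leader_table {
          std::unordered_map<partition_id, node_id> leaders;

          std::optional<node_id> get_leader(partition_id p) const {
              auto it = leaders.find(p);
              if (it == leaders.end()) {
                  return std::nullopt;
              }
              return it->second;
          }
      };

      // Partitions of the audit topic that this node currently leads.
      std::vector<partition_id> locally_led(
        const leader_table& cache, node_id self, int32_t n_partitions) {
          std::vector<partition_id> out;
          for (partition_id p = 0; p < n_partitions; ++p) {
              auto leader = cache.get_leader(p);
              if (leader && *leader == self) {
                  out.push_back(p);
              }
          }
          return out;
      }
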
  5. audit: Perform record batching and partition assignment in manager

    The previous implementation used a very high retry count on the
    internal kafka client, which prevents the client from recovering
    from certain types of errors.
    
    Instead, we batch up drained records on the manager side, allowing
    us to hold a copy of each batch in memory and retry failed produce
    calls from "scratch".
    
    This also allows us to be _much_ more aggressive about batching.
    The internal kafka client will calculate a destination partition
    for each record, round robin style over the number of partitions.
    In the new scheme, we shoot for a maximally sized batch first, then
    select a destination, still round-robin style, but biasing heavily
    toward locally led partitions. In this way, given the default audit
    per-shard queue limit and default max batch size (both 1MiB), the
    most common drain operation should result in exactly one produce
    request. (See the sketch below.)
    
    Signed-off-by: Oren Leiman <[email protected]>
    oleiman committed Oct 17, 2024 (commit c020e2e)
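
    A minimal sketch of the scheme described above, assuming records
    are packed greedily up to the 1MiB limit and destinations are
    chosen round robin with a strong preference for locally led
    partitions. Names and types are illustrative, and the retry logic
    (holding a copy of each batch and re-producing on failure) is
    omitted for brevity:

      // Hypothetical sketch, not the actual audit_log_manager code.
      #include <cstddef>
      #include <string>
      #include <utility>
      #include <vector>

      struct record { std::string payload; };
      struct batch { std::vector<record> records; std::size_t bytes = 0; };

      constexpr std::size_t max_batch_bytes = 1 << 20; // 1MiB default

      // Pack drained records into maximally sized batches first...
      std::vector<batch> make_batches(std::vector<record> drained) {
          std::vector<batch> out;
          batch cur;
          for (auto& r : drained) {
              if (!cur.records.empty()
                  && cur.bytes + r.payload.size() > max_batch_bytes) {
                  out.push_back(std::move(cur));
                  cur = {};
              }
              cur.bytes += r.payload.size();
              cur.records.push_back(std::move(r));
          }
          if (!cur.records.empty()) {
              out.push_back(std::move(cur));
          }
          return out;
      }

      // ...then pick a destination: still round robin, but skip ahead
      // to the next locally led partition whenever one exists.
      int next_partition(
        int& rr, int n_partitions, const std::vector<bool>& locally_led) {
          for (int i = 0; i < n_partitions; ++i) {
              int candidate = (rr + i) % n_partitions;
              if (locally_led[candidate]) {
                  rr = (candidate + 1) % n_partitions;
                  return candidate;
              }
          }
          int candidate = rr; // no local leader: plain round robin
          rr = (rr + 1) % n_partitions;
          return candidate;
      }
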
  6. k/record_batcher: Move to kafka/data

    oleiman committed Oct 17, 2024 (commit 3417b42)

Commits on Oct 18, 2024

  1. module.bazel.lock

    oleiman committed Oct 18, 2024 (commit 60863b3)