Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/filelog] fix record counting with header #35870

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

andrzej-stencel
Copy link
Member

Description

Fixes #35869 by refactoring of the Reader::ReadToEnd method.

This refactors the Reader::ReadToEnd method by separating reading the file's header from reading the file's contents.
This results in very similar code in readHeader and readContents methods, which was previously deduplicated at the cost of slightly higher complexity.
The bug could be fixed without separating header reading from contents reading, but I hope this separation will make it easier to implement content batching in the Reader (#35455).
Content batching was my original motivation for these code changes.
I only discovered the problem with record counting when reading the code.

Link to tracking issue

Fixes #35869

Testing

In the first commit I have added tests that document the erroneous behavior. In the second commit I have fixed the bug and corrected the tests.

Documentation

Added changelog entry.

This refactors the `Reader::ReadToEnd` method
by separating reading the file's header from reading the file's contents.
This results in very similar code in `readHeader` and `readContents` methods,
which was previosly deduplicated at the cost of slightly higher complexity.
The bug could be fixed without separating header reading from contents reading,
but I hope this separation will make it easier to implement content batching
in the Reader (open-telemetry#35455).
Content batching was my original motivation for these code changes.
I only discovered the problem with record counting when reading the code.

func (r *Reader) readContents(ctx context.Context) {
// Create the scanner to read the contents of the file.
s := scanner.New(r, r.maxLogSize, r.initialBufferSize, r.Offset, r.splitFunc)
Copy link
Member Author

@andrzej-stencel andrzej-stencel Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I recreate the scanner again with r.initialBufferSize, but its value is updated to scanner.DefaultBufferSize in "header machinery cleanup" section of the readHeader method.I hope this reflects the original behavior.

As far as I understand the "initial buffer size" is only used in tests and I'm not sure why those tests are actually needed.


// Switch to the normal split and process functions.
r.splitFunc = r.lineSplitFunc
r.processFunc = r.emitFunc
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "juggling" of function references could probably be avoided altogether after the refactoring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[receiver/filelog] record numbers are incorrect when header is configured
2 participants