Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter_multiline: Reset group metadata buf on flush - Fixes 9262,8925,9337,8567 #9425

Merged
merged 1 commit into from
Sep 26, 2024

Conversation

drbugfinder-work
Copy link
Contributor

@drbugfinder-work drbugfinder-work commented Sep 26, 2024

This PR resets the size of the metadata buffer on a flush event, which fixes continuous growth of this buffer over time.

Fixes #9262, Fixes #8925, Fixes #9337, Fixes #8567

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • [N/A] Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found
root@fb-dev:~/test/fluent-bit/build# valgrind --leak-check=full ./bin/fluent-bit -c /root/test/fluent-bit.conf 
==645974== Memcheck, a memory error detector
==645974== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==645974== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==645974== Command: ./bin/fluent-bit -c /root/test/fluent-bit.conf
==645974== 
Fluent Bit v3.1.7
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __  
|  ___| |                | |   | ___ (_) |         |____ |/  | 
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| | 
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | | 
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

[2024/09/26 08:59:23] [ info] [fluent bit] version=3.1.7, commit=c6e902a43a, pid=645974
[2024/09/26 08:59:23] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/09/26 08:59:23] [ info] [cmetrics] version=0.9.5
[2024/09/26 08:59:23] [ info] [ctraces ] version=0.5.5
[2024/09/26 08:59:23] [ info] [input:tail:tail.0] initializing
[2024/09/26 08:59:23] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2024/09/26 08:59:23] [ info] [filter:multiline:multiline.0] created emitter: emitter_for_multiline.0
[2024/09/26 08:59:23] [ info] [input:emitter:emitter_for_multiline.0] initializing
[2024/09/26 08:59:23] [ info] [input:emitter:emitter_for_multiline.0] storage_strategy='memory' (memory only)
[2024/09/26 08:59:23] [ info] [sp] stream processor started
[2024/09/26 08:59:23] [ info] [output:file:file.0] worker #0 started
^C[2024/09/26 08:59:32] [engine] caught signal (SIGINT)
[2024/09/26 08:59:32] [ warn] [engine] service will shutdown in max 5 seconds
[2024/09/26 08:59:32] [ info] [input] pausing tail.0
[2024/09/26 08:59:32] [ info] [input] pausing emitter_for_multiline.0
[2024/09/26 08:59:32] [ info] [engine] service has stopped (0 pending tasks)
[2024/09/26 08:59:32] [ info] [input] pausing tail.0
[2024/09/26 08:59:32] [ info] [output:file:file.0] thread worker #0 stopping...
[2024/09/26 08:59:32] [ info] [input] pausing emitter_for_multiline.0
[2024/09/26 08:59:32] [ info] [output:file:file.0] thread worker #0 stopped
==645974== 
==645974== HEAP SUMMARY:
==645974==     in use at exit: 0 bytes in 0 blocks
==645974==   total heap usage: 2,437 allocs, 2,437 frees, 765,810 bytes allocated
==645974== 
==645974== All heap blocks were freed -- no leaks are possible
==645974== 
==645974== For lists of detected and suppressed errors, rerun with: -s
==645974== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
root@fb-dev:~/test/fluent-bit/build# 

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Previous observations:

time spent in                flb_ml_flush_metadata_buffer: 0.001546 seconds. Current unix epoch 1727339480.353680724
number of loop iterations in flb_ml_flush_metadata_buffer: 9384
time spent in                flb_ml_flush_metadata_buffer: 0.001582 seconds. Current unix epoch 1727339480.355315022
number of loop iterations in flb_ml_flush_metadata_buffer: 9390
time spent in                flb_ml_flush_metadata_buffer: 0.001532 seconds. Current unix epoch 1727339480.356888513
...
time spent in                flb_ml_flush_metadata_buffer: 0.012588 seconds. Current unix epoch 1727339562.777399754
number of loop iterations in flb_ml_flush_metadata_buffer: 78762
time spent in                flb_ml_flush_metadata_buffer: 0.012442 seconds. Current unix epoch 1727339562.789904977
number of loop iterations in flb_ml_flush_metadata_buffer: 78768
time spent in                flb_ml_flush_metadata_buffer: 0.012519 seconds. Current unix epoch 1727339562.802487816
...
time spent in                flb_ml_flush_metadata_buffer: 0.026053 seconds. Current unix epoch 1727339877.411707060
number of loop iterations in flb_ml_flush_metadata_buffer: 174182
time spent in                flb_ml_flush_metadata_buffer: 0.026309 seconds. Current unix epoch 1727339877.438083181
number of loop iterations in flb_ml_flush_metadata_buffer: 174188
time spent in                flb_ml_flush_metadata_buffer: 0.027012 seconds. Current unix epoch 1727339877.465138702
...
number of loop iterations in flb_ml_flush_metadata_buffer: 255804
time spent in                flb_ml_flush_metadata_buffer: 0.041282 seconds. Current unix epoch 1727340334.109516327
number of loop iterations in flb_ml_flush_metadata_buffer: 255810
time spent in                flb_ml_flush_metadata_buffer: 0.040342 seconds. Current unix epoch 1727340334.149876032
number of loop iterations in flb_ml_flush_metadata_buffer: 255816
time spent in                flb_ml_flush_metadata_buffer: 0.041243 seconds. Current unix epoch 1727340334.191167722

I measured the time spent in the flb_ml_flush_metadata_buffer method and the number of iterations in the while loop

while (ret == FLB_EVENT_ENCODER_SUCCESS &&
msgpack_unpack_next(&metadata_map,
group->mp_md_sbuf.data,
group->mp_md_sbuf.size,
&offset) == MSGPACK_UNPACK_SUCCESS) {

As you can see the number of iterations is constantly growing and the time in the method also rises constantly, which fully explains the rising CPU load over time for me.

Can you please review @leonardo-albertovich @edsiper

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

This commit resets the size of the metadata buffer on a flush.
It fixes continuous growth of this buffer over time.

Signed-off-by: Richard Treu <[email protected]>
@drbugfinder-work drbugfinder-work changed the title filter_multiline: Reset group metadata buffer on flush filter_multiline: Reset group metadata buffer on flush; Fixes #9262, #8925, #9337, #8567 Sep 26, 2024
@drbugfinder-work drbugfinder-work changed the title filter_multiline: Reset group metadata buffer on flush; Fixes #9262, #8925, #9337, #8567 filter_multiline: Reset group metadata buffer on flush; Fixes issues 9262, 8925, 9337, 8567 Sep 26, 2024
@drbugfinder-work drbugfinder-work changed the title filter_multiline: Reset group metadata buffer on flush; Fixes issues 9262, 8925, 9337, 8567 filter_multiline: Reset group metadata buf on flush; Fixes 9262,8925,9337,8567 Sep 26, 2024
@drbugfinder-work drbugfinder-work changed the title filter_multiline: Reset group metadata buf on flush; Fixes 9262,8925,9337,8567 filter_multiline: Reset group metadata buf on flush - Fixes 9262,8925,9337,8567 Sep 26, 2024
@edsiper
Copy link
Member

edsiper commented Sep 26, 2024

I read the thread info #9262 (comment), pretty good job :)

Since it has been tested extensively I am moving forward to merge and do the backport before tomorrow's release

@edsiper edsiper merged commit dd53a39 into fluent:master Sep 26, 2024
57 of 60 checks passed
@edsiper
Copy link
Member

edsiper commented Sep 26, 2024

@drbugfinder-work FYI: I cherry picked this fix into PR for 3.1 here: #9433

@drbugfinder-work drbugfinder-work deleted the fix-ml-md-flush branch September 27, 2024 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants