engine: fix several file descriptor leaks. #8393

Merged
merged 10 commits into from
Jan 22, 2024

pwhelan
Contributor

@pwhelan pwhelan commented Jan 18, 2024

Summary

This is a rework of the earlier PR #8371, which included all of these fixes in a single commit. This PR is split into commits, and a commit spans several files only when the changes belong to the same logical component. For example, a single commit contains all the changes needed to allow freeing the timer used by the scheduler.


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@@ -1135,6 +1135,12 @@ int flb_engine_shutdown(struct flb_config *config)
        flb_hs_destroy(config->http_ctx);
    }
#endif
    if (config->evl) {
Member

It seems to me that the event loop is not being destroyed before restart. I would suggest identifying the root cause of this.
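
The reviewer's point can be illustrated with a generic teardown pattern. This is a minimal model, not Fluent Bit code: the hypothetical `struct engine_config`, `engine_loop_create`, `engine_loop_destroy` and `engine_restart` stand in for the real event loop handling, and a pipe stands in for the file descriptors an event loop owns. A guard such as `if (config->evl)` plus resetting the pointer makes teardown safe to repeat, but on its own it only hides a missing destroy; the restart path still has to tear down the old loop before creating a new one.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for an event loop: it owns a pipe's two fds. */
struct event_loop {
    int fds[2];
};

/* Hypothetical configuration; evl is NULL when no loop exists. */
struct engine_config {
    struct event_loop *evl;
};

/* Allocate a loop and its fds; returns 0 on success, -1 on failure. */
static int engine_loop_create(struct engine_config *config)
{
    struct event_loop *evl = malloc(sizeof(*evl));

    if (evl == NULL) {
        return -1;
    }
    if (pipe(evl->fds) != 0) {
        free(evl);
        return -1;
    }
    config->evl = evl;
    return 0;
}

/* The guard plus the NULL reset turn a repeated call into a no-op
 * instead of a double close() and double free(). */
static void engine_loop_destroy(struct engine_config *config)
{
    assert(config != NULL);
    if (config->evl) {
        close(config->evl->fds[0]);
        close(config->evl->fds[1]);
        free(config->evl);
        config->evl = NULL;
    }
}

/* Restart: destroy the old loop first. Overwriting config->evl
 * without this step would leak the old fds and allocation. */
static int engine_restart(struct engine_config *config)
{
    engine_loop_destroy(config);
    return engine_loop_create(config);
}
```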

Contributor Author

@pwhelan pwhelan Jan 19, 2024

I will look into this. This specific piece of code was added to mitigate crashes in the following tests:

  • flb-rt-config_map_opts
  • flb-rt-custom_calyptia_test
  • flb-rt-filter_stdout
    • case_insensitive_name

The other flb-rt-filter_stdout test, json_multiple, does not crash in the same manner as case_insensitive_name.

This is most likely due to the way these tests use the Fluent Bit API to instantiate their pipelines. The case_insensitive_name test calls flb_destroy without first calling flb_stop. If flb_stop is added to the test, it crashes in the call to pthread_join inside flb_stop. This is most likely because the test never calls flb_start either, so the thread was never initialized.
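
The crash in flb_stop is consistent with joining a thread that was never created: POSIX leaves pthread_join undefined for a value that does not refer to a joinable thread. As a minimal sketch with hypothetical names (`struct lib_ctx`, `ctx_start`, `ctx_stop`) rather than the real flb_start/flb_stop, a stop routine can record whether start succeeded and skip the join otherwise:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical library context owning one worker thread. */
struct lib_ctx {
    pthread_t worker;
    bool started;   /* true only after pthread_create() succeeded */
};

static void *worker_main(void *arg)
{
    (void) arg;     /* a real worker would run the event loop here */
    return NULL;
}

static int ctx_start(struct lib_ctx *ctx)
{
    if (pthread_create(&ctx->worker, NULL, worker_main, NULL) != 0) {
        return -1;
    }
    ctx->started = true;
    return 0;
}

/* Only join a thread that was actually created, so calling stop
 * without a prior start is a harmless no-op. */
static int ctx_stop(struct lib_ctx *ctx)
{
    assert(ctx != NULL);
    if (!ctx->started) {
        return 0;
    }
    if (pthread_join(ctx->worker, NULL) != 0) {
        return -1;
    }
    ctx->started = false;
    return 0;
}
```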

The easiest way out of this would be to simply leave the check in place. Another alternative would be to move the deletion of the event channel into flb_stop, where it might be better placed. This would, however, break the symmetry with the channel's creation in the flb_engine_start function in flb_engine.c: if we move the destruction of the channel to flb_stop, we should probably also move its creation into flb_start. At the moment I have no idea what the consequences of this would be. The most obvious one is that the channel would be tied to ctx->event_channel inside flb_ctx_t instead of to config->event_thread_init, which belongs to the configuration.
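
Moving both halves of the channel's lifetime into the start/stop pair could look roughly like this. It is a sketch under assumptions: the hypothetical `struct lib_ctx` owns the channel (modeled as a pipe) the way flb_ctx_t would own ctx->event_channel, and `ctx_init`, `ctx_start` and `ctx_stop` are illustrative names, not Fluent Bit functions.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical context that owns its event channel (a pipe pair). */
struct lib_ctx {
    int event_channel[2];   /* -1 while the channel is closed */
};

static void ctx_init(struct lib_ctx *ctx)
{
    ctx->event_channel[0] = -1;
    ctx->event_channel[1] = -1;
}

/* start creates the channel; returns 0 on success, -1 on failure. */
static int ctx_start(struct lib_ctx *ctx)
{
    return pipe(ctx->event_channel);
}

/* stop mirrors start and closes the channel, so its lifetime follows
 * the start/stop pair instead of the configuration. */
static void ctx_stop(struct lib_ctx *ctx)
{
    assert(ctx != NULL);
    for (int i = 0; i < 2; i++) {
        if (ctx->event_channel[i] >= 0) {
            close(ctx->event_channel[i]);
            ctx->event_channel[i] = -1;
        }
    }
}
```

With this arrangement a caller that never starts never creates the channel, and stop on an unstarted context closes nothing.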

This could also be because flb_config_init sets config->is_running to TRUE, instead of flb_engine_start, which is where I would expect it to be set. If it were set there instead, flb_destroy would not call flb_engine_shutdown for an engine that was never started.

Contributor Author

@pwhelan pwhelan Jan 19, 2024

Moving the assignment of config->is_running into flb_engine_started avoids the SIGSEGV caused by destroying the ch_self_events channel, but causes the memory used by all the custom, input, output and filter plugins to be leaked in several tests. The deallocation of plugins could be moved to flb_config_exit or similar, but that seems a bit out of scope to me. If that is the approach we want to take, I can open a new PR later with the code to do so.
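
The leak follows from teardown ownership: if plugin instances are freed only on the shutdown path, any destroy that skips shutdown leaks them. A minimal sketch of the split proposed here, with hypothetical `engine_shutdown`, `config_exit` and `lib_destroy` standing in for flb_engine_shutdown, flb_config_exit and flb_destroy, gates runtime shutdown on is_running while freeing plugins unconditionally:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical plugin instance, kept in a singly linked list. */
struct plugin_instance {
    struct plugin_instance *next;
};

struct engine_config {
    bool is_running;
    struct plugin_instance *plugins;
};

static int plugin_add(struct engine_config *config)
{
    struct plugin_instance *p = malloc(sizeof(*p));

    if (p == NULL) {
        return -1;
    }
    p->next = config->plugins;
    config->plugins = p;
    return 0;
}

/* Runtime teardown only; it does not own plugin memory. */
static void engine_shutdown(struct engine_config *config)
{
    config->is_running = false;
}

/* Frees every plugin instance whether or not the engine ever ran. */
static void config_exit(struct engine_config *config)
{
    assert(config != NULL);
    struct plugin_instance *p = config->plugins;

    while (p != NULL) {
        struct plugin_instance *next = p->next;
        free(p);
        p = next;
    }
    config->plugins = NULL;
}

static void lib_destroy(struct engine_config *config)
{
    if (config->is_running) {
        engine_shutdown(config);
    }
    config_exit(config);
}
```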
