[fluentd-elasticsearch] Multi Process Workers #56

Open
kfirfer opened this issue Jan 18, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

kfirfer commented Jan 18, 2021

What do you think about Multi Process Workers in fluentd?
https://docs.fluentd.org/deployment/multi-process-workers
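
For reference, the feature boils down to a workers setting in the <system> section plus <worker N> directives for plugins that cannot run in every worker. A minimal sketch based on that page (the worker count, paths and the tail source are illustrative, not taken from this chart):

    <system>
      # number of worker processes the supervisor forks (illustrative value)
      workers 4
    </system>

    # plugins that are not multi-worker ready (e.g. in_tail) must be pinned
    # to a single worker with a <worker N> directive
    <worker 0>
      <source>
        @type tail
        path /var/log/containers/*.log
        pos_file /var/log/fluentd-containers.log.pos
        tag kubernetes.*
        <parse>
          @type none
        </parse>
      </source>
    </worker>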

@kfirfer kfirfer added the enhancement New feature or request label Jan 18, 2021
nvtkaszpir (Contributor) commented Feb 12, 2021

This is super problematic in general:

  • each worker spawns separate jobs and thus a separate workdir (buffers)
  • changing workers from 1 to more tends to generate subdirectories per worker, so the workers=1 layout does not follow the same path pattern as workers=2 or more (see the sketch after this list)
  • because of the above, changing workers from 1 to more may cause data loss from the old worker
  • scaling down workers from X to X-1 (where X>2) may cause data loss, because the buffers left by the removed worker may never be processed
  • scaling down workers from X (where X>=2) to 1 may cause data loss because the directory structure changes again (as in the second point from the top)
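
To make the directory-pattern point concrete, here is a sketch with a file buffer on an elasticsearch output (the match and path are illustrative, and the per-worker subdirectory is how I understand the file buffer to behave once workers > 1):

    <match **>
      @type elasticsearch
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
      </buffer>
    </match>

    # with workers = 1 (the default) chunks are written directly under the path:
    #   /var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b*.log
    # with workers >= 2 each worker writes into its own subdirectory, e.g.:
    #   /var/log/fluentd-buffers/kubernetes.system.buffer/worker0/buffer.b*.log
    #   /var/log/fluentd-buffers/kubernetes.system.buffer/worker1/buffer.b*.log
    # chunks left under the old layout are not picked up after the switch,
    # which is where the data loss described above comes from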

What it means:

  • if you are on workers=1 now (the default) and want to go multi-worker, it is safer to create a new deployment/statefulset with multiple workers
  • scaling up workers is pretty safe (say from 2 to 4)
  • scaling down workers may lead to data loss - if you have workers=4 and want to switch to workers=3, you may end up with orphaned files in the buffer left by the last worker, and you need to handle them on your own
  • in short: don't change the worker count in place - spawn a new deployment with the new worker count, deregister the old deployment from the loadbalancers/services so that it gradually drains its buffers (avoiding data loss), and after that you can remove it

So if you want to use multiple workers, you can do it, but commit to that worker count from the start and keep the limitations in mind. It's way easier in general to spawn new pods than to spawn more workers in a pod. Yet you may need to tune the number of workers per node size, so it's worth counting nproc or something like that when starting the daemonset on the host.
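
One way to tie the worker count to the node size is to let the container entrypoint export the CPU count and read it in the config via fluentd's embedded-Ruby strings. A sketch only - FLUENTD_WORKERS is a hypothetical variable name, not something this chart sets:

    # entrypoint (hypothetical): export FLUENTD_WORKERS="$(nproc)"
    <system>
      # double-quoted config values are evaluated as embedded Ruby, so the
      # env var (falling back to 1) decides the worker count at startup
      workers "#{ENV.fetch('FLUENTD_WORKERS', 1)}"
    </system>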
