Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

probe_on_startup: Add input config and interface similar to startup_error_behavior #16028

Open
LandonTClipp opened this issue Oct 15, 2024 · 8 comments · May be fixed by #16052
Open

probe_on_startup: Add input config and interface similar to startup_error_behavior #16028

LandonTClipp opened this issue Oct 15, 2024 · 8 comments · May be fixed by #16052
Labels
feature request Requests for new plugin and for new features to existing plugins

Comments

@LandonTClipp
Copy link
Contributor

Use Case

Related to:

This option should follow a similar pattern to startup_error_behavior. An interface will be defined:

type Prober interface {
    Probe() error
}

If an input plugin defines this interface and probe_on_startup is set to true, telegraf will call Probe() after it calls Start(). If Probe() returns an error, telegraf will handle it according to startup_error_behavior.

Please let me know your thoughts on this implementation, and if there are any other considerations to be made. I am happy to send in a PR if this looks good to you.

Expected behavior

This is not implemented.

Actual behavior

This is not implemented.

Additional info

The RunningInput type will be modified here: https://github.com/influxdata/telegraf/blob/v1.32.1/models/running_input.go#L131

After calling .Start(), if Start() returns no error, it will optionally call Probe() as an additional check. If Start() returns an error, Probe() will never be called.

The interface that defines Probe() will be written here: https://github.com/influxdata/telegraf/blob/v1.32.1/input.go

@LandonTClipp LandonTClipp added the feature request Requests for new plugin and for new features to existing plugins label Oct 15, 2024
@LandonTClipp LandonTClipp changed the title feat(probe_on_startup): Add input config and interface similar to startup_error_behavior probe_on_startup: Add input config and interface similar to startup_error_behavior Oct 15, 2024
@srebhan
Copy link
Member

srebhan commented Oct 16, 2024

Do we need an error to be returned or can this simply be a boolean? Furthermore, I would add the probe interface to the plugin.go file as this might also be useful for outputs.

For what I want, you should distill your text above into spec with a PR to clearly define the behavior and the interaction with startup_error_behavior!

Alternatively, we could extend the startup spec...

What do you think?

@LandonTClipp
Copy link
Contributor Author

Do we need an error to be returned or can this simply be a boolean?

Personally, having an error returned would be useful for logging purposes to identify what went wrong.

Furthermore, I would add the probe interface to the plugin.go file as this might also be useful for outputs.

Absolutely 👍🏻

For what I want, you should distill your text above into spec with a PR to clearly define the behavior and the interaction with startup_error_behavior!

Sounds good, I can submit a PR and add a new spec. Regarding extending the startup-error-behavior spec, I think it will need to be updated to note that startup_error_behavior will interact with probe_on_startup, such that Probe() will be considered an additional startup step that is to be optionally run when probe_on_startup=True. Although, I think the details of the behavior should be left to a new spec. I'll get to it!

@srebhan
Copy link
Member

srebhan commented Oct 16, 2024

Regarding startup-error-behavior. What I was thinking is that we might want to add a probe setting/value which probes the plugin (if available) and ignores it if the probe fails. This way we don't need yet another option with potentially redundant and/or misleading combinatorics with the startup_error_behavior...

Btw: Returning an error is OK, just don't expect any special handling, it's just a good on nil, bad on everything else...

@LandonTClipp
Copy link
Contributor Author

Regarding startup-error-behavior. What I was thinking is that we might want to add a probe setting/value which probes the plugin (if available) and ignores it if the probe fails.

Ah, I see what you're saying. In that case, we would need to introduce multiple values:

  • probe-error: Probe after startup and if there is an error, fail out.
  • probe-ignore: Probe after startup and if there is an error, ignore.
  • probe-retry: Probe after startup and if there is an error, retry.

To me, this feels... messy? Basically, we'd need a probe- for every current value of startup_error_behavior, which doubles the number of values. It seems that both Probe() and Start() are two separate things that should both be treated identically according to startup_error_behavior. If probe_on_start=True then we simply do the exact same thing that we did with Start().

If you wanted, we could even go a step further and say probe_error_behavior that specifies separately how to handle Probe() errors, but I'm not sure that's really necessary?

@srebhan
Copy link
Member

srebhan commented Oct 17, 2024

I don't think you need the combinatorics. In my view . The settings error, retry and ignore would only relate to startup and won't call Probe() at all. Now we only need to introduce a probe case that is identical to ignore but additionally calls Probe() and ignores the plugin if probe fails.

If you think about it, what would error and probe_on_start=true mean? It's kind of useless, the same for retry I think. Probing would only make sense in the ignore case, wouldn't it?

@LandonTClipp
Copy link
Contributor Author

If you think about it, what would error and probe_on_start=true mean? It's kind of useless, the same for retry I think. Probing would only make sense in the ignore case, wouldn't it?

I see your point, however I can imagine that some users would want to retain retry and error behavior. Take for example the chrony input. The plugin will run Start(). All this method does is determine which URL and transport it should be using to communicate with the server, but it doesn't actually probe it (obviously). What if a user wanted to assert that the server can actually be communicated with? Well, they would want to Probe() it, and if an error is returned, perhaps they want telegraf to hard fail. In that case, they want error behavior. Or consider if they expect some number of failures, they would want to retry a few times.

To say that a Probe() failure should result in Telegraf ignoring the plugin is kind of making a big assumption about what the user wants. For my personal use-case, I do want it to ignore, but I can't confidently say all users would want the same behavior. Thus, that's why I think the logic that implements startup_error_behavior should be run twice, once on Start() and once on Probe() (if enabled).

@srebhan
Copy link
Member

srebhan commented Oct 18, 2024

Maybe. And that's my point. Extending the startup behavior option makes it easy to extend the constraints later on. We have been bitten by combinatorics of parameters where we the had to evaluate all those weird combinations that do not make sense or cause ambiguities.

To say that a Probe() failure should result in Telegraf ignoring the plugin is kind of making a big assumption about what the user wants. For my personal use-case, I do want it to ignore, but I can't confidently say all users would want the same behavior.

Yes but you also can't confidentially say other people want some other behavior. For me probing only makes sense in environments where I cannot be sure certain hardware or services are available. There might be use-cases for this but then I want a feature-request describing what the behavior should be and someone to actually test this in a real-world scenario instead of adding complexity upfront for cases nobody cares about.

@LandonTClipp
Copy link
Contributor Author

That's very fair. I'll have to defer to you on the direction to take since this is your project and not mine! I'm okay with the path of adding a single probe option to startup_error_behavior if that's what you prefer. It will at least satisfy my requirements. Thanks for bottoming out the conversation, I'll get a spec submitted.

@LandonTClipp LandonTClipp linked a pull request Oct 21, 2024 that will close this issue
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Requests for new plugin and for new features to existing plugins
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants