`probe_on_startup`: Add input config and interface similar to `startup_error_behavior` #16028

LandonTClipp · 2024-10-15T18:29:42Z

Use Case

Related to:

This option should follow a similar pattern to startup_error_behavior. An interface will be defined:

type Prober interface {
    Probe() error
}

If an input plugin defines this interface and probe_on_startup is set to true, telegraf will call Probe() after it calls Start(). If Probe() returns an error, telegraf will handle it according to startup_error_behavior.

Please let me know your thoughts on this implementation, and if there are any other considerations to be made. I am happy to send in a PR if this looks good to you.

Expected behavior

This is not implemented.

Actual behavior

This is not implemented.

Additional info

The RunningInput type will be modified here: https://github.com/influxdata/telegraf/blob/v1.32.1/models/running_input.go#L131

After calling .Start(), if Start() returns no error, it will optionally call Probe() as an additional check. If Start() returns an error, Probe() will never be called.

The interface that defines Probe() will be written here: https://github.com/influxdata/telegraf/blob/v1.32.1/input.go
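
As a rough sketch of the flow described above (not Telegraf's actual code): `Starter`, `startAndProbe`, and `fakePlugin` are hypothetical names used only for illustration, and the real `Start()` on a service input takes an accumulator argument that is omitted here.

```go
package main

import (
	"errors"
	"fmt"
)

// Prober is the optional interface proposed in this issue.
type Prober interface {
	Probe() error
}

// Starter is a simplified, hypothetical stand-in for a plugin with Start().
type Starter interface {
	Start() error
}

// startAndProbe calls Start() and, only if that succeeded and probing is
// enabled, calls Probe() on plugins that implement Prober.
func startAndProbe(plugin Starter, probeOnStartup bool) error {
	if err := plugin.Start(); err != nil {
		// Start() failed, so Probe() is never called.
		return fmt.Errorf("start failed: %w", err)
	}
	if !probeOnStartup {
		return nil
	}
	if p, ok := plugin.(Prober); ok {
		if err := p.Probe(); err != nil {
			return fmt.Errorf("probe failed: %w", err)
		}
	}
	return nil
}

// fakePlugin is a test double that records whether Probe() was called.
type fakePlugin struct {
	failStart bool
	failProbe bool
	probed    bool
}

func (f *fakePlugin) Start() error {
	if f.failStart {
		return errors.New("cannot start")
	}
	return nil
}

func (f *fakePlugin) Probe() error {
	f.probed = true
	if f.failProbe {
		return errors.New("server unreachable")
	}
	return nil
}

func main() {
	p := &fakePlugin{failProbe: true}
	fmt.Println(startAndProbe(p, true))
	// prints: probe failed: server unreachable
}
```

Whatever error comes back would then be passed to the existing startup_error_behavior handling, so the caller does not need to know which of the two steps failed.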


srebhan · 2024-10-16T11:44:16Z

Do we need an error to be returned or can this simply be a boolean? Furthermore, I would add the probe interface to the plugin.go file as this might also be useful for outputs.

For what I want, you should distill your text above into a spec with a PR to clearly define the behavior and the interaction with startup_error_behavior!

Alternatively, we could extend the startup spec...

What do you think?

LandonTClipp · 2024-10-16T15:17:27Z

> Do we need an error to be returned or can this simply be a boolean?

Personally, having an error returned would be useful for logging purposes to identify what went wrong.

> Furthermore, I would add the probe interface to the plugin.go file as this might also be useful for outputs.

Absolutely 👍🏻

> For what I want, you should distill your text above into a spec with a PR to clearly define the behavior and the interaction with startup_error_behavior!

Sounds good, I can submit a PR and add a new spec. Regarding extending the startup-error-behavior spec, I think it will need to be updated to note that startup_error_behavior will interact with probe_on_startup, such that Probe() will be considered an additional startup step that is to be optionally run when probe_on_startup=True. Although, I think the details of the behavior should be left to a new spec. I'll get to it!

srebhan · 2024-10-16T16:40:06Z

Regarding startup-error-behavior. What I was thinking is that we might want to add a probe setting/value which probes the plugin (if available) and ignores it if the probe fails. This way we don't need yet another option with potentially redundant and/or misleading combinatorics with the startup_error_behavior...

Btw: Returning an error is OK, just don't expect any special handling, it's just a good on nil, bad on everything else...

LandonTClipp · 2024-10-16T17:59:35Z

> Regarding startup-error-behavior. What I was thinking is that we might want to add a probe setting/value which probes the plugin (if available) and ignores it if the probe fails.

Ah, I see what you're saying. In that case, we would need to introduce multiple values:

probe-error: Probe after startup and if there is an error, fail out.
probe-ignore: Probe after startup and if there is an error, ignore.
probe-retry: Probe after startup and if there is an error, retry.

To me, this feels... messy? Basically, we'd need a probe- variant for every current value of startup_error_behavior, which doubles the number of values. It seems that Probe() and Start() are two separate steps that should both be treated identically according to startup_error_behavior. If probe_on_startup=True then we simply do the exact same thing that we did with Start().

If you wanted, we could even go a step further and add a probe_error_behavior option that specifies separately how to handle Probe() errors, but I'm not sure that's really necessary?

srebhan · 2024-10-17T08:11:30Z

I don't think you need the combinatorics. In my view, the settings error, retry and ignore would only relate to startup and won't call Probe() at all. Now we only need to introduce a probe case that is identical to ignore but additionally calls Probe() and ignores the plugin if the probe fails.

If you think about it, what would error and probe_on_startup=true mean? It's kind of useless, the same for retry I think. Probing would only make sense in the ignore case, wouldn't it?
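
To make the single-value proposal concrete, here is a minimal sketch, assuming a `probe` value is added next to error, retry and ignore. The function `resolve`, the helper `failingProbe`, and the outcome labels are hypothetical names for illustration only; a nil `probe` argument stands for a plugin that does not implement Probe().

```go
package main

import (
	"errors"
	"fmt"
)

// Outcome labels are hypothetical and exist only for this sketch.
const (
	outcomeRun   = "run"   // plugin is kept and runs normally
	outcomeDrop  = "drop"  // plugin is ignored and Telegraf continues
	outcomeRetry = "retry" // startup is attempted again later
	outcomeAbort = "abort" // Telegraf fails out
)

// resolve decides what happens to a plugin given the configured behavior,
// the result of Start(), and an optional probe function.
func resolve(behavior string, startErr error, probe func() error) string {
	if startErr != nil {
		switch behavior {
		case "retry":
			return outcomeRetry
		case "ignore", "probe":
			// "probe" treats a failed Start() exactly like "ignore".
			return outcomeDrop
		default: // "error"
			return outcomeAbort
		}
	}
	// Start() succeeded. Only "probe" performs the additional check,
	// and a failed probe drops the plugin.
	if behavior == "probe" && probe != nil {
		if err := probe(); err != nil {
			return outcomeDrop
		}
	}
	return outcomeRun
}

// failingProbe simulates a probe against hardware that is not available.
func failingProbe() error {
	return errors.New("device not present")
}

func main() {
	for _, b := range []string{"error", "retry", "ignore", "probe"} {
		fmt.Printf("%-6s -> %s\n", b, resolve(b, nil, failingProbe))
	}
}
```

With this shape, error, retry and ignore never call the probe at all, so there are no option combinations to evaluate.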

LandonTClipp · 2024-10-17T17:39:51Z

> If you think about it, what would error and probe_on_startup=true mean? It's kind of useless, the same for retry I think. Probing would only make sense in the ignore case, wouldn't it?

I see your point, however I can imagine that some users would want to retain retry and error behavior. Take for example the chrony input. The plugin will run Start(). All this method does is determine which URL and transport it should be using to communicate with the server, but it doesn't actually probe it (obviously). What if a user wanted to assert that the server can actually be communicated with? Well, they would want to Probe() it, and if an error is returned, perhaps they want telegraf to hard fail. In that case, they want error behavior. Or consider if they expect some number of failures, they would want to retry a few times.

To say that a Probe() failure should result in Telegraf ignoring the plugin is making a big assumption about what the user wants. For my personal use-case, I do want it to ignore, but I can't confidently say all users would want the same behavior. That's why I think the logic that implements startup_error_behavior should be run twice, once on Start() and once on Probe() (if enabled).
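
A minimal sketch of that "run the same logic twice" idea, assuming a separate probe_on_startup flag. The names `handle`, `startup`, `succeed`, and `fail`, along with the outcome strings, are hypothetical and only illustrate the shape of the logic.

```go
package main

import (
	"errors"
	"fmt"
)

// handle maps the error from any startup step to an outcome according to
// startup_error_behavior. The outcome strings are hypothetical labels.
func handle(behavior string, err error) string {
	if err == nil {
		return "run"
	}
	switch behavior {
	case "retry":
		return "retry"
	case "ignore":
		return "drop"
	default: // "error"
		return "abort"
	}
}

// startup applies the same policy first to Start() and then, if enabled
// and available, to Probe().
func startup(behavior string, probeOnStartup bool, start, probe func() error) string {
	if out := handle(behavior, start()); out != "run" {
		return out
	}
	if probeOnStartup && probe != nil {
		return handle(behavior, probe())
	}
	return "run"
}

func succeed() error { return nil }
func fail() error    { return errors.New("server unreachable") }

func main() {
	for _, b := range []string{"error", "retry", "ignore"} {
		fmt.Println(b, "=>", startup(b, true, succeed, fail))
	}
	// prints:
	// error => abort
	// retry => retry
	// ignore => drop
}
```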

srebhan · 2024-10-18T11:13:34Z

Maybe. And that's my point. Extending the startup behavior option makes it easy to extend the constraints later on. We have been bitten by combinatorics of parameters where we then had to evaluate all those weird combinations that do not make sense or cause ambiguities.

> To say that a Probe() failure should result in Telegraf ignoring the plugin is kind of making a big assumption about what the user wants. For my personal use-case, I do want it to ignore, but I can't confidently say all users would want the same behavior.

Yes, but you also can't confidently say other people want some other behavior. For me, probing only makes sense in environments where I cannot be sure certain hardware or services are available. There might be use-cases for this, but then I want a feature request describing what the behavior should be and someone to actually test this in a real-world scenario, instead of adding complexity upfront for cases nobody cares about.

LandonTClipp · 2024-10-18T16:36:06Z

That's very fair. I'll have to defer to you on the direction to take since this is your project and not mine! I'm okay with the path of adding a single probe option to startup_error_behavior if that's what you prefer. It will at least satisfy my requirements. Thanks for bottoming out the conversation, I'll get a spec submitted.

LandonTClipp added the "feature request" label Oct 15, 2024

LandonTClipp changed the title ~~feat(probe_on_startup): Add input config and interface similar to startup_error_behavior~~ probe_on_startup: Add input config and interface similar to startup_error_behavior Oct 15, 2024

LandonTClipp linked a pull request Oct 21, 2024 that will close this issue

docs: Add probe as value to startup_error_behavior #16052

Open

1 task
