Event Hub output plugin does not reconnect after link is closed #15941

Open
jkominiakgps opened this issue Sep 26, 2024 · 2 comments
Labels
bug unexpected problem or unintended behavior

Comments

@jkominiakgps

Relevant telegraf.conf

[agent]
  interval = "5s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "5s"
  flush_jitter = "0s"
  precision = ""
[[inputs.tail]]
  files = ["log.[0-9]*"]
  max_undelivered_lines = 1000
  data_format = "grok"
  grok_patterns = [
    '^%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:logLevel}\] (?:(?m)%{GREEDYDATA:msg})$'
  ]
[inputs.tail.multiline]
  pattern = '^\s'
  match_which_line = "previous"
  invert_match = false
  timeout = "1s"
[[ outputs.event_hubs ]]
  connection_string = "CONN_STRING"
  timeout = "30s"
  max_message_size = 1000000
  data_format = "json"

Logs from Telegraf

Starting Telegraf...
time="2024-08-27T13:37:52Z" level=warning msg="DBUS_SESSION_BUS_ADDRESS envvar looks to be not set, this can lead to runaway dbus-daemon processes. To avoid this, set envvar DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus (if it exists) or DBUS_SESSION_BUS_ADDRESS=/dev/null." func="gosnowflake.(*defaultLogger).Warn" file="log.go:244"
2024-08-27T13:37:52Z I! Loading config: /etc/telegraf/telegraf.conf
2024-08-27T13:37:52Z I! Starting Telegraf 1.31.3 brought to you by InfluxData the makers of InfluxDB
2024-08-27T13:37:52Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
2024-08-27T13:37:52Z I! Loaded inputs: tail
2024-08-27T13:37:52Z I! Loaded aggregators:
2024-08-27T13:37:52Z I! Loaded processors:
2024-08-27T13:37:52Z I! Loaded secretstores:
2024-08-27T13:37:52Z I! Loaded outputs: event_hubs
Started Telegraf.
2024-08-27T13:37:52Z I! [agent] Config: Interval:5s, Quiet:false, Flush Interval:5s
2024-08-27T13:37:52Z D! [agent] Initializing plugins
2024-08-27T13:37:52Z D! [agent] Connecting outputs
2024-08-27T13:37:52Z D! [agent] Attempting connection to [outputs.event_hubs]
2024-08-27T13:37:52Z D! [agent] Successfully connected to outputs.event_hubs
2024-08-27T13:37:52Z D! [agent] Starting service inputs
2024-08-27T13:37:52Z D! [inputs.tail]  Tail added for "/var/log/log.1"
2024-08-27T13:37:53Z D! [inputs.tail]  Tail added for "/var/log/log.2"
2024-08-27T13:37:53Z D! [inputs.tail]  Tail added for "/var/log/log.3"
2024-08-27T13:37:58Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-08-27T13:38:03Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-08-27T13:38:08Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-08-27T13:38:13Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-08-27T13:38:18Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-08-27T13:38:23Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-08-27T13:38:28Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-08-27T13:38:33Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-08-27T13:38:38Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-08-27T13:38:43Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-08-27T13:38:48Z D! [outputs.event_hubs]  Wrote batch of 1 metrics in 63.46866ms
2024-08-27T13:38:48Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-08-27T13:38:53Z D! [outputs.event_hubs]  Wrote batch of 1 metrics in 49.55651ms
........
2024-09-19T01:15:05Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-09-19T01:15:10Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-09-19T01:15:15Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-09-19T01:15:20Z D! [outputs.event_hubs]  Buffer fullness: 0 / 10000 metrics
2024-09-19T01:15:25Z D! [outputs.event_hubs]  Buffer fullness: 2 / 10000 metrics
2024-09-19T01:15:25Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:15:30Z D! [outputs.event_hubs]  Buffer fullness: 2 / 10000 metrics
2024-09-19T01:15:30Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:15:35Z D! [outputs.event_hubs]  Buffer fullness: 2 / 10000 metrics
2024-09-19T01:15:35Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:15:40Z D! [outputs.event_hubs]  Buffer fullness: 2 / 10000 metrics
2024-09-19T01:15:40Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:15:45Z D! [outputs.event_hubs]  Buffer fullness: 2 / 10000 metrics
2024-09-19T01:15:45Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:15:50Z D! [outputs.event_hubs]  Buffer fullness: 2 / 10000 metrics
2024-09-19T01:15:50Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:15:55Z D! [outputs.event_hubs]  Buffer fullness: 2 / 10000 metrics
2024-09-19T01:15:55Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:16:00Z D! [outputs.event_hubs]  Buffer fullness: 2 / 10000 metrics
2024-09-19T01:16:00Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:16:05Z D! [outputs.event_hubs]  Buffer fullness: 2 / 10000 metrics
2024-09-19T01:16:05Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:16:10Z D! [outputs.event_hubs]  Buffer fullness: 2 / 10000 metrics
2024-09-19T01:16:10Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:16:15Z D! [outputs.event_hubs]  Buffer fullness: 2 / 10000 metrics
2024-09-19T01:16:15Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:16:20Z D! [outputs.event_hubs]  Buffer fullness: 2 / 10000 metrics
2024-09-19T01:16:20Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:16:25Z D! [outputs.event_hubs]  Buffer fullness: 4 / 10000 metrics
2024-09-19T01:16:25Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:16:30Z D! [outputs.event_hubs]  Buffer fullness: 4 / 10000 metrics

System info

Telegraf 1.31.3, Ubuntu 22.04.4 LTS

Docker

No response

Steps to reproduce

  1. Start Telegraf with the Event Hubs output
  2. Let it run for a varying amount of time (sometimes it fails within days, other times only after weeks)
  3. The AMQP link is closed and is not re-established until Telegraf is manually restarted

Expected behavior

The connection should be reestablished without manual intervention

Actual behavior

Telegraf reports the link is closed and data is no longer published

Additional info

I've rolled back to Telegraf 1.25, which doesn't exhibit the issue. I've also tried the latest version, 1.32, and the issue persists there.

@jkominiakgps jkominiakgps added the bug unexpected problem or unintended behavior label Sep 26, 2024
@srebhan
Member

srebhan commented Oct 8, 2024

Could you help me narrow down the issue? What exactly is the latest version in which Telegraf works correctly? Is it v1.25.3? And what is the first version in which the issue appears? Is it v1.26.0?

Is there a way to reproduce the error, e.g. with a Docker setup? Something like: start Docker container xyz, connect Telegraf, then pause and resume the container?

@jkominiakgps
Author

I was running v1.25.0 and upgraded to v1.27.1, which is where I first noticed the issue. I have tried various versions since v1.27.X and still see it with v1.32.1. I am simply running an Azure VM with Ubuntu 22.04.4 LTS and take no action on my end: Telegraf is configured as shown above and started. At first everything is published to the event hub as expected. Eventually, after some random amount of runtime, the link to the event hub is closed and does not appear to be re-established; Telegraf just logs that the link is closed and nothing is published to the event hub. If I manually restart the Telegraf service when this occurs, it reconnects and begins publishing again. I am currently trying v1.25.3 to see whether the issue exists in the latest 1.25 release.
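
Until the root cause is found, the manual restart described above could in principle be triggered by watching the log for a run of failed writes. The Go sketch below shows only the detection step and is an assumption-laden illustration, not part of Telegraf: `needsRestart` and its `threshold` parameter are hypothetical, and the matched strings are taken from the log excerpt in this issue.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// linkClosed is the error text as it appears in the Telegraf log above.
const linkClosed = "Error writing to outputs.event_hubs: amqp: link closed"

// needsRestart reports whether the log contains at least `threshold`
// link-closed write errors with no successful write in between.
func needsRestart(logText string, threshold int) bool {
	streak := 0
	sc := bufio.NewScanner(strings.NewReader(logText))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.Contains(line, linkClosed):
			streak++
			if streak >= threshold {
				return true
			}
		case strings.Contains(line, "[outputs.event_hubs]") &&
			strings.Contains(line, "Wrote batch"):
			// A successful write means the link is healthy again.
			streak = 0
		}
	}
	return false
}

func main() {
	// Three consecutive error lines copied from the log in this issue.
	sample := `2024-09-19T01:15:25Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:15:30Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed
2024-09-19T01:15:35Z E! [agent] Error writing to outputs.event_hubs: amqp: link closed`
	fmt.Println("restart suggested:", needsRestart(sample, 3))
}
```

A threshold above 1 avoids reacting to a single transient error; what to do once the function returns true (for example, restarting the service) is left to the operator.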
