secrets: deadlock when unable to lock memory #13806

Closed
redbaron opened this issue Aug 21, 2023 · 1 comment · Fixed by #13998
Labels
bug unexpected problem or unintended behavior

Comments


redbaron commented Aug 21, 2023

Relevant telegraf.conf

[[inputs.postgresql_extensible]]
    alias = "server1"
    address = "postgres://posgres:postgres@localhost:5432/postgres?default_transaction_read_only=on&application_name=telegraf-pgexport"

[[inputs.postgresql_extensible.query]]
measurement="pg_version"
sqlquery = """
  select current_database() as datname, 1 as gauge;
"""

....
repeat 100 times
...

Logs from Telegraf

2023-08-21T17:50:00Z W! [inputs.postgresql_extensible::server1] Collection took longer than expected; not complete after interval of 1m0s
2023-08-21T17:50:00Z W! [inputs.postgresql_extensible::server2] Collection took longer than expected; not complete after interval of 1m0s
2023-08-21T17:50:00Z W! [inputs.postgresql_extensible::server3] Collection took longer than expected; not complete after interval of 1m0s
...

System info

bbc6322

Docker

No response

Steps to reproduce

Run on Linux with the lockable-memory limit set to a small value: ulimit -l should report 16. Use ulimit to lower the maximum lockable memory if it is higher.

We also run Telegraf with the GOMAXPROCS=1 environment variable and a single CPU allocated. Running it with a cpuset should achieve a similar setup.

Expected behavior

In order of preference:

  • if it runs out of lockable memory, it should "queue", waiting for a previous LockedBuffer to be dropped before acquiring a new one
  • if it runs out of lockable memory, it should return a gather error for the unlucky plugins
  • at the very least, it should panic without deadlocking

Actual behavior

Plugin gather goroutines deadlock, most of them with one of the following stack traces (the first number is the number of goroutines with the same stack trace):


84 @ 0x43e8ee 0x44fdd8 0x44fdaf 0x46dfc5 0x49171d 0xae3f0f 0xae3eee 0xae3e7e 0xae151c 0xae12dc 0xae3d87 0xae73da 0xafc5bc 0x324bfab 0x3263ca5 0x3263885 0x326349f 0xac0d34 0x5cfeb08 0x472141
#	0x46dfc4	sync.runtime_SemacquireMutex+0x24										/usr/local/go/src/runtime/sema.go:77
#	0x49171c	sync.(*Mutex).lockSlow+0x15c											/usr/local/go/src/sync/mutex.go:171
#	0xae3f0e	sync.(*Mutex).Lock+0x4e												/usr/local/go/src/sync/mutex.go:90
#	0xae3eed	github.com/awnumar/memguard/core.Purge.func1+0x2d								/go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:23
#	0xae3e7d	github.com/awnumar/memguard/core.Purge+0x1d									/go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:51
#	0xae151b	github.com/awnumar/memguard/core.Panic+0x51b									/go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:85
#	0xae12db	github.com/awnumar/memguard/core.NewBuffer+0x2db								/go/pkg/mod/github.com/awnumar/[email protected]/core/buffer.go:73
#	0xae3d86	github.com/awnumar/memguard/core.Open+0x26									/go/pkg/mod/github.com/awnumar/[email protected]/core/enclave.go:105
#	0xae73d9	github.com/awnumar/memguard.(*Enclave).Open+0x19								/go/pkg/mod/github.com/awnumar/[email protected]/enclave.go:43
#	0xafc5bb	github.com/influxdata/telegraf/config.(*Secret).Get+0x7b							/app/config/secret.go:137
#	0x324bfaa	github.com/influxdata/telegraf/plugins/inputs/postgresql.(*Service).SanitizedAddress+0x4a			/app/plugins/inputs/postgresql/service.go:160
#	0x3263ca4	github.com/influxdata/telegraf/plugins/inputs/postgresql_extensible.(*Postgresql).accRow+0x304			/app/plugins/inputs/postgresql_extensible/postgresql_extensible.go:217
#	0x3263884	github.com/influxdata/telegraf/plugins/inputs/postgresql_extensible.(*Postgresql).gatherMetricsFromQuery+0x384	/app/plugins/inputs/postgresql_extensible/postgresql_extensible.go:184
#	0x326349e	github.com/influxdata/telegraf/plugins/inputs/postgresql_extensible.(*Postgresql).Gather+0x2de			/app/plugins/inputs/postgresql_extensible/postgresql_extensible.go:152
#	0xac0d33	github.com/influxdata/telegraf/models.(*RunningInput).Gather+0x53						/app/models/running_input.go:149
#	0x5cfeb07	github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1+0x27						/app/agent/agent.go:575

12 @ 0x43e8ee 0x44fdd8 0x44fdaf 0x46dfc5 0x49171d 0xae2efd 0xae2ee4 0xae3da5 0xae73da 0xafc5bc 0x324bfab 0x3263ca5 0x3263885 0x326349f 0xac0d34 0x5cfeb08 0x472141
#	0x46dfc4	sync.runtime_SemacquireMutex+0x24										/usr/local/go/src/runtime/sema.go:77
#	0x49171c	sync.(*Mutex).lockSlow+0x15c											/usr/local/go/src/sync/mutex.go:171
#	0xae2efc	sync.(*Mutex).Lock+0x5c												/usr/local/go/src/sync/mutex.go:90
#	0xae2ee3	github.com/awnumar/memguard/core.(*Coffer).View+0x43								/go/pkg/mod/github.com/awnumar/[email protected]/core/coffer.go:79
#	0xae3da4	github.com/awnumar/memguard/core.Open+0x44									/go/pkg/mod/github.com/awnumar/[email protected]/core/enclave.go:111
#	0xae73d9	github.com/awnumar/memguard.(*Enclave).Open+0x19								/go/pkg/mod/github.com/awnumar/[email protected]/enclave.go:43
#	0xafc5bb	github.com/influxdata/telegraf/config.(*Secret).Get+0x7b							/app/config/secret.go:137
#	0x324bfaa	github.com/influxdata/telegraf/plugins/inputs/postgresql.(*Service).SanitizedAddress+0x4a			/app/plugins/inputs/postgresql/service.go:160
#	0x3263ca4	github.com/influxdata/telegraf/plugins/inputs/postgresql_extensible.(*Postgresql).accRow+0x304			/app/plugins/inputs/postgresql_extensible/postgresql_extensible.go:217
#	0x3263884	github.com/influxdata/telegraf/plugins/inputs/postgresql_extensible.(*Postgresql).gatherMetricsFromQuery+0x384	/app/plugins/inputs/postgresql_extensible/postgresql_extensible.go:184
#	0x326349e	github.com/influxdata/telegraf/plugins/inputs/postgresql_extensible.(*Postgresql).Gather+0x2de			/app/plugins/inputs/postgresql_extensible/postgresql_extensible.go:152
#	0xac0d33	github.com/influxdata/telegraf/models.(*RunningInput).Gather+0x53						/app/models/running_input.go:149
#	0x5cfeb07	github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1+0x27						/app/agent/agent.go:575

And one goroutine with the following:

1 @ 0x43e8ee 0x44fdd8 0x44fdaf 0x46dfc5 0x49171d 0xae3f0f 0xae3eee 0xae3e7e 0xae151c 0xae12dc 0xae2f5a 0xae3da5 0xae73da 0xafc5bc 0x324bfab 0x3263ca5 0x3263885 0x326349f 0xac0d34 0x5cfeb08 0x472141
#	0x46dfc4	sync.runtime_SemacquireMutex+0x24										/usr/local/go/src/runtime/sema.go:77
#	0x49171c	sync.(*Mutex).lockSlow+0x15c											/usr/local/go/src/sync/mutex.go:171
#	0xae3f0e	sync.(*Mutex).Lock+0x4e												/usr/local/go/src/sync/mutex.go:90
#	0xae3eed	github.com/awnumar/memguard/core.Purge.func1+0x2d								/go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:23
#	0xae3e7d	github.com/awnumar/memguard/core.Purge+0x1d									/go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:51
#	0xae151b	github.com/awnumar/memguard/core.Panic+0x51b									/go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:85
#	0xae12db	github.com/awnumar/memguard/core.NewBuffer+0x2db								/go/pkg/mod/github.com/awnumar/[email protected]/core/buffer.go:73
#	0xae2f59	github.com/awnumar/memguard/core.(*Coffer).View+0xb9								/go/pkg/mod/github.com/awnumar/[email protected]/core/coffer.go:86
#	0xae3da4	github.com/awnumar/memguard/core.Open+0x44									/go/pkg/mod/github.com/awnumar/[email protected]/core/enclave.go:111
#	0xae73d9	github.com/awnumar/memguard.(*Enclave).Open+0x19								/go/pkg/mod/github.com/awnumar/[email protected]/enclave.go:43
#	0xafc5bb	github.com/influxdata/telegraf/config.(*Secret).Get+0x7b							/app/config/secret.go:137
#	0x324bfaa	github.com/influxdata/telegraf/plugins/inputs/postgresql.(*Service).SanitizedAddress+0x4a			/app/plugins/inputs/postgresql/service.go:160
#	0x3263ca4	github.com/influxdata/telegraf/plugins/inputs/postgresql_extensible.(*Postgresql).accRow+0x304			/app/plugins/inputs/postgresql_extensible/postgresql_extensible.go:217
#	0x3263884	github.com/influxdata/telegraf/plugins/inputs/postgresql_extensible.(*Postgresql).gatherMetricsFromQuery+0x384	/app/plugins/inputs/postgresql_extensible/postgresql_extensible.go:184
#	0x326349e	github.com/influxdata/telegraf/plugins/inputs/postgresql_extensible.(*Postgresql).Gather+0x2de			/app/plugins/inputs/postgresql_extensible/postgresql_extensible.go:152
#	0xac0d33	github.com/influxdata/telegraf/models.(*RunningInput).Gather+0x53						/app/models/running_input.go:149
#	0x5cfeb07	github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1+0x27						/app/agent/agent.go:575
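
One plausible reading of this last trace is that Coffer.View is still holding its mutex when NewBuffer fails, and the Panic -> Purge path then tries to take a lock that is already held. Go's sync.Mutex is not reentrant, so a goroutine that does this blocks forever, and every other goroutine queued on that mutex blocks behind it. The sketch below reproduces that pattern in isolation. The store type and its methods are hypothetical and are not memguard's code, and TryLock (Go 1.18+) is used only so that the example reports the conflict instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for a structure guarded by a single mutex.
type store struct {
	mu sync.Mutex
}

// cleanup models an error path that needs the same mutex. It uses TryLock so
// the example can report the conflict; a plain Lock here would block forever
// when the caller already holds mu.
func (s *store) cleanup() bool {
	if !s.mu.TryLock() {
		return false
	}
	defer s.mu.Unlock()
	return true
}

// view models a read path that holds the mutex while allocating. When the
// allocation fails, it calls cleanup while still holding the lock.
// It returns false if cleanup could not take the lock.
func (s *store) view(allocFails bool) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if allocFails {
		return s.cleanup()
	}
	return true
}

func main() {
	s := &store{}
	fmt.Println(s.view(false)) // true: no failure, nothing tries to re-lock
	fmt.Println(s.view(true))  // false: cleanup cannot take the lock held by view
}
```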

Additional info

No response

@redbaron redbaron added the bug unexpected problem or unintended behavior label Aug 21, 2023

powersj commented Aug 28, 2023

Next steps: as with #13804, I want to get on the same page with Sven and go through each of these.
