Tracing misbehaviour when service under load #1645

Closed
vvasilevbosch opened this issue May 29, 2023 · 4 comments

@vvasilevbosch
Contributor

vvasilevbosch commented May 29, 2023

Incomplete tracing is observed while load testing the service with modifyThing commands sent via a Kafka connection. The traces contain spans with invalid parent span IDs, and many spans that should be there are missing. I attach JSON exports of two traces (one complete, one incomplete) as well as a screenshot from the Jaeger UI.
[Screenshot: jaeger-invalid-parent-span-ids]
[Attachment: traces.zip]
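Jaeger flags a span as having an invalid parent when the span ID it references as its parent is not part of the same trace. One way to quantify this in exports like the attached ones is to scan them for such orphans. Below is a minimal Python sketch that assumes the layout of Jaeger's JSON export (a top-level `data` list of traces, each with `spans` whose `references` entries carry `refType` and `spanID`). The helper name `find_orphan_spans` is hypothetical and not part of Jaeger or Ditto.

```python
from typing import Any


def find_orphan_spans(export: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (traceID, spanID, missing parent spanID) for every span whose
    CHILD_OF reference points to a span not present in the same trace."""
    orphans = []
    for trace in export.get("data", []):
        spans = trace.get("spans", [])
        span_ids = {span["spanID"] for span in spans}
        for span in spans:
            for ref in span.get("references", []):
                # A CHILD_OF reference to an unknown span means the parent was never reported.
                if ref.get("refType") == "CHILD_OF" and ref["spanID"] not in span_ids:
                    orphans.append((trace["traceID"], span["spanID"], ref["spanID"]))
    return orphans


if __name__ == "__main__":
    # Tiny hand-made export: span "c" references parent "x", which is missing.
    sample = {
        "data": [{
            "traceID": "t1",
            "spans": [
                {"traceID": "t1", "spanID": "a", "references": []},
                {"traceID": "t1", "spanID": "b",
                 "references": [{"refType": "CHILD_OF", "traceID": "t1", "spanID": "a"}]},
                {"traceID": "t1", "spanID": "c",
                 "references": [{"refType": "CHILD_OF", "traceID": "t1", "spanID": "x"}]},
            ],
        }]
    }
    print(find_orphan_spans(sample))  # [('t1', 'c', 'x')]
```

To run it on a real export, load the file with `json.load` and pass the resulting dict. Each returned tuple names a parent span that never reached Jaeger, which fits the picture of spans being dropped somewhere between Ditto and the backend.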

@thjaeckle
Member

@vvasilevbosch, are you sampling 100% of all requests?
And how much load are we talking about?

I would assume that some traced requests are dropped before tracing starts to slow down the service itself or to overwhelm the OTEL endpoint.
The logback Logstash appender we use behaves the same way: under heavy load, not all log statements may make it through.
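The drop-instead-of-block behaviour described here can be sketched as a bounded buffer that discards new entries once it is full, so producers are never slowed down. This is a hypothetical illustration of the general pattern (`DroppingBuffer` is an invented name), not the actual code of Kamon's span reporter or of the logback Logstash appender. It also hints at how partial traces arise: the buffer drops whichever spans arrive while it is full, so a parent span can be lost while its children survive.

```python
from collections import deque


class DroppingBuffer:
    """Bounded buffer that discards new items once full instead of
    blocking the producer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: deque = deque()
        self.dropped = 0

    def offer(self, item: str) -> bool:
        if len(self.items) >= self.capacity:
            self.dropped += 1  # the producer is not slowed down; the item is lost
            return False
        self.items.append(item)
        return True

    def drain(self, max_items: int) -> list:
        """Simulate the exporter shipping up to max_items to the backend."""
        batch = []
        while self.items and len(batch) < max_items:
            batch.append(self.items.popleft())
        return batch


if __name__ == "__main__":
    buf = DroppingBuffer(capacity=100)
    # Each tick, 150 spans are produced but the exporter ships only 100.
    for tick in range(10):
        for i in range(150):
            buf.offer(f"span-{tick}-{i}")
        buf.drain(100)
    print(buf.dropped)  # 500
```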

Maybe this is even configurable in Kamon, the library Ditto uses for tracing.
Did you check?

@vvasilevbosch
Contributor Author

@thjaeckle I have the following setup: 1,000,000 things; 8 instances each of connectivity, policies and things; 1 things-search and 1 gateway instance; and 1 Kafka connection with 8 clients, to which I send 5,000 modifyThing messages per second. I will look further into the Kamon configuration. Thanks!

@thjaeckle
Member

OK, with this load I would expect that you need to scale your Jaeger backend.
Every command causes at least 5 spans in a trace, reported by at least 3 Ditto services.
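For a sense of scale, the numbers in this thread give a lower bound on the span rate Jaeger has to ingest. A small sketch; `span_rate` is a hypothetical helper, and 5 spans per command is the minimum quoted above, not a measured value.

```python
def span_rate(commands_per_second: float, spans_per_command: int,
              sample_ratio: float = 1.0) -> float:
    """Lower bound on spans per second arriving at the tracing backend."""
    return commands_per_second * spans_per_command * sample_ratio


SECONDS_PER_DAY = 24 * 60 * 60

if __name__ == "__main__":
    full = span_rate(5000, 5)           # 100% sampling
    sampled = span_rate(5000, 5, 0.01)  # 1% sampling
    print(full)                         # 25000.0
    print(full * SECONDS_PER_DAY)       # 2160000000.0
    print(round(sampled))               # 250
```

At 100% sampling that is at least 25,000 spans per second, or about 2.16 billion spans per day; sampling 1% of requests brings it down to roughly 250 spans per second.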

A more realistic setup, IMO, would be to sample only e.g. 1% of the requests.
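Sampling differs from dropping in the reporter in one important way: with head-based sampling the decision is taken once, when the root span starts, and is carried along with the trace context, so a trace is exported either completely or not at all. Tracing libraries such as Kamon typically work this way, but the sketch below is a generic, hypothetical illustration (`SpanContext`, `start_root`, `start_child` and `simulate` are invented names), not Kamon's sampler or its configuration.

```python
import random
from dataclasses import dataclass


@dataclass
class SpanContext:
    trace_id: int
    sampled: bool


def start_root(rng: random.Random, probability: float) -> SpanContext:
    # The only place where the sampling decision is made.
    return SpanContext(trace_id=rng.getrandbits(64),
                       sampled=rng.random() < probability)


def start_child(parent: SpanContext) -> SpanContext:
    # Children inherit the decision carried in the propagated context.
    return SpanContext(trace_id=parent.trace_id, sampled=parent.sampled)


def simulate(n_requests: int, spans_per_request: int, probability: float,
             seed: int = 1) -> dict[int, int]:
    """Return trace_id -> number of exported spans for every sampled trace."""
    rng = random.Random(seed)
    exported: dict[int, int] = {}
    for _ in range(n_requests):
        root = start_root(rng, probability)
        # Simplification: all child spans hang directly off the root.
        spans = [root] + [start_child(root) for _ in range(spans_per_request - 1)]
        for span in spans:
            if span.sampled:
                exported[span.trace_id] = exported.get(span.trace_id, 0) + 1
    return exported


if __name__ == "__main__":
    traces = simulate(n_requests=100_000, spans_per_request=5, probability=0.01)
    print(len(traces))           # roughly 1,000; the exact value depends on the seed
    print(set(traces.values()))  # {5}: every exported trace is complete
```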

@vvasilevbosch
Contributor Author

I tried increasing the buffer size of the tracing reporter, but there seems to be a bug in the Kamon library. I have raised an issue in their repo: kamon-io/Kamon#1281
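A larger reporter buffer only helps up to a point. The rough sizing sketch below assumes the buffer is emptied completely at every flush; `min_buffer_capacity`, `seconds_until_overflow` and all example numbers are hypothetical and are not Kamon's actual settings or defaults. The second function shows why buffer size alone cannot fix a sustained mismatch: if spans are produced faster than they are exported, any finite buffer eventually fills.

```python
import math


def min_buffer_capacity(spans_per_second: float, flush_interval_s: float) -> int:
    """Smallest buffer that holds every span produced between two flushes,
    assuming each flush empties the buffer completely."""
    return math.ceil(spans_per_second * flush_interval_s)


def seconds_until_overflow(capacity: int, produce_per_s: float,
                           export_per_s: float) -> float:
    """Time until a buffer of the given capacity is full when the backlog
    grows by (produce_per_s - export_per_s) spans every second."""
    if export_per_s >= produce_per_s:
        return math.inf  # the exporter keeps up; the backlog never grows
    return capacity / (produce_per_s - export_per_s)


if __name__ == "__main__":
    print(min_buffer_capacity(25_000, 1.0))                 # 25000
    print(min_buffer_capacity(25_000, 10.0))                # 250000
    print(seconds_until_overflow(250_000, 25_000, 20_000))  # 50.0
```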

Closing this issue.
