
Feature/instrument process node #1

Open
wants to merge 56 commits into master
Conversation

boxsterman

No description provided.

Contributor

@dpsoft dpsoft left a comment


@boxsterman LGTM, I've left a simple comment. @ivantopo WDYT?

@boxsterman
Author

@dpsoft Finally got around to addressing your feedback ... too much other work :(


@ivantopo ivantopo left a comment


Hey @boxsterman, I finally started looking into this PR. Solid work so far 🎉

I've only looked at the producer/consumer instrumentation so far, since I'm still wrapping my head around the streams part. Hopefully we can start polishing the producer/consumer side now, and I'll get back to you as soon as possible regarding the streams side 😄

Besides the minor comments on some methods, there are two other bigger things I would like to address:

Regular vs Delayed Spans

Besides the follow strategy, we should also have a setting to determine whether the generated Span should be a "regular" Span or a "delayed" Span. The idea behind delayed Spans is that they can capture additional information about processing in asynchronous scenarios, particularly how long an operation was waiting before actually being processed. I think it would go like this:

  • When delayed Spans are enabled, we could count the "wait time" as the difference between the time a message was produced and the time the "poll" method started, and the actual processing time for that Span will be the time it takes for "poll" to complete.
  • When delayed Spans are not used, the only time reflected will be the processing time, which should be equal to the time taken for the "poll" function to run (btw, this is not the case at the moment, please see the relevant comments).
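To make the two modes concrete, here is a minimal sketch of the timing model in plain Scala (java.time only; RecordTiming and its method names are assumptions for illustration, not the actual Kamon delayed-span API):

```scala
import java.time.{Duration, Instant}

// Hypothetical timing model for a single consumed record.
// produceTime: when the producer sent the record
// pollStart / pollEnd: when the consumer's poll() began and returned
final case class RecordTiming(produceTime: Instant, pollStart: Instant, pollEnd: Instant) {

  // Delayed Span: the wait time is produce -> poll start ...
  def waitTime: Duration = Duration.between(produceTime, pollStart)

  // ... and the processing time is the duration of the poll itself.
  def processingTime: Duration = Duration.between(pollStart, pollEnd)

  // Regular Span: only the processing (poll) time is reflected.
  def regularSpanDuration: Duration = processingTime
}
```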

Continuing Tracing on the Consumer Side

Once a record has been consumed, the user will want to continue doing some processing of that record, and that processing should continue the same trace started/joined by the consumer. Currently I don't see any clear way to do that, and I think there should be one. One idea that comes to mind is to inject the generated Spans into the ConsumerRecord objects via instrumentation and provide a utility class for users to extract that Span and use it as the parent of their own Spans.

What do you think about the above?

build.sbt Outdated
val scalaExtension = "io.kamon" %% "kanela-scala-extension" % "0.0.10"
val kamonCore = "io.kamon" %% "kamon-core" % "2.0.0"
val kamonTestkit = "io.kamon" %% "kamon-testkit" % "2.0.0"
val instrumentationCommon = "io.kamon" %% "kamon-instrumentation-common" % "2.0.0"

val kafkaClient = "org.apache.kafka" % "kafka-clients" % "0.11.0.0"


Is there any particular reason for not using the latest kafka clients/streams versions?

Author


Migrated to Kafka 2.3.0.

The signature of the instrumented method in the Kafka client has changed. How shall we address compatibility with older Kafka versions (pre-2.3.0)? Shall I create a dedicated module depending on Kafka 2.0.0?


Author


If we kept it all in the kamon-kafka project and used such property-based instrumentation, wouldn't that cause issues with eviction of (older) Kafka libraries on the side of the users of such a kamon-kafka lib?
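One way to sidestep the eviction concern, sketched here as a hypothetical sbt layout: one module per supported client line, with kafka-clients marked as provided so it never competes with the version the user already has on their classpath:

```scala
// Hypothetical module layout, not the actual build definition.
lazy val `kamon-kafka-clients` = (project in file("kamon-kafka-clients"))
  .settings(libraryDependencies +=
    "org.apache.kafka" % "kafka-clients" % "2.3.0" % "provided")

// Optional companion module targeting the pre-2.3.0 method signatures.
lazy val `kamon-kafka-clients-legacy` = (project in file("kamon-kafka-clients-legacy"))
  .settings(libraryDependencies +=
    "org.apache.kafka" % "kafka-clients" % "2.0.0" % "provided")
```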

ContextSerializationHelper.fromByteArray(h.value())
}.getOrElse(Context.Empty)

val span = consumerSpansForTopic.getOrElseUpdate(topic, {

Hey, I was wondering about poll-span cardinality here.
This way it will produce one span per consumed topic within a poll. Is this intentional, and what's the reasoning behind it?

Seems it could go two ways:

  • One span per poll, in which case record-specific tags (partition, key, offset) don't make sense. Any further streaming spans should link to the poll span, due to the potentially large number of child spans (the default max.poll.records is 500).
  • One span per polled record, which can be used as a root for all streaming-stage spans. This is aligned with the producer-side instrumentation, which tracks individual record send()s and ignores batching.

Author


Hi @mladens
thanks a lot for your feedback and for pointing out this span inconsistency!

I've refactored that part and now the following spans are created:

  • one span for the actual poll operation: using its "own" (existing) traceId context, this allows tracing individual poll operations and how many records they actually polled.
  • one span for each record polled: using the record's send span as parent (or as a linked span if follow-strategy = false), this enables tracing the business-related flow of the application.

This way it also becomes quite visible how Kafka consumer settings like autoCommit can influence the consumer.

Here are two examples on how the instrumentation now looks like:

Simple publish/subscribe with two messages (autoCommit=true)
client-autoCommit

Simple publish/subscribe with two messages (autoCommit=false)
client-noAutoCommit

  • solid edges: parent/child spans
  • dashed edges: linked spans

In the second example, the second record was not committed fast enough, so it popped up in the next poll operation. It will not be forwarded to the application since it is still "in flight" on the client side.
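The span structure described in this comment can be sketched as a simplified model (all names here are hypothetical; the real instrumentation builds Kamon Spans instead of these case classes):

```scala
object PollSpans {
  sealed trait Relation
  case object Child extends Relation // solid edge: parent/child span
  case object Link  extends Relation // dashed edge: linked span

  final case class SpanModel(operation: String, traceId: String, relation: Option[Relation])

  // One span for the poll itself (on its own trace), plus one span per polled
  // record: a child of the producer's send span when follow-strategy is on,
  // otherwise a span on the poll's trace that only links back to the producer.
  def spansForPoll(pollTraceId: String, producerTraceIds: List[String], followStrategy: Boolean): List[SpanModel] = {
    val pollSpan = SpanModel("poll", pollTraceId, None)
    val recordSpans = producerTraceIds.map { producerTraceId =>
      if (followStrategy) SpanModel("consumed-record", producerTraceId, Some(Child))
      else                SpanModel("consumed-record", pollTraceId, Some(Link))
    }
    pollSpan :: recordSpans
  }
}
```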

@boxsterman
Author

boxsterman commented Oct 15, 2019

Hi @ivantopo

I've added support for accessing the (completed) span of a consumer record in order to create further child spans with it. For that I've created a new mixin that only carries the span, not the whole context ... to avoid dealing with changes/updates to the context itself.

This is how the simple example looks now:
client-hasSpan

What do you think? I'll now continue by adding a "real" context mixin to the stream to properly keep that context.

* to make the span available as parent for downstream operations
*/
onSubTypesOf("org.apache.kafka.clients.consumer.ConsumerRecord")
.mixin(classOf[HasSpan.Mixin])


This should be HasContext instead of HasSpan, please see the comments on the HasSpan class.


import scala.util.Try

trait HasSpan {


Since the Context might have additional information, we cannot hide it from the users; instead of mixing in the Span we should mix in the Context that has the Span (it should be the deserialized Context with the new Span inside).
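A rough sketch of the difference, using simplified stand-ins for Kamon's Span and Context types (these are not the real ones): mixing in the Context keeps the tags and other entries available, and the Span stays one call away.

```scala
// Simplified stand-ins for the real Kamon types.
final case class Span(id: String)
final case class Context(span: Span, tags: Map[String, String])

// Hypothetical mixin: the agent would add this to ConsumerRecord at load time.
trait HasContext {
  @volatile private var _context: Context = Context(Span("empty"), Map.empty)
  def setContext(ctx: Context): Unit = _context = ctx
  def context: Context = _context
  // Convenience accessor: nothing is lost by carrying the whole Context.
  def span: Span = _context.span
}
```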


object SpanExtractionSupport {

implicit class SpanOps[K, V](cr: ConsumerRecord[K, V]) {


A few comments on this one:

  • Is there a situation in which users themselves will want to write a Context into the ConsumerRecord? Maybe I'm missing something here.
  • We need to add a method to get the Context, but we can also have one to get the Span, just because it is convenient. In that method please return just the Span, not an Option[Span]; if there is no Span in the Context, Kamon will return Span.Empty instead, and the same goes for the Context. We use these empty objects instead of returning Options of things because it provides much more consistent usage across Java/Scala/Kotlin.
  • I would recommend moving this to the Client companion object and breaking it down into two parts: methods on the companion object itself, like extractContext(consumerRecord)/extractSpan(consumerRecord), and a Syntax implicit class that does what this class does. This allows Java users to access these functions while still keeping the cool stuff for the Scala folks 😄
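The split suggested in the last bullet might look roughly like this, with a stand-in Record type instead of the real ConsumerRecord (all names here are hypothetical):

```scala
// Stand-in for a ConsumerRecord carrying a mixed-in context.
final case class Record(ctx: Map[String, String])

object Client {
  // Plain methods on the companion object, callable from Java:
  def extractContext(record: Record): Map[String, String] = record.ctx
  def extractSpan(record: Record): String =
    record.ctx.getOrElse("span", "Span.Empty") // an "empty" value, never an Option

  // Scala-only sugar layered on top of the same methods:
  implicit class Syntax(val record: Record) extends AnyVal {
    def context: Map[String, String] = Client.extractContext(record)
    def span: String = Client.extractSpan(record)
  }
}
```

With the implicit in scope, Scala users can write record.span, while Java users call Client.extractSpan(record) directly.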

@@ -4,7 +4,10 @@

kamon {
kafka {


For consistency with the rest of the modules this should be under kamon.instrumentation.kafka. BTW, same goes with the code itself! It should be under the kamon.instrumentation.kafka package.
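i.e. something along these lines (a sketch of the suggested namespace; the keys shown are placeholders):

```
kamon {
  instrumentation {
    kafka {
      # existing settings from kamon.kafka would move here, e.g.:
      # follow-strategy = true
    }
  }
}
```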

@ivantopo

@boxsterman the record should get the Context mixed in, not just the Span because there might be more interesting data in the Context that also needs to be used by the client, like Context Tags or some other Context Entries besides the Span. I just made some comments on the relevant files.

@boxsterman
Author

@ivantopo Yes, I've noticed that the HasSpan idea was not my best and somehow part of my learning curve :)
Therefore I'm only using HasContext in the newer parts ... and will replace HasSpan accordingly.

@mladens

mladens commented Dec 4, 2019

Hi @boxsterman, what are your plans regarding this PR? Do you have any more ideas/features you want to see in there? If there's anything I can do to help get this closer to release, let me know.
