
RFC: Transition to 128bit trace ids (still 64bit span ids) #1262

Closed
codefromthecrypt opened this issue Aug 31, 2016 · 43 comments
Labels
model Modeling of traces

Comments

@codefromthecrypt
Member

codefromthecrypt commented Aug 31, 2016

Many times over the last few years, we've had folks ask about 128 bit trace ids.

There have been a few use-cases mentioned...

  • archiving traces longer than a couple days (concern with collisions when retrieving by id)
  • using externally generated ids as trace identifiers (ex AWS lambda UUID)
  • joining traces that have 128bit identifiers (ex Dapper/Cloud Trace now use 128bit trace ids)

Doing this comes at a cost of figuring out how to do interop, but that's well-charted territory:

  • Zipkin already decouples trace ids from span/parent ids; This reduces scope to only trace ids.
  • Zipkin trace identifier encoding is hex, which is easy to widen and detect.
  • There is existing practice of padding 64bit ids to 128 bits, and of choosing bits to reduce 128bit ids back to 64 bits (see the sketch after this list).
  • Cassandra and MySQL support blobs in key clauses, and elasticsearch can as easily index a hex string of length 16 as it can one of length 32.
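
As a rough illustration of that pad/reduce interop, here's a minimal Java sketch (the class and method names are hypothetical, not part of any proposal):

// Hypothetical helpers showing the pad/reduce interop on lowercase hex trace ids.
final class TraceIdInterop {
  // Pad a 64-bit id to a 32-character lowercase hex string by zero-filling the high bits.
  static String pad(long traceId) {
    return String.format("%016x%016x", 0L, traceId);
  }

  // Reduce a 128-bit hex id to 64 bits by keeping only the rightmost 16 characters.
  static long reduce(String hex128) {
    return Long.parseUnsignedLong(hex128.substring(hex128.length() - 16), 16);
  }
}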

Practically speaking we'd need to do the following to make this possible:

  • B3 - Update both http and binary forms to optionally accept wider trace ids
  • Thrift - add an optional binary field to hold the wider id
  • Model - Add Span.Builder.traceId(blob) and pad .traceId(long) to fit 128bits
  • Storage
    • overload getTraceById to accept a 128bit blob.
    • add 128bit column to mysql
    • add 128bit index to cassandra
    • write both 64 and 128bit ids until Model v2

Provided we don't creep scope to also change span and parent id encoding, I think we can actually do this, and end up with an incremental win on the model that can help us bridge into systems that use wider ids increasingly often.

@codefromthecrypt
Member Author

spamming some folks of interest, of course not exhaustive @nicmunroe @felixbarny @shakuzen @yurishkuro @jcarres-mdsol @abesto @eirslett @kristofa @michaelsembwever @kevinoliver @mosesn @anuraaga @marcingrzejszczak @prat0318 @devinsba @basvanbeek @schlosna @ewhauser @klingerf @bogdandrutu @clehene

@codefromthecrypt codefromthecrypt changed the title Transition to 128bit trace ids (still 64bit span ids) RFC: Transition to 128bit trace ids (still 64bit span ids) Aug 31, 2016
@codefromthecrypt
Member Author

One idea to bridge this in the current model is to add another u64 field: traceIdHigh

This would be somewhat easy to implement, and we can default to zero. When 128 bit ids are used, they would be split between traceIdHigh and traceId (in data structures). In storage, they can either be concatenated or stored separately.
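
For illustration, splitting and re-joining might look like this (a sketch only; the helper names are hypothetical):

// Hypothetical split/join between a lowercase hex trace id and (traceIdHigh, traceId) longs.
static long[] split(String hex) {
  long high = hex.length() == 32 ? Long.parseUnsignedLong(hex.substring(0, 16), 16) : 0L; // default to zero
  long low = Long.parseUnsignedLong(hex.substring(hex.length() - 16), 16);
  return new long[] {high, low};
}

static String join(long traceIdHigh, long traceId) {
  return traceIdHigh != 0L
      ? String.format("%016x%016x", traceIdHigh, traceId) // concatenate the two halves
      : String.format("%016x", traceId);
}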

@codefromthecrypt
Member Author

actually, I don't like traceIdHigh as it would leave identifiers in a transitional state (and make querying awkward). I think it would be best to obviate the old traceId field with a wider one rather than append to it.

@mosesn
Contributor

mosesn commented Aug 31, 2016

I like this–in practice we run into collisions frequently. I'm not sure I like increasing the size of the thrift frame. What would you estimate the overall increase in storage size would be for TBinary and for TCompact?

@schlosna
Contributor

+1 from me as we have had to go from 128-bit to 64-bit trace IDs when pushing internally traced things into Zipkin

@nicmunroe

Interop with wingtips should work out of the box since trace IDs are modeled internally as strings, and we can add some config to wingtips to auto-generate trace IDs as 128 bit if desired by the user. So no concerns from me. 👍

@yurishkuro
Contributor

actually, I don't like traceIdHigh as it would leave identifiers in a transitional state (and make querying awkward). I think it would be best to obviate the old traceId field with a wider one rather than append to it.

@adriancole do you have an alternative proposal for thrift? binary?

@codefromthecrypt
Member Author

codefromthecrypt commented Sep 1, 2016

actually, I don't like traceIdHigh as it would leave identifiers in a
transitional state (and make querying awkward). I think it would be best to
obviate the old traceId field with a wider one rather than append to it.

@adriancole https://github.com/adriancole do you have an alternative
proposal for thrift? binary?

I didn't mean to imply not using thrift. I meant adding another field with
some notes about how to use it, similar to how we added ipv6:

11: optional binary traceId128

@codefromthecrypt
Member Author

codefromthecrypt commented Sep 1, 2016

I like this–in practice we run into collisions frequently. I'm not sure I
like increasing the size of the thrift frame. What would you estimate the
overall increase in storage size would be for TBinary and for TCompact?

So, there are two places of concern for finagle.

For example, the trace context (SpanId) field would need to look at the
size before deserializing. This would grow from 32 to 40 bytes per span,
plus whatever encoding overhead. This assumes the simplest option, where we
just write more bytes to the first field.

At the moment, we only use big-endian TBinaryProtocol. This simplifies a
lot of reporters, particularly kafka which doesn't have a field to carry
what the encoding type is. As long as we keep the scope to only trace ids,
this is a fixed overhead of how to encode the extra 64 bits.

If we go for a separate "binary traceId128" field, we add 26 bytes per thrift
(3 /* for TType.STRING field declaration */ + 4 /* for length prefix */ + 16 /* for id */).
If we go for an additional "i64 traceIdHigh" field, we add 11 bytes per thrift
(3 /* for TType.I64 field declaration */ + 8 /* for high bytes */).

Since no-one is likely indexing on thrift offsets, traceIdHigh might not be
as bad as I thought. Ex we could still have json just use a wider amount of
hex characters in its traceId field.

Assuming we'd need a separate index (minimally +16bytes), I'd say the
storage costs are spanCount * at least 27-42 bytes per span.

On the topic of TBinaryProtocol vs TCompactProtocol: if we are looking to
optimize serialized size, I'm not sure I'd use either, since other options
are now available, and we could also rejig the model. So, leaving that
out of scope for this discussion.

@jcarres-mdsol
Contributor

I do not know how realistic the chances of clashing in a 2**64 space are, but most other systems are moving to 128-bit uuids, so for the sake of uniformity it sounds like a good change to me.

@codefromthecrypt
Member Author

incidentally, just peeked at a work-in-progress for GRPC trace propagation.

They are using 2 64bit fields to capture the 128bit trace id.

// A TraceId uniquely represents a single Trace. It is a 128-bit nonce.
message TraceId {
  fixed64 hi = 1;
  fixed64 lo = 2;
}

An aside, but they don't share the same span id across RPC calls, so only need to propagate the parent (as opposed to the parent and the current id)

// Tracing information that is propagated with RPC's.
message TraceContext {
  // Trace identifier. Must be present.
  TraceId trace_id = 1;
  // ID of parent (client) span. Must be present.
  fixed64 span_id = 2;
  // true if this trace is sampled.
  bool is_sampled = 3;
}

@yurishkuro
Contributor

Yes, I did mean an extension to the current thrift. String sounds rather wasteful; binary will have the minimal required length. The problem with both is that in memory they will have to be pointers, whereas something like traceIdHigh could be a primitive value (then again, if it's optional it's still going to be a pointer).

@codefromthecrypt
Member Author

@yurishkuro there is no encoded string type in thrift. string (TType 11) fields are binary

@codefromthecrypt
Member Author

codefromthecrypt commented Sep 1, 2016

ps on high-low: unless I'm missing something, there's effectively no difference between traceIdHigh = 0 and traceIdHigh = null, so even if the field is optional in IDL, I'd just make it default to zero (ex in java). The reason is that it is only used when traceId is set, and if you are packing 128 bits, shifting 0 into the high side is the same as shifting nothing into the high side.
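
To make that concrete, a small illustration (not any proposed API):

import java.math.BigInteger;

// Packing traceIdHigh = 0 above a 64-bit traceId yields the same 128-bit value
// as the 64-bit id alone, so defaulting the field to zero loses nothing.
class HighLowExample {
  static BigInteger pack(long traceIdHigh, long traceId) {
    return unsigned(traceIdHigh).shiftLeft(64).or(unsigned(traceId));
  }

  static BigInteger unsigned(long value) {
    return new BigInteger(Long.toHexString(value), 16);
  }

  public static void main(String[] args) {
    long traceId = 0x463ac35c9f6413adL;
    System.out.println(pack(0L, traceId).equals(unsigned(traceId))); // true
  }
}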

@yurishkuro
Contributor

there is no encoded string type in thrift. string (TType 11) fields are binary

What about these?

struct BinaryAnnotation {
  1: string key,
  2: binary value,

In the generated classes those will be string vs. []byte, so to store a 128-bit value in a string field it would have to be encoded as hex or base64, while in a binary field it can be stored as is.

@codefromthecrypt
Member Author

codefromthecrypt commented Sep 1, 2016

@yurishkuro didn't mean to distract from the issue. You are right that in thrift IDL there's a "binary" compiler hint that says the STRING field is being used for opaque bytes. I changed my example above.

@anuraaga
Contributor

anuraaga commented Sep 1, 2016

Just for reference, Stackdriver Trace uses a 32-byte string in their protobuf API. 16 bytes wasted, but compared to the rest of the span, maybe not a big deal.

https://cloud.google.com/trace/api/reference/rpc/google.devtools.cloudtrace.v1

I tend to have a preference for readable IDs since they're used very often in debugging, so I find the string type appropriate, but I can also see arguments for wanting to optimize the data over the wire.

@mosesn
Contributor

mosesn commented Sep 1, 2016

Yeah, I'm beginning to lean toward hi/lo.

@codefromthecrypt
Member Author

cool thing in thrift is we can actually rename traceId to lo without
breaking anything (on the wire anyway) :)


@codefromthecrypt
Member Author

codefromthecrypt commented Sep 7, 2016

I haven't seen anyone against this, so here's a proposal:

In thrift and zipkin.Span, add a field; this can happen immediately:

  • add traceIdHigh, default to zero

Make all of the below that accept trace ids check the length and prefer the lower 64 bits:

  • http servers that receive "X-B3-TraceId" check the length and throw away the higher bits
  • Finagle TraceId and Brave SpanId check the buffer length. If 40, there's a 128 bit trace id in there
  • Json decoders (probably only zipkin.Codec and zipkin UI) check the length and throw away the higher bits of traceId

Change all of the storage backends to use a 128 bit traceId index. Pad 64 bit trace ids to 128 bit on write. Toss higher bits on read.

After all the above is complete, we can start writing the longer ids as a matter of course:

  • change encoders to write 128 bit for B3 Headers and B3 Binary, and Json as opposed to tossing the high bits
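
A minimal sketch of that lenient handling (the helper is hypothetical and shown only to illustrate the length check):

// Accept 16- or 32-character lowercase hex in X-B3-TraceId, keeping only the
// lower 64 bits until storage understands the wider id.
static long lenientLower64(String xB3TraceId) {
  int length = xB3TraceId.length();
  if (length != 16 && length != 32) {
    throw new IllegalArgumentException("X-B3-TraceId must be 16 or 32 lowercase hex characters");
  }
  String lower64Bits = xB3TraceId.substring(length - 16); // toss the higher bits
  return Long.parseUnsignedLong(lower64Bits, 16);
}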

@shalako

shalako commented Sep 7, 2016

In Cloud Foundry, we already use a 128-bit trace id, in uuid format. To add support for Zipkin, we're having to add a second trace id because zipkin only supports a 64-bit id. It would be preferable to only have one trace id.

@codefromthecrypt
Member Author

codefromthecrypt commented Sep 7, 2016

Note this is for 128 bit ids, not uuids (which are not 128 bits of randomness
and are also longer due to hyphens). Ex: all lowercase hex chars.

This design is focused on permitting 128bit fully random ids, not the shape
of a uuid.

Zipkin has binary annotations (key, value) where differently shaped ids can be
placed. For text search and indexing, all-lowercase-hex 128bit is what this
is proposing, which is not only a more straightforward transition from our
current 64bit lowerhex, but incidentally easier to copy-paste than uuids.
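
For example, an externally generated uuid can still be carried in that shape by dropping its hyphens (illustration only; the helper is hypothetical, and a uuid keeps fixed version/variant bits, so it is not fully random):

import java.util.UUID;

// Render a uuid in the proposed 32-character lowercase hex trace id shape.
class UuidToTraceId {
  static String toLowerHex128(UUID uuid) {
    return String.format("%016x%016x",
        uuid.getMostSignificantBits(), uuid.getLeastSignificantBits());
  }

  public static void main(String[] args) {
    System.out.println(toLowerHex128(UUID.randomUUID())); // 32 lowercase hex chars, no hyphens
  }
}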

@codefromthecrypt
Member Author

codefromthecrypt commented Sep 8, 2016 via email

@codefromthecrypt
Member Author

codefromthecrypt commented Sep 8, 2016 via email

@codefromthecrypt
Member Author

codefromthecrypt commented Sep 8, 2016 via email

@codefromthecrypt
Member Author

Sorry, one more thing I should have said in the beginning: the audience of
this change!

Keep in mind that while most who want to integrate with Zipkin use B3
propagation, not everyone does. B3 is the dominant style, but 128 bit will
be used even when systems are propagating with something unlike B3.

For example, Jaeger can log to zipkin, but uses a single concatenated
header. Once we support 128bit, grpc will too, but they also won't use B3.

It is important to note that while we are discussing intimate details of
B3, it is not to preclude someone from encoding propagation differently,
especially as that's already the case today.

So, there are three audiences for this change:

  • those using B3 and Zipkin
  • those integrating with B3 but not Zipkin (ex to reuse existing tracers
    but a different system)
  • those integrating with Zipkin, but not using B3 (ex to reuse storage and
    correlation)

There is a sub-group of above who want to know how to do log correlation,
too.

@codefromthecrypt
Member Author

here's a work-in-progress for a java library that does 128bit B3 propagation based on finagle/brave. It doesn't yet do http headers. Note that thrift or json serialization isn't in scope of propagation.
openzipkin/b3-propagation#2

@codefromthecrypt
Member Author

Added some notes about how 128bit ids will look in practice openzipkin/b3-propagation#5

codefromthecrypt pushed a commit that referenced this issue Sep 14, 2016
This bridges support for 128bit trace ids by not failing on their
receipt. Basically, this throws out any hex characters to the left
of the 16 needed for the 64bit trace id.

See #1262
@codefromthecrypt
Member Author

moving to implementation in #1298

@codefromthecrypt
Member Author

fyi, there's a tentative understanding that sampling (when it is a function of the trace id) will only consider the lower 64 bits. The assumption is that the resolution is good enough, and it prevents us from needing to redefine samplers. (recommended by @basvanbeek and I agree)
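
A sketch of what that means, loosely in the shape of a rate-based sampler (an assumption for illustration, not the actual sampler code):

// Decide sampling from the lower 64 bits only, so 64- and 128-bit trace ids
// with the same low bits get the same decision.
static boolean isSampled(long traceIdLow, float rate) {
  if (traceIdLow == Long.MIN_VALUE) return false; // guard: Math.abs would overflow
  long boundary = (long) (Long.MAX_VALUE * rate); // ex. rate = 0.001f
  return Math.abs(traceIdLow) <= boundary;
}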

@codefromthecrypt
Member Author

throwing something out there: what if we adopted a default behavior of generating the 128bit trace ID exactly how Amazon X-Ray does? In this case, it would be possible to parse it into an X-Ray ID, and it still wouldn't impact us (as folks only sample on the lower 64 anyway).

Ex. first 8 hex are epoch seconds, next 24 are random.

I suspect the cost would be the time to get the epoch secs. Also, we'd not know whether the trace was generated like this (in order to parse it), but anyway, there'd be a chance. Worth it? Not worth it?

http://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-request-tracing.html
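
As a rough sketch of that generation scheme (an assumption of how it could look, not a committed format or API):

import java.security.SecureRandom;

// First 8 hex characters are epoch seconds; the remaining 24 (96 bits) are random.
class XRayStyleTraceIdGenerator {
  static final SecureRandom RANDOM = new SecureRandom();

  static String nextTraceId128() {
    int epochSeconds = (int) (System.currentTimeMillis() / 1000);
    byte[] random = new byte[12];
    RANDOM.nextBytes(random);
    StringBuilder id = new StringBuilder(32);
    id.append(String.format("%08x", epochSeconds));
    for (byte b : random) id.append(String.format("%02x", b));
    return id.toString();
  }
}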

@jcarres-mdsol
Contributor

Intriguing.
In my company we are fighting the fact that there is no control over AWS stuff.
Any kind of compatibility would be interesting.

@jcchavezs
Contributor

Interesting; however, I would rather think about adding a new layer for ID generation, where one implementation would be AWS X-Ray, instead of adopting it from them. What do you think?

@yurishkuro
Contributor

Do Amazon docs explain why recording the time is useful?

@bogdandrutu

@yurishkuro I think the main reason was something like this:

  • They have a TTL (not sure if I remember exactly what it was, but I think they said 5 days); everything with an older epoch is dropped (even from their collection pipeline).

@yurishkuro
Contributor

That sounds like an odd reason. The collection pipeline already knows when it receives the spans and can establish a TTL from that point; passing the timestamp in the headers doesn't add much there. There must be some other reason.

@bogdandrutu

@yurishkuro when I asked (since they have timestamps in the Span for something similar), the example they gave me was something like this:
"when you receive the Span, how do you know whether it is part of a trace that started 5 days ago or not?"

Essentially, they want to make decisions based on when the trace started: analysis, keeping or not keeping the span, maybe where to store it (there can be nice improvements based on this, e.g. a way to store the latest 1h of traces and then re-index, etc.).

@yurishkuro
Contributor

I see. Makes sense. Thanks.

@codefromthecrypt
Member Author

In the case of Zipkin, there is a large number of users on Amazon, and a lot of the time we generate 128-bit ids without any consistency anyway. Some copy/paste the 64bit, etc. The rationale I was told by Abhishek is that otherwise the backend will reject the trace. The timestamp is used to store the data and enable even partitions in the backend. So, basically, not using 32 of the 128 bits for the timestamp has a severe penalty if trying to use X-Ray propagation. 96 bits of randomness is great for probably 99.9% of deployments, and also doesn't impact sampling, as most only look at the lower 64 bits anyway.

If we had great support for a different way of 128-bit trace ID generation, I'd not suggest something like this, but as we don't, this seems a very good reason to recommend and track one. Of course folks can make trace ID generation pluggable, but I think the tradeoff wins in favor of making things align with the largest cloud.

Taking this forward, for example, I'd open a ticket on B3 with a suggested generation format for 128-bit trace IDs, with this rationale, and whoever wants to do this could originate trace IDs that Amazon won't toss. Remember, it isn't just X-Ray, it is also ALB, Api Gateway and other products that use the same ID format (if only for log correlation).

This means even if users never use X-Ray storage (which some sites can choose to as it is possible to convert zipkin data to X-Ray data), at least propagation will pass and you can ID correlate and pull data in a pinch.

I think of this as tactical as it is easy to do and can be done even today. It allows us to tunnel AWS IDs through clients written for B3. Trace-Context stuff is more strategic as there's more going on there. This sort of integration shows almost exactly the pattern needed for trace-context, too.

Does anyone not want me to suggest that a tracer might put a timestamp in the first 32 of the 128 bits? If I did, I'd mention that it isn't fully random, etc., and maybe also some sanity-check tricks. Personally, I think not telling people about this is more damaging than telling them, as there's no easier path to Amazon integration (at least that I can think of).

@jcarres-mdsol
Contributor

Sounds awesome to me. I see no problem following their format; for better or worse, AWS is a big part of internet traffic, and making your system work well with theirs will be a win-win for everyone.

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 2, 2017

@jcarres-mdsol so I have been toying around and hopefully by the end of the day I can verify zipkin and x-ray interop works (prereq to actually suggesting this in more than a hypothetical way). Will share as soon as done, but I see no road-blocks.

Other notes:
Amazon actually has the same sampling approach as we do. They support a "deferred" decision (the same as the absence of a B3 sampled flag). The only difference is that in this case the http response contains the result of that decision. However, I checked and this behavior isn't mandatory.

Some concern was raised by @jcchavezs about encoding sensitivity. Here's the response to this. Amazon's format is version stamped. If the version isn't one, we can't continue the trace anyway (until that version is understood). We have choices, including adding a tag about the linked ID before the trace is restarted. Amazon have a very good api compat track record. I am in no way concerned about X-Ray trace format v1 disappearing next year. We don't know what v2 will be anyway, and who knows, it might be the trace-context format. Anyway, TL;DR: in my playground, I'll verify the version, and when processing inbound headers, we can think about how to restart a trace if we are in strict AWS compat mode (which implies restarting a trace if the first 32 bits could not have been epoch seconds).

So, basically there are two parts: the easy one is just using a different algo for root IDs. The second part is for AWS interop: if a library must use AWS-compat IDs, then it needs to restart traces which don't include valid epoch seconds. The second part only impacts sites running on AWS, and can be handled by an optional tracer plugin.
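
A minimal sketch of that strict-compat check (the 30-day window and clock-skew allowance are assumptions, not anything Amazon documents):

// If the first 32 bits of an inbound 128-bit trace id could not plausibly be
// epoch seconds, a strict AWS-compat tracer would restart the trace.
static boolean plausiblyEpochSeconds(String traceId128Hex) {
  if (traceId128Hex.length() != 32) return false;
  long epochSeconds = Long.parseLong(traceId128Hex.substring(0, 8), 16);
  long now = System.currentTimeMillis() / 1000;
  return epochSeconds <= now + 300                  // allow a little clock skew
      && epochSeconds >= now - 30L * 24 * 60 * 60;  // and ids up to 30 days old
}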

@yurishkuro
Contributor

@adriancole so what kind of interop are you targeting, a Zipkin-instrumented system talking to AWS SaaS?

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 2, 2017 via email

@codefromthecrypt codefromthecrypt added the model Modeling of traces label Oct 26, 2018