
Amazon X-Ray interop #1754

Closed
codefromthecrypt opened this issue Oct 2, 2017 · 12 comments
@codefromthecrypt
Member

codefromthecrypt commented Oct 2, 2017

Similar to last year, when users requested Google Stackdriver compatibility, users are now explicitly requesting Amazon X-Ray. There is also general support and interest from @abhiksingh (X-Ray product lead), and indirect comments, which I can't currently find, about tension between lambda and zipkin architecture*.

There's no doubt that Zipkin and Amazon interop has been important in the past. Many of our core team rely on AWS and/or build custom components for AWS. This issue will explore how we could fit in, and how we could let 3rd-party tracers designed for B3 support X-Ray with the smallest possible impact.

Similar to Stackdriver, there are two major concerns: propagation and out-of-band data.

Unlike Stackdriver, propagation is wider than the X-Ray service. For example, in AWS the same propagation format is used even when X-Ray isn't: ELB uses it even though ELB doesn't write to X-Ray. There are also interesting concerns, such as API Gateway restarting traces at its edge, also in X-Ray format. These types of concerns weren't present when we integrated with Stackdriver, although they are building in the new trace-context format, targeted initially inside Google at Stackdriver and gRPC services.

Also unlike Stackdriver, the trace ID requires a 32-bit timestamp. This has some impact: if there's an invalid timestamp, the service will drop any related data. For this reason, a pragmatic ID generation strategy is something we must consider, for example creating "interop" IDs by default where the first 32 bits are a timestamp and the remaining 96 bits are random. This discussion occurred at the end of #1262.
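
As an illustration of that "interop" ID strategy (a sketch with hypothetical helper names, not Brave's or zipkin-aws's actual API), a 128-bit ID whose high 32 bits are epoch seconds can be generated and then rewritten into X-Ray's `1-{epoch}-{random}` wire form:

```python
import os
import time

def new_interop_trace_id() -> str:
    """128-bit trace ID as 32 hex chars: 32-bit epoch seconds + 96 random bits.

    Hypothetical sketch of the strategy discussed above.
    """
    epoch_seconds = int(time.time()) & 0xFFFFFFFF
    return format(epoch_seconds, "08x") + os.urandom(12).hex()

def to_xray_trace_id(zipkin_trace_id: str) -> str:
    """Rewrite a 32-hex-char Zipkin trace ID as X-Ray's 1-{epoch}-{random} form."""
    assert len(zipkin_trace_id) == 32, "X-Ray interop requires 128-bit trace IDs"
    return f"1-{zipkin_trace_id[:8]}-{zipkin_trace_id[8:]}"
```

Because the timestamp only has to be plausible, the cost of this scheme is paid once per root span; child spans reuse the trace ID unchanged.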

Reporting is very much like what we did for Stackdriver. The X-Ray format is span-structured, with an API to POST data to. While it has more structure than Zipkin data, mapping rules like those used for Stackdriver are more than possible. Zipkin-compatible (or other) tracers could send data to X-Ray's daemon (automatically present in lambda), to the X-Ray POST API, or to a zipkin destination that does one of the two.

The above is fairly reliable from initial explorations, but could change with experience. This issue will track the exploration and any related issues on Zipkin's side.

  * If you are running a lambda (serverless) architecture, running a zipkin server, even one that is just a simple proxy to X-Ray, could feel heavyweight. In this case, writing directly to X-Ray, or translating zipkin to AWS via a Kinesis lambda, could make sense.
@codefromthecrypt
Member Author

cc @llinder @devinsba @openzipkin/core

@yurishkuro
Contributor

So these are the interop models for the out-of-band data?

  • [zipkin SDK] -> [X-Ray backend] directly
  • [zipkin SDK] -> [zipkin collector] -> [X-Ray backend]

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 2, 2017 via email

@codefromthecrypt
Member Author

Random note: per http://docs.aws.amazon.com/xray/latest/api/API_PutTraceSegments.html, the POST format is a list of escaped JSON documents, as the doc implies.

Ex.

{"TraceSegmentDocuments": [
"{\"id\": \"0b89f1dec76af795\", ..."
]}
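
For illustration, a minimal sketch (an assumed helper, not part of zipkin-aws) that builds that request body by double-encoding each segment document, which is what produces the escaped strings above:

```python
import json

def put_trace_segments_body(segments: list) -> str:
    """Serialize segments for PutTraceSegments: each document is itself
    a JSON-encoded string inside the TraceSegmentDocuments array,
    which is why the inner quotes appear escaped on the wire."""
    return json.dumps(
        {"TraceSegmentDocuments": [json.dumps(segment) for segment in segments]}
    )
```

The resulting string is what a reporter would POST to the PutTraceSegments action.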

codefromthecrypt pushed a commit to openzipkin/brave that referenced this issue Oct 3, 2017
This customizes the trace ID generator to make the high bits convertible
to Amazon X-Ray trace ID format v1.

See openzipkin/zipkin#1754
@codefromthecrypt
Member Author

Added an example implementation of trace IDs with a time component. Unsurprisingly, it is slower than a fully random ID. However, the overhead is still sub-microsecond (on my laptop™), and only affects the root span: openzipkin/brave#509

Next step is to add a converter which proves the concept.

@codefromthecrypt
Member Author

Experimental work starting in Brave here: openzipkin/brave#510

@codefromthecrypt
Member Author

Thanks to @jcarres-mdsol for making the new trace ID provisioning instructions a bit simpler:

|---- 32 bits for epoch seconds ----|---- 96 bits for random number ----|

It can potentially be implemented as:

High 64: |---- 32 bits for epoch seconds ----|---- 32 bits for random number ----|
Low 64:  |---------------- 64 bits for random number -----------------|

Optional cheap sanity check that the high 32 bits are epoch seconds:
0x58000000 = 1476395008 = 2016-10-13, prior to X-Ray and zipkin supporting 128-bit trace IDs
0x60000000 = 1610612736 = 2021-01-14, if you want an even more future-proof lower bound
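
That sanity check is cheap to express in code. A sketch (hypothetical function name), using the 2016 lower bound from above:

```python
def looks_like_epoch_prefix(trace_id: str) -> bool:
    """Cheap sanity check: do the high 32 bits of a 128-bit trace ID
    decode to epoch seconds at or after 2016-10-13 (0x58000000)?"""
    return len(trace_id) == 32 and int(trace_id[:8], 16) >= 0x58000000
```

A fully random 128-bit ID fails this check roughly two thirds of the time, so it is a heuristic filter, not a guarantee.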

marcingrzejszczak added a commit to spring-cloud/spring-cloud-sleuth that referenced this issue Jan 19, 2018
With this pull request we have rewritten the whole Sleuth internals to use Brave. That way we can leverage all the functionality and instrumentation that Brave already has (https://github.com/openzipkin/brave/tree/master/instrumentation).

Migration guide is available here: https://github.com/spring-cloud/spring-cloud-sleuth/wiki/Spring-Cloud-Sleuth-2.0-Migration-Guide

fixes #711 - Brave instrumentation
fixes #92 - we move to Brave's Sampler
fixes #143 - Brave is capable of passing context
fixes #255 - we've moved away from Zipkin Stream server
fixes #305 - Brave has GRPC instrumentation (https://github.com/openzipkin/brave/tree/master/instrumentation/grpc)
fixes #459 - Brave (openzipkin/brave#510) & Zipkin (openzipkin/zipkin#1754) will deal with the AWS XRay instrumentation
fixes #577 - Messaging instrumentation has been rewritten
@devinsba
Member

devinsba commented May 9, 2019

Closing, as I believe we have addressed this in zipkin-aws

@devinsba devinsba closed this as completed May 9, 2019
@msmsimondean

@devinsba the issue is about providing interoperability with AWS X-Ray. Looking at zipkin-aws, that doesn't seem to provide that interoperability; from what I can tell, it interops with other AWS services (SQS, Kinesis and Elasticsearch Service) but not AWS X-Ray. I can see some code in zipkin-aws that mentions X-Ray (e.g. reporter-xray-udp and storage-xray-udp). Does zipkin-aws provide some X-Ray integration that isn't documented in the codebase's README? Thanks in advance!

@devinsba
Member

We have support for the X-Ray propagation format and for sending traces/spans to X-Ray. Is there some kind of integration you are looking for that isn't either of these? Also, any feature requests for more integration should be handled in the zipkin-aws repo.

@msmsimondean

msmsimondean commented May 24, 2019

@devinsba that's great. Is there any documentation available for setting it up or is that still to come? I'm just thinking of the good documentation at https://cloud.google.com/trace/docs/zipkin and https://github.com/openzipkin/zipkin-gcp for the equivalent Google Stackdriver Trace integration. Thanks

@codefromthecrypt
Member Author

codefromthecrypt commented May 28, 2019 via email
