Use diagnostics_channel in layers #96

Qard · 2020-09-17T19:06:33Z

This is a proof-of-concept implementation of the diagnostics_channel feature I'm working on in Node.js core. The value here is for APM products to be able to capture changes to routing state through the lifecycle of a request. This should of course not be merged until the module exists in Node.js itself. I'm just creating this now to prove the value to userland and to make sure a more complex framework can benefit from it.

APM vendors want to use events from diagnostics_channel to know when a route has changed and what the full routing path to the route is, so they can bucket trace data based on the full routing path. For example, requests to /hello/world and /hello/stephen might map to a /:name route on a nested router under /hello, which should be able to produce /hello/:name as the full routing path. This requires some method of tracking the active layer so that, at the point where the diagnostics_channel events are published, valid routing information is present on the request for the subscriber to gather.

I'll leave this as a draft PR until diagnostics_channel lands in Node.js core.
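As a rough sketch of the flow described above: the channel name `example.router.layer.enter`, the `{ request }` message shape, and the `enterLayer` helper are assumptions made for illustration, not the names used by this PR. Only `channel()`, `subscribe()`, `publish()` and `hasSubscribers` come from the Node.js diagnostics_channel API.

```javascript
const dc = require('node:diagnostics_channel')

// Hypothetical channel name; the names used by this PR are not shown here.
const CHANNEL_NAME = 'example.router.layer.enter'
const layerEnter = dc.channel(CHANNEL_NAME)

// Publisher side: what a router could do when it enters a layer.
function enterLayer (request, fragment) {
  request.layerStack = request.layerStack || []
  request.layerStack.push(fragment)
  if (layerEnter.hasSubscribers) {
    layerEnter.publish({ request })
  }
}

// Subscriber side: what an APM agent could do with each event.
const seenPaths = []
dc.subscribe(CHANNEL_NAME, function onLayerEnter (message) {
  seenPaths.push(message.request.layerStack.join(''))
})

// Simulate a request to /hello/world entering a nested router, then a route.
const request = { method: 'GET', url: '/hello/world' }
enterLayer(request, '/hello')
enterLayer(request, '/:name')

console.log(seenPaths) // [ '/hello', '/hello/:name' ]
```

The last entry is the full routing path an APM would use as the bucket name for both /hello/world and /hello/stephen.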

mcollina

What's the overhead of adding diagnostics_channel to express enabled and disabled?

mcollina · 2020-09-18T07:05:22Z

lib/layer.js

@@ -65,9 +78,23 @@ Layer.prototype.handle_error = function handle_error(error, req, res, next) {
    return next(error)
  }

+  req.layerStack = req.layerStack || []


I would recommend against creating a new array for every request if diagnostics_channel is not enabled.

Fair. Seemed like a possibly useful feature outside of the diagnostics_channel stuff itself, but sounds like there's already another layer-tracking thing in discussion, so I'll take a closer look at that. :)

May also need to consider what the expected behavior is when different versions of this module are mixed together.
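One possible way to act on the allocation concern, sketched under assumptions: the channel name and the `enterLayer` helper are hypothetical, and the guard relies only on the `hasSubscribers` property that diagnostics_channel provides.

```javascript
const dc = require('node:diagnostics_channel')

// Hypothetical channel name, for illustration only.
const CHANNEL_NAME = 'example.router.layer.enter'
const layerEnter = dc.channel(CHANNEL_NAME)

function enterLayer (request, fragment) {
  // Skip all bookkeeping, including the array allocation, when nobody listens.
  if (!layerEnter.hasSubscribers) return
  if (!request.layerStack) request.layerStack = []
  request.layerStack.push(fragment)
  layerEnter.publish({ request })
}

// No subscriber yet: the request object is left untouched.
const idle = {}
enterLayer(idle, '/hello')

// With a subscriber attached, the stack is created on first use.
function onEnter () {}
dc.subscribe(CHANNEL_NAME, onEnter)
const observed = {}
enterLayer(observed, '/hello')
dc.unsubscribe(CHANNEL_NAME, onEnter)

console.log('layerStack' in idle, observed.layerStack) // false [ '/hello' ]
```

A trade-off of this guard is that a subscriber attached in the middle of a request would see only a partial stack for that request.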

wesleytodd · 2020-09-18T14:55:47Z

I think this is actually two features added (not that I think that should pose a problem). It both tracks the layers hit on a req and also reports to the diag channel. The request for tracking the layers is long standing, and there are a few other implementations we would want to consider before we would land this. If you don't want to take that discussion on, but still want to see about landing the usage of this feature you might consider removing that.

FWIW, I think this approach is fine with that layer array, but it will still require most folks to iterate to determine the last "route" layer which is typically what folks want when they ask for this feature.

Existing PR for "layer tracking": #86
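The iteration mentioned above could look roughly like this. The layer records are made up for illustration; the only assumption is that layers wrapping a route carry a `route` property while plain middleware layers do not.

```javascript
// Hypothetical helper: walk the tracked layers backwards and return the
// last one that wraps a route, or undefined if there is none.
function lastRouteLayer (layers) {
  for (let i = layers.length - 1; i >= 0; i--) {
    if (layers[i].route) return layers[i]
  }
  return undefined
}

// Made-up layer records for a request that passed a mounted router,
// matched a route, and then ran one more plain middleware.
const layers = [
  { path: '/hello' },
  { path: '/:name', route: { path: '/:name' } },
  { path: '/' }
]

console.log(lastRouteLayer(layers).path) // '/:name'
```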

lib/layer.js

dougwilson · 2020-09-18T14:59:49Z

Yes @wesleytodd, I thought the same thing regarding this PR. Looking at the tracking this adds, I think it is done differently from what I've seen the others doing so far, including that linked PR. Specifically, it doesn't store the matched layers/routes on the request; instead it is trying to keep just a stack of the current hierarchy of where it is at the moment, popping them back off once it leaves (vs keeping them for tracking which ones matched).

wesleytodd · 2020-09-18T16:53:43Z

Yeah, and maybe that is not a problem, but if we can solve the long-standing ask and also deliver on the need to report to the diag channel, I think we would want to.

Qard · 2020-09-18T16:59:06Z

Yeah, so the layer matching bits are basically there because what an APM will want from this diagnostics_channel data is to be able to look up the full path to the route the request is at right now.

Ideally we would be able to attribute any async activity to the route or middleware it originated from. I had initially tried to use the Span API to also include an end event to encapsulate each route or middleware, but ran into issues as it's easy to track when next is called but harder to track when a route "completes" by sending a response.

wesleytodd · 2020-09-18T17:01:25Z

For the "send a response" case you can just use on-finished, and then do your cleanup on either next or in that callback, whichever is hit first.
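The "whichever is hit first" idea can be sketched with a run-once guard. To stay self-contained this sketch listens for the `finish` event on a plain EventEmitter standing in for the response; on-finished itself is not used here, and `trackLayer` is a hypothetical helper name.

```javascript
const { EventEmitter } = require('node:events')

// Hypothetical helper: run `cleanup` exactly once, either when the wrapped
// next() is called or when the response finishes, whichever happens first.
function trackLayer (res, next, cleanup) {
  let cleaned = false

  function cleanupOnce () {
    if (cleaned) return
    cleaned = true
    res.removeListener('finish', cleanupOnce)
    cleanup()
  }

  res.once('finish', cleanupOnce)

  return function wrappedNext (err) {
    cleanupOnce()
    next(err)
  }
}

// Stand-in for a response object; a real http.ServerResponse also emits 'finish'.
const res = new EventEmitter()
let cleanups = 0
const next = trackLayer(res, function () {}, function () { cleanups++ })

next() // middleware passes control on
res.emit('finish') // a later handler sends the response

console.log(cleanups) // 1
```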

dougwilson · 2020-09-18T17:11:30Z

Gotcha. I see you are just string concatenating together regexps and arrays in your end result path string, so I'm not sure that is going to work well as a general feature.

Qard · 2020-09-18T19:00:54Z

Doesn't look like on-finished is currently a dependency, so I'd have to add it. It would also somewhat complicate the logic of how diagnostics_channel is used here.

As for the path string, it's a concatenation of the routing path strings before being converted to regex. For example:

const router = new Router()
const hello = new Router()

router.use('/hello', hello)

hello.get('/:name', ...)

This router structure would result in a concatenated path of /hello/:name. Basically APM products want to know how to map individual routes to the full routing path it takes to get to it. We currently do that by monkey-patching express, which is fragile and has performance implications. We'd prefer to be able to receive that information through diagnostics_channel and not have to patch at all.

dougwilson · 2020-09-19T02:04:48Z

As for the path string, it's a concatenation of the routing path strings before being converted to regex

It sounds like you are assuming the users are passing a string in the first place. Take the following router combo, though:

const router = new Router()
const hello = new Router()

router.use(['/hello', /^\/(?:good)?bye/, '/aloha'], hello)

hello.get('/:name', ...)

Qard · 2020-09-21T17:04:00Z

That's fine. It will use the stringified form of the regex. The purpose is just to have a unique-to-that-route name to match transaction data to. If that name is /hello/:name or /^\\/(?:good)?bye//:name doesn't really matter, just that it has a name that can be relatively easily read by the user and traced back to the location in their app code. Some APMs may have support for keeping the routing fragments separate, but most would just join them.
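A minimal sketch of that joining behavior, assuming each fragment is whatever was passed as the mount path. The helper names are hypothetical, and joining array entries with a comma is an assumption that mirrors default array stringification, not something confirmed by this PR.

```javascript
// Hypothetical helper: turn one mount path (a string, a RegExp, or an array
// of either) into a display string.
function fragmentToString (fragment) {
  if (Array.isArray(fragment)) {
    return fragment.map(fragmentToString).join(',')
  }
  return String(fragment)
}

// Hypothetical helper: join the fragments of all active layers into one name.
function joinFragments (fragments) {
  return fragments.map(fragmentToString).join('')
}

console.log(joinFragments(['/hello', '/:name']))
console.log(joinFragments([/^\/(?:good)?bye/, '/:name']))
console.log(joinFragments([['/hello', '/aloha'], '/:name']))
```

The result is meant as a readable label for a bucket, not as something that can be parsed back into a route.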

dougwilson · 2020-09-21T17:11:16Z

Ah, I see. But of course this router allows you to declare the same route name multiple times, as it does not enforce that route names are unique. This is because of things like next() flow control. For example:

const router = new Router()
const hello = new Router()

router.use('/hello', hello)
router.get('/hello/:name', (req, res) => res.end(`hello, ${req.params.name}`))

hello.param('name', (req, res, next, name) => next(/^[a-z]+$/.test(name) ? null : 'route'))
hello.get('/:name', (req, res, next) => res.end(`hola, ${req.params.name}`))

The above has effectively two different routes named /hello/:name that your code would identify, but they are two different unique routes. /hello/dan will invoke one while /hello/Dan will invoke the other.

dougwilson · 2020-09-21T17:18:59Z

lib/layer.js

  } catch (err) {
+    req.layerStack.pop()


Can this just be put into a finally block?

That'd run after the next(...) though, which would interfere with subsequent routes, as far as I can tell. Could do that if the next(...) on the following line was in a process.nextTick(...), but that'd have a performance hit.

Oh. But isn't the one within the try already running after the next call? Like if the middleware sync calls next then it will run after, but if the middleware async calls next it will be before.

Yeah, true. Probably a better place to put that code. 🤔

Yea, it looks like adding anything non-trivial to your tests seems to highlight the weirdness here. For example, I added the cors module to the nested routes test like the following:

it('nested routes have multiple layers with paths', function (done) {
  var outer = new Router();
  var inner = new Router();

  inner.get('/:name', function (req, res) {
    res.end();
  });

  outer.use('/hello', cors(), inner);

  function end() {
    assert.strictEqual(joinLayerStack(handleRequest.request.layerStack), '/hello/:name/');
    done();
  }

  outer.handle({ url: '/hello/world', method: 'GET', headers: {} }, { end, setHeader: noop }, noop);
});

But it failed the test listing the path as /hello/hello/:name/, as I would surmise that it is performing the pop operation too late, after the middleware is getting iterated through (the more middleware added, the more times /hello appears).
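To illustrate the suspected cause, here is a toy dispatcher (hypothetical, not the router's real implementation) that pushes each layer's path on entry and pops it either after the handler returns or right before moving on to the next layer. It assumes every layer calls next() synchronously, which is the case where the two timings differ.

```javascript
// Hypothetical mini-dispatcher, not the router's real implementation.
// Each layer has a mount `path` and a `handle(next)` function.
function runLayers (layers, popTiming) {
  const stack = []
  const observed = []
  let index = 0

  function next () {
    const layer = layers[index++]
    if (!layer) return
    stack.push(layer.path)
    observed.push(stack.join(''))
    if (popTiming === 'before-next') {
      // Pop as part of leaving the layer, before the next one is entered.
      layer.handle(function () {
        stack.pop()
        next()
      })
    } else {
      // Pop once handle() returns; too late if next() was called synchronously.
      layer.handle(next)
      stack.pop()
    }
  }

  next()
  return observed
}

const callNextSync = function (next) { next() }
const siblings = [
  { path: '/hello', handle: callNextSync }, // e.g. a middleware such as cors()
  { path: '/hello', handle: callNextSync } // e.g. the nested router
]

console.log(runLayers(siblings, 'after-return')) // [ '/hello', '/hello/hello' ]
console.log(runLayers(siblings, 'before-next')) // [ '/hello', '/hello' ]
```

With the late pop, the first sibling's path is still on the stack when the second sibling is entered, which would explain one extra /hello per added middleware.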

I pushed a possible fix for the layer timing issue. Not sure it's the best solution, but seems to work. Still unsure if the layerStack approach is the best way. Willing to accept any suggestions on better ways to track the overall routing state of the app. 😅

Ah, yea. It seems to have issues with the latest change keeping track of the stack. For example, adding inner.get('/world', (req, res, next) => next()) above inner.get produces the route as /hello/world/:name.

As far as a best solution... I think perhaps it may be better to maybe start from the top so we can understand what, exactly, we are trying to accomplish in this pull request? The PR title and description only mentions adding hooks for diagnostics_channel, but almost all the following discussion does not seem related to the actual diagnostics_channel usage, and rather adding a completely new feature, independent of the diagnostics_channel feature.

Should these two things be split apart into two PRs (first suggested at #96 (comment)) or can the description be updated to describe what the goal of this PR is? That may help better focus the conversation and code work.

Capturing routing information is the whole point of adding diagnostics_channel, so this PR doesn't really make sense without it. I'll update the description to elaborate on that need.

Yea, I get it, but let's think iteratively here, if you are interested in getting things landed instead of waiting for the entire pie to be built. For example, folks want to construct the path even outside of diagnostics_channel, so having access to the path is an independent ask of adding diagnostics_channel support. In addition, if you didn't have that layer stack hanging off the req object, you still already have access to a whole bunch of information, including that the request is being handled, that an error occurred, what the http method is, the http path, etc.

We get all that directly from http already. The only context we're really missing from express is the routing information. I only intended the layer tracking stuff in this to be used by diagnostics_channel. Maybe there are more general-purpose uses for that information, but I lack the context and the time to do anything about those right now. This PR was created primarily as a demonstration that the diagnostics_channel API is usable in more complex scenarios. It's not intended to be landed any time soon as the diagnostics_channel feature itself hasn't even landed yet. If you want something more complex from this, I'm probably not going to get to it for awhile.

Qard · 2020-09-21T17:33:09Z

I don't think it's a big issue that the paths aren't unique. It's just a bucketing mechanism. If a bucket might contain data from two routes that do different things but share the same path, it's not ideal, but there's not really any other great way to differentiate them. Most routes internally have some degree of branching anyway, so it's expected that every request in that bucket might not look exactly the same. It doesn't break anything if data from multiple routes winds up in the same bucket, it just makes the data maybe slightly less meaningful. Also, small side note: the HTTP method is also generally included in the string used as the bucket key.

I'm not concerned about two routes winding up with the same name, so long as that name can be instructive to the user on where in their app to look for the code. If they happen to have multiple routes with the same name, it's on them to look at those routes and figure out on their own which one a given trace came from. Route-naming is very much best-effort in APM. There's lots of cases where we get stuff like /static/* or something like that which will result in wildly different performance profiles and possibly very different execution paths. We just do our best to define reasonable buckets that are somewhat intuitive to the user and will help them to narrow down trace data to somewhat specific parts of their code.
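A small sketch of the bucketing idea described above. The key format ("METHOD path"), the helper names, and the sample trace values are all made up for illustration; real APM products may build their keys differently.

```javascript
// Hypothetical bucket key: HTTP method plus the joined routing path.
function bucketKey (method, routePath) {
  return method.toUpperCase() + ' ' + routePath
}

// Group trace durations by bucket key.
function bucketDurations (traces) {
  const buckets = new Map()
  for (const trace of traces) {
    const key = bucketKey(trace.method, trace.routePath)
    const durations = buckets.get(key) || []
    durations.push(trace.durationMs)
    buckets.set(key, durations)
  }
  return buckets
}

// Made-up sample traces; the first two could come from different routes
// that happen to share the same name and still land in one bucket.
const buckets = bucketDurations([
  { method: 'GET', routePath: '/hello/:name', durationMs: 12 },
  { method: 'GET', routePath: '/hello/:name', durationMs: 30 },
  { method: 'POST', routePath: '/hello/:name', durationMs: 7 }
])

console.log(buckets.get('GET /hello/:name')) // [ 12, 30 ]
```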

dougwilson · 2020-09-23T04:33:19Z

test/diagnostics-channel.js

+
+    function end() {
+      assert.strictEqual(joinLayerStack(handleRequest.request.layerStack), '/hello/:name/');
+      assert.strictEqual(handleError.error, error);


Should this validate that the handle error layer stack is pointing to the / path for the error handler that was executed that called res.end()?

Not sure we should be treating error handlers as top-level routes, that's an implementation detail. It makes more sense from the APM perspective to treat them as continuations of whatever route the error handler was reached from.

Hm, interesting. I was assuming you wanted the layer stack to be the current stack of layers--you didn't note that certain layers were going to be treated differently. That is likely a very important point that was left out of your explanation.

So you are saying that even an error handler that is itself a route would not be listed? I.e. app.get('/blog/:slug', (err, req, res, next) => {}). It seems like when you have a complex app where you have a lot of error handlers all defined at a lot of different paths, it may be useful to understand which of your error handlers was the one that ran the error handling, no?

Taking a look, I'm not sure if this behavior is why the following will generate a layer stack showing the "route" is /hello/:name/hello/:val rather than showing it as /hello/:val when called as GET /hello/100:

var router = new Router();

router.get('/hello/:name', (req, res, next) => {
  if (!isNaN(Number(req.params.name))) next('route')
  else next()
}, (req, res) => {
  res.end()
});

router.get('/hello/100', (req, res) => {
  res.end()
})

dougwilson · 2020-09-23T04:37:31Z

test/diagnostics-channel.js

+      done();
+    }
+
+    router.handle({ url: '/hello/world', method: 'GET' }, { end }, noop);


Just as a general comment: I'm not sure if you planned to clean this up later, so apologies if so, just wanted to call out that we don't want to be passing in mock-like objects to the router handling methods; they should be the real Node.js HTTP objects like all the other tests so we validate that everything is working as new Node.js versions come out and these objects change over time.

Yep, the changes are all copied and pasted from the PR I made directly to express, which apparently did tests differently. I'll clean up the tests at some point, when I get back to this. 👍
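One way to exercise a handler with real Node.js HTTP objects instead of mocks, sketched with only the built-in http module. The `requestThroughServer` helper is hypothetical and is not how this repository's tests are written; in a real test the handler would call the router's handle method with the request and response it receives.

```javascript
const http = require('node:http')

// Hypothetical helper: run `handler` behind a real HTTP server on an
// ephemeral port, issue one GET request, and resolve with the response.
function requestThroughServer (handler, path) {
  return new Promise(function (resolve, reject) {
    const server = http.createServer(handler)
    server.on('error', reject)
    server.listen(0, '127.0.0.1', function () {
      const options = {
        host: '127.0.0.1',
        port: server.address().port,
        path: path,
        agent: false // no connection pooling, so the server can close promptly
      }
      http.get(options, function (res) {
        let body = ''
        res.setEncoding('utf8')
        res.on('data', function (chunk) { body += chunk })
        res.on('end', function () {
          server.close()
          resolve({ statusCode: res.statusCode, body: body })
        })
      }).on('error', function (err) {
        server.close()
        reject(err)
      })
    })
  })
}

// The handler receives a real IncomingMessage and ServerResponse.
const pending = requestThroughServer(function (req, res) {
  res.end(req.method + ' ' + req.url)
}, '/hello/world')

pending.then(function (response) {
  console.log(response.statusCode, response.body)
})
```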

wesleytodd · 2024-03-16T18:32:50Z

@Qard I am working on getting this repo back on track for the renewed plans for v5. Are you interested in landing this still? Let me know so I can get it on our plans if so.

Qard · 2024-03-17T03:14:06Z

It would still be of value to have diagnostics_channel in there, yes, though it probably needs reworking at this point to bring it up to date. Back then we opted to target fastify as the routing framework to test diagnostics_channel with instead, as express and related projects had somewhat stagnated and we didn't see the changes as likely to land any time soon.

I'll share with the team at Datadog to see if we want to put some time into updating this in the near future. No big deal if you're pushing for release soon though--we can add support later if necessary. Thanks for the reminder though! I had entirely forgotten I made this. 😅

wesleytodd · 2024-03-18T14:43:26Z

Yeah this would likely be a minor release so not a big deal to wait, but I was just trying to wrangle all the open things to make sure we had a clear plan in place and this still does seem valuable to me. Let me know if you folks have time to work on this!

Use diagnostics_channel in layers

7eba938

This was referenced Sep 17, 2020

Add diagnostics channel to layers expressjs/express#4408

Closed

lib: create diagnostics_channel module nodejs/node#34895

Closed

mcollina reviewed Sep 18, 2020

dougwilson reviewed Sep 18, 2020

Fix channel names

231f3ee

dougwilson added needs docs ideas pr labels Sep 21, 2020

dougwilson reviewed Sep 21, 2020

Fix layerStack timing issue

bae6c12

dougwilson reviewed Sep 23, 2020

Qard mentioned this pull request Nov 26, 2020

Add support to diagnostics_channel to core fastify/fastify#2697

Closed

dougwilson force-pushed the master branch from bb3b1fe to 5fa7c9c Compare January 24, 2021 01:32

dougwilson force-pushed the master branch 2 times, most recently from 7e90c9e to 8643ec6 Compare May 19, 2021 16:51

wesleytodd mentioned this pull request Feb 27, 2024

Express LTS strategy expressjs/discussions#199

Merged

wesleytodd added the express-v5 label Mar 16, 2024
