DAOS-8331 telemetry: Support client metrics dump without agent #14289
Ticket title is 'Client side metrics/stats support for DAOS'
Enable scenarios where client telemetry is collected and dumped to a CSV without agent config changes or involvement.

Setting D_CLIENT_METRICS_DUMP_DIR in the client process environment will enable client telemetry dump to the specified directory even if the agent is not configured to export telemetry.

Skip-NLT: true
Features: telemetry
Required-githooks: true
Change-Id: I243d11a2e00059ef3115d392d63c523048477122
Signed-off-by: Michael MacDonald <[email protected]>
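For illustration, a launcher or test harness could set the variable itself before the client library initializes. This is a minimal sketch, not code from this PR: `sketch_request_metrics_dump` and the example directory are hypothetical, and it assumes the client reads the variable during its own initialization, so the call has to happen before that point.

```c
#include <stdlib.h>
#include <string.h>

/* Variable name as given in the commit message. */
#define SKETCH_DUMP_DIR_ENV "D_CLIENT_METRICS_DUMP_DIR"

/*
 * Hypothetical helper: request a client telemetry dump into `dir` by
 * setting the environment variable for the current process (and any
 * children it later spawns). Returns 0 on success, -1 on bad input or
 * if setenv() fails.
 */
int
sketch_request_metrics_dump(const char *dir)
{
	if (dir == NULL || dir[0] == '\0')
		return -1;

	/* Third argument 1: overwrite any value already present. */
	return setenv(SKETCH_DUMP_DIR_ENV, dir, 1);
}
```

From a shell, the equivalent is exporting the variable before starting the application; no agent configuration change is involved in either case.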
35ec346 to 7118526
Moving this out of draft for review. Note, NLT seems to be having issues with this PR. The actual tests pass, but the stage is marked as failed due to an error in calculating deltas. I wound up pushing with Given that NLT actually passed and the failure does not seem to be related to changes made in this PR, I don't think it's worth re-running again to try and get an NLT pass.
    /* Request that the agent adds our segment into the tree. */
    rc = dc_mgmt_tm_register(NULL, dc_jobid, pid, &agent_uid);
    if (rc != 0) {
        DL_ERROR(rc, "client telemetry setup failed.");
        if (rc == -DER_UNINIT && d_isenv_def(DAOS_CLIENT_METRICS_DUMP_DIR)) {
            D_INFO("telemetry dump dir set -- proceeding without agent management.\n");
Would this be appropriate to classify as a warning (D_WARN), since the user has requested that metrics be retained but it can't be done?
I considered this, but decided that it would be too noisy/alarming. The intent behind this PR is to allow a user to dump client telemetry without any involvement from the admin (i.e. no need to reconfigure the agent, etc).
Rather than requiring the user to set yet another environment variable to indicate that they want telemetry to work even if the agent isn't configured to manage it, I just modified the code to quietly handle this special case (-DER_UNINIT indicates that the agent hasn't initialized the telemetry library). To me, this follows the principle of least surprise (or annoyance, depending on your outlook).
The exact use cases here are still being hashed out, but in case it's not clear, my expectation is that if the user is asking to dump the client telemetry at program exit, then they probably don't care about whether or not the agent is configured to manage the telemetry. Agent management is only needed if the telemetry will be sampled in realtime while the application is still running.
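The special case described above can be sketched as a small decision function. This is only an illustration of the reasoning, not the code in the PR: `sketch_tm_mode`, the `tm_mode` enum, and `SKETCH_ERR_UNINIT` are hypothetical names, and the numeric value used for the error code is an arbitrary placeholder rather than the real `-DER_UNINIT`.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder for -DER_UNINIT; the real value comes from the DAOS headers. */
#define SKETCH_ERR_UNINIT (-1000)

enum tm_mode {
	TM_AGENT_MANAGED, /* agent registered our segment; realtime sampling possible */
	TM_DUMP_ONLY,     /* no agent management; telemetry is only dumped at exit */
	TM_SETUP_FAILED   /* any other failure; handled by the existing error path */
};

/*
 * Decide how client telemetry proceeds after the registration attempt.
 * register_rc: result of asking the agent to manage our telemetry segment.
 * dump_dir_set: whether the dump directory variable is defined in the environment.
 */
enum tm_mode
sketch_tm_mode(int register_rc, bool dump_dir_set)
{
	if (register_rc == 0)
		return TM_AGENT_MANAGED;

	/* Agent has not initialized telemetry, but the user asked for a dump. */
	if (register_rc == SKETCH_ERR_UNINIT && dump_dir_set)
		return TM_DUMP_ONLY;

	return TM_SETUP_FAILED;
}

/* Usage example: the three outcomes discussed in the review thread. */
void
sketch_tm_mode_examples(void)
{
	assert(sketch_tm_mode(0, false) == TM_AGENT_MANAGED);
	assert(sketch_tm_mode(SKETCH_ERR_UNINIT, true) == TM_DUMP_ONLY);
	assert(sketch_tm_mode(SKETCH_ERR_UNINIT, false) == TM_SETUP_FAILED);
}
```

Only the uninitialized-agent error is treated as recoverable; any other registration failure still counts as a setup failure even when the dump directory is set.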
Required-githooks: true Change-Id: I10d6df7d67399b564f964fca6f3da7af0698b18f
grumble ... Just merged master, re-running.
LGTM