
Rearrange metric enablement, so that model metric reporter can proceed properly #321

Open
ClifHouck wants to merge 1 commit into main from clif/fix_enablement_of_metric_labels
Conversation

ClifHouck

Addresses triton-inference-server/server#6815

The fix is to enable GPU metrics (assuming they're enabled at compile time and by the user at run time) prior to calling MetricModelReporter::Create. If GPU metrics are not enabled then MetricModelReporter::GetMetricsLabels will not get/populate relevant GPU labels.

@ClifHouck
Author

To elaborate on why this change fixes the GPU metric labels: enabling GPU metrics before initializing the server (around line 2396, tc::Status status = lserver->Init()) allows the metric labels to be populated with GPU information.
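
For illustration, here is a condensed sketch of the ordering being proposed. The names (EnableGpuMetricsIfRequested, the InferenceServer stub) are simplified stand-ins rather than the actual functions in the tree; the point is only that GPU-metric enablement has to happen before Init() builds the model repository manager and its MetricModelReporter instances.

```cpp
#include <memory>

// Simplified stand-in for tc::InferenceServer.
struct InferenceServer {
  // Init() builds the model repository manager, which creates a
  // MetricModelReporter per model; GetMetricsLabels() runs at that point.
  void Init() {}
};

// Hypothetical helper: enable GPU metrics only when built with them and the
// user asked for them at run time.
void EnableGpuMetricsIfRequested(bool built_with_gpu_metrics, bool user_enabled)
{
  if (built_with_gpu_metrics && user_enabled) {
    // e.g. something along the lines of tc::Metrics::EnableGPUMetrics()
    // in the real code.
  }
}

int main()
{
  auto lserver = std::make_unique<InferenceServer>();

  // The fix: flip the order so GPU metrics are enabled *before* Init();
  // otherwise GetMetricsLabels() sees GPU metrics as disabled and never
  // attaches the gpu_uuid label.
  EnableGpuMetricsIfRequested(/*built_with_gpu_metrics=*/true,
                              /*user_enabled=*/true);
  lserver->Init();  // the lserver->Init() call around line 2396
  return 0;
}
```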

ClifHouck force-pushed the clif/fix_enablement_of_metric_labels branch from f93cf3a to b0a970d on January 26, 2024
@dyastremsky
Contributor

dyastremsky commented Feb 21, 2024

Thank you for this PR!

These look good to me. Adding @rmccorm4 as a reviewer as well, since he is more familiar with these files.

Once Ryan is good with these changes, we'll run them through the CI and merge once everything passes.

dyastremsky self-assigned this on Feb 21, 2024
@rmccorm4
Contributor

rmccorm4 commented Feb 21, 2024

Hi @ClifHouck, thanks for this contribution!

While you have figured out a way to have the existing logic propagate the GPU labels to the generic per-model inference metrics, I wouldn't exactly say this is a bug at the moment.

Our per-model metrics are currently aggregated per-model, even if technically under the hood they are being tracked per-model-instance. By introducing these GPU labels for metrics other than the gpu mem/util metrics, it would start to expose the notion of per-model-instance metrics for the case of KIND_GPU models with multiple model instances.

I think there is some drawback to adding this support as-is, because it will introduce some inconsistency in how our metrics are reported and aggregated. With this change, KIND_GPU models will have per-model-instance metrics, but KIND_CPU/KIND_MODEL models will not. Similarly, I think this will raise the question, for models using multiple GPUs (currently only supported via KIND_MODEL), of why the GPU labels don't show the multiple GPUs being used by those model instances.

We have a ticket in our backlog (DLIS-3959) to increase the breakdown to per-model-instance metrics (generically for all model instances, irrespective of device type), but it hasn't been prioritized over other tasks yet. Not exposing the GPU labels for these inference metrics allows the metrics to be aggregated for consistency across all cases.


Can you elaborate more on your use case and needs, and how your proposed changes or our future changes for per-gpu or per-model-instance inference metrics would directly impact you?

Thanks,
Ryan

@rmccorm4
Contributor

rmccorm4 left a comment

Blocking accidental merge while we discuss the above comments.

@ClifHouck
Author

@rmccorm4 I have to disagree that this is not a bug. Given what you have said, there are at least two here:

  1. It is not possible for certain metric information to be gathered or initialized during server initialization. Clearly MetricModelReporter expected metrics to be decisively enabled or disabled by the time that InferenceServer::Init is called. I think that's a reasonable thing to expect.
  2. If MetricModelReporter shouldn't apply GPU labels to its metrics, then that code should be changed or removed.

I can add a commit to this PR which removes the gathering and applying of GPU UUID information to model metrics. That way we solve both issues outlined above.
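
To make point 2 concrete, here is a rough sketch of the kind of guard/removal I have in mind for the label population. The function shape and the LookupGpuUuid helper are hypothetical, not the actual MetricModelReporter code:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch of label population, not the real
// MetricModelReporter::GetMetricsLabels implementation.
void GetMetricsLabels(
    std::map<std::string, std::string>* labels, const std::string& model_name,
    int64_t model_version, int device_id, bool gpu_metrics_enabled)
{
  (*labels)["model"] = model_name;
  (*labels)["version"] = std::to_string(model_version);

  // Option A: guard the GPU label so it is only attached when GPU metrics
  // are enabled and the instance actually runs on a GPU.
  if (gpu_metrics_enabled && device_id >= 0) {
    // (*labels)["gpu_uuid"] = LookupGpuUuid(device_id);  // hypothetical helper
  }

  // Option B: drop the gpu_uuid handling entirely so per-model metrics stay
  // aggregated across instances/devices.
}
```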

@rmccorm4
Contributor

rmccorm4 commented Feb 22, 2024

(1) Clearly MetricModelReporter expected metrics to be decisively enabled or disabled by the time that InferenceServer::Init is called. I think that's a reasonable thing to expect.

lserver->Init() initializes most components of the server, several of which are the components that get queried to periodically update metrics. For example, tc::Metrics::StartPollingThreadSingleton() starts a thread to poll metrics from the PinnedMemoryManager, which is initialized along with the server. So swapping these two operations does not currently make sense without greater refactoring.
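
A tiny sketch of the dependency I'm describing (illustrative types only, not the actual core classes): the polling thread reads from components that Init() creates, so the ordering can't simply be swapped around Init() without wider refactoring.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <thread>

// Illustrative stand-ins, not the real core classes.
struct PinnedMemoryManager {
  size_t TotalBytes() const { return 0; }
};

struct Server {
  std::unique_ptr<PinnedMemoryManager> pinned_memory_manager;
  void Init() {
    // The real Init() creates the pinned memory manager (among many other
    // components) that the metrics polling thread later queries.
    pinned_memory_manager = std::make_unique<PinnedMemoryManager>();
  }
};

int main()
{
  Server server;
  server.Init();  // must run before the poller below has anything to query

  // Stand-in for tc::Metrics::StartPollingThreadSingleton(): periodically
  // samples components owned by the already-initialized server.
  std::atomic<bool> stop{false};
  std::thread poller([&] {
    while (!stop.load()) {
      (void)server.pinned_memory_manager->TotalBytes();
      stop.store(true);  // one iteration is enough for the sketch
    }
  });
  poller.join();
  return 0;
}
```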

I agree that this flow may be a bit unintuitive currently, since the MetricModelReporters are initialized along with the models and the model_repository_manager. In fact, if you were to use --model-control-mode=explicit and dynamically load a model with KIND_GPU after the server has started up, then the GPU labels actually do get populated for these per-model metrics.

I agree this should be resolved one way or the other to be consistent, but I think it's something we should take care to change and we need to balance our current list of priorities. If this behavior is having a significant impact on some workflow or use case, please do let us know. But otherwise I think this is something for us to revisit when we have the bandwidth to do so.

(2) If MetricModelReporter shouldn't apply GPU labels to its metrics, then that code should be changed or removed.

I agree that this code should probably be commented out with a note that it could be re-applied if per-model-instance metrics are exposed for consistency.

@rmccorm4
Contributor

rmccorm4 commented Aug 1, 2024

Hi @ClifHouck, thanks for your patience on this and sorry for the long turnaround time. Upon further reflection, I think application of the GPU labels to the inference request metrics can be useful and provide some insights into request distribution, as well as even indirectly detecting faulty GPUs.

I think there is some drawback to adding this support as-is, because it will introduce some inconsistency in how our metrics are reported and aggregated. With this change, KIND_GPU models will have per-model-instance metrics, but KIND_CPU/KIND_MODEL models will not.

In hindsight, I realize that you can have multiple model instances per GPU, so adding the GPU label wouldn't directly turn this into a per-model-instance metric; it just happens to coincide with one in the case of 1 instance per GPU.

Similarly, I think this will raise the question, for models using multiple GPUs (currently only supported via KIND_MODEL), of why the GPU labels don't show the multiple GPUs being used by those model instances.

I think this is something we will just have to accept for now until we re-work or improve the instance group feature to better account for and track models that use multiple GPUs per instance.


Overall, I want the metric labels to be consistent one way or the other, and not dependent on timing or use of dynamic model loading.

I think we should go ahead with this change, but we will need to do some testing of a few things to make sure nothing breaks. It would be great if you're willing to help add some test cases; I can point you in the right direction on where to start. Otherwise, we can help when we get the cycles as well.

Off the top of my head, I'm interested in seeing the following checked via tests (a rough checker sketch follows the list):

  1. When a model is loaded at startup with KIND_GPU, the gpu_uuid label is present on the expected metrics, for both the inference request metrics and the GPU memory/util metrics
  2. When a model is loaded at startup with KIND_CPU or KIND_MODEL, the gpu_uuid label is not present
  3. The correctly labeled metric is updated per request. For example, on a 2-GPU system with 2 model instances (1 per GPU), send 2 inference requests such that 1 request is scheduled to each GPU, and assert that the request count for each per-GPU metric is 1, rather than 0 or 2.
  4. There is no duplicate "unlabeled" metric in the KIND_GPU case such as:
nv_inference_count{gpu_uuid="GPU-1ccddeda-8caf-7d8e-cfe2-a63cddeaafd2",model="double",version="1"} 0
nv_inference_count{model="double",version="1"} 0
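
For reference, a rough sketch of the kind of assertion helper these checks (1, 2, and 4) could use, operating on the raw Prometheus text scraped from the /metrics endpoint. The helper and the example scrape below are hypothetical, not an existing test utility; our real tests may end up looking quite different.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Count nv_inference_count series for a model, split by whether they carry a
// gpu_uuid label.
void CountSeries(const std::string& metrics_text, const std::string& model,
                 int* labeled, int* unlabeled)
{
  *labeled = 0;
  *unlabeled = 0;
  std::istringstream in(metrics_text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("nv_inference_count{", 0) != 0) continue;
    if (line.find("model=\"" + model + "\"") == std::string::npos) continue;
    if (line.find("gpu_uuid=\"") != std::string::npos) {
      ++(*labeled);
    } else {
      ++(*unlabeled);
    }
  }
}

int main()
{
  // Example scrape for a KIND_GPU model on a 2-GPU system.
  const std::string scrape =
      "nv_inference_count{gpu_uuid=\"GPU-aaa\",model=\"double\",version=\"1\"} 1\n"
      "nv_inference_count{gpu_uuid=\"GPU-bbb\",model=\"double\",version=\"1\"} 1\n";
  int labeled = 0, unlabeled = 0;
  CountSeries(scrape, "double", &labeled, &unlabeled);
  assert(labeled == 2);    // one labeled series per GPU (check 1)
  assert(unlabeled == 0);  // no duplicate "unlabeled" series (check 4)
  return 0;
}
```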

Lastly, I'm curious how this will impact existing workflows or dashboards that query the metrics today. Given a server today reporting request metrics with no GPU labels such as:

nv_inference_count{model="double",version="1"} 0

and then we add the GPU label after this PR on a 2-GPU system, so the metrics output becomes:

nv_inference_count{gpu_uuid="GPU-1ccddeda-8caf-7d8e-cfe2-a63cddeaafd2",model="double",version="1"} 0
nv_inference_count{gpu_uuid="GPU-<uuid-of-second-gpu>",model="double",version="1"} 0

Will we break any existing promql queries or workflows looking at nv_inference_count?

@rmccorm4
Contributor

rmccorm4 commented Aug 1, 2024

CC @chriscarollo from your issue as this PR pertains to your question.
