
test: Test and document histogram latency metrics #7694

Merged

merged 16 commits into main on Oct 23, 2024

Conversation

@yinggeh (Contributor) commented Oct 12, 2024

What does the PR do?

This PR adds tests for histogram metrics and documents the new nv_inference_first_response_histogram_ms metric; a minimal usage sketch follows the list below. It:

  1. Verifies that this metric is only created for decoupled models.
  2. Tests the --metrics-config histogram_latencies=<bool> option.
  3. Verifies that the reported data are correct in a decoupled ensemble model.
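
For illustration only, a minimal sketch of how the new metric could be checked from a test, assuming the server was started with the option from item 2 (tritonserver --metrics-config histogram_latencies=true) and exposes metrics on the default port; the port number and the use of the requests library are assumptions, not code from this PR:

import requests  # assumption: requests is available in the test environment

METRICS_URL = "http://localhost:8002/metrics"  # assumption: default Triton metrics port
METRIC = "nv_inference_first_response_histogram_ms"

text = requests.get(METRICS_URL).text

# With histogram_latencies=true and a decoupled model loaded, all three
# histogram series should be present; with histogram_latencies=false, or for
# non-decoupled models, none of them should appear.
for suffix in ("_count", "_sum", "_bucket"):
    assert METRIC + suffix in text, "missing series " + METRIC + suffix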

Checklist

  • PR title reflects the change and is of the format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated the GitHub labels field.
  • Added a test plan and verified the tests pass.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

  • test

Related PRs:

triton-inference-server/core#396

Where should the reviewer start?

Check the implementation PR first.

Test plan:

L0_metrics--base
L0_response_cache--base

  • CI Pipeline ID:
    19614087

Background

Standardizing Large Model Server Metrics in Kubernetes

@yinggeh yinggeh added the PR: test Adding missing tests or correcting existing test label Oct 12, 2024
@yinggeh yinggeh self-assigned this Oct 12, 2024
@rmccorm4 rmccorm4 changed the title test: Tests new histogram metric test: Test and document histogram latency metrics Oct 12, 2024
@rmccorm4 rmccorm4 added the PR: docs Documentation only changes label Oct 12, 2024
kthui
kthui previously approved these changes Oct 16, 2024
docs/user_guide/metrics.md (outdated review comment, resolved)
@yinggeh yinggeh force-pushed the DLIS-7383-yinggeh-metrics-standardization-TTFT branch from d663da5 to e084067 Compare October 16, 2024 23:03
@yinggeh yinggeh requested a review from kthui October 18, 2024 17:45
Outdated review comments (resolved) on: qa/L0_metrics/ensemble_decoupled/ensemble/config.pbtxt, qa/L0_metrics/histogram_metrics_test.py, qa/L0_metrics/test.sh
@yinggeh yinggeh requested a review from kthui October 19, 2024 00:52
@kthui (Contributor) left a comment

Make sure to rebase on main that has the compute capability fix merged.

@yinggeh yinggeh force-pushed the DLIS-7383-yinggeh-metrics-standardization-TTFT branch from 9f5b2ab to aa9ddc7 Compare October 21, 2024 21:30
@yinggeh yinggeh requested a review from kthui October 21, 2024 23:20
@kthui (Contributor) left a comment

Nice work adding a more comprehensive test!

Only two things left:

  1. Verify that the number of responses returned on each inference matches the number expected.
  2. Verify the histogram bucket key and value pairs after each inference (a minimal sketch follows below).
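
For illustration, a minimal sketch of what the bucket verification in item 2 could look like, assuming the test scrapes the Prometheus text exposition from the metrics endpoint; the helper name, the model name, the label layout, and the port are assumptions rather than code from this PR:

import re
import requests  # assumption: available in the test environment

METRICS_URL = "http://localhost:8002/metrics"  # assumption: default metrics port
METRIC = "nv_inference_first_response_histogram_ms"

def parse_histogram_buckets(text, metric, model):
    # Hypothetical helper: return {le: cumulative_count} for one model's histogram.
    buckets = {}
    for line in text.splitlines():
        if line.startswith(metric + "_bucket") and 'model="%s"' % model in line:
            le = re.search(r'le="([^"]+)"', line).group(1)
            buckets[le] = int(float(line.rsplit(" ", 1)[1]))
    return buckets

# After a single inference whose first response arrived in under 100 ms, every
# bucket with an upper bound of at least 100 ms (including +Inf) should report
# a cumulative count of 1; only the +Inf bucket is checked here for brevity.
buckets = parse_histogram_buckets(requests.get(METRICS_URL).text, METRIC, "my_decoupled_model")
assert buckets.get("+Inf", 0) == 1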

Outdated review comments (resolved) on qa/L0_metrics/histogram_metrics_test.py
@yinggeh yinggeh requested a review from kthui October 22, 2024 16:54
Comment on lines +101 to +106
    # Histograms
    def test_inf_histograms_decoupled_exist(self):
        metrics = self._get_metrics()
        for metric in INF_HISTOGRAM_DECOUPLED_PATTERNS:
            for suffix in ["_count", "_sum", ""]:
                self.assertIn(metric + suffix, metrics)
Contributor

Nice work adding checks that the count, sum, and bucket metrics exist!

This does not check whether the value in each bucket is correct, but you mentioned that an existing test already verifies those values, so they are not re-verified here: the existing test confirms that the Prometheus histogram metrics function correctly, while the tests in histogram_metrics_test.py verify that the numbers provided to Prometheus are correct via count and sum.

Contributor (Author)

Since this test only checks whether specific histograms exist, without updating the metrics, all values should be 0. For tests of the Prometheus metrics APIs and functionality, please refer to
https://github.com/triton-inference-server/core/blob/main/src/test/metrics_api_test.cc
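
To illustrate the count/sum verification discussed in the comments above, a minimal sketch, assuming a plain HTTP scrape of the metrics endpoint; the snapshot() helper and the port are assumptions, and the inference call itself is not shown:

import requests  # assumption: available in the test environment

METRICS_URL = "http://localhost:8002/metrics"  # assumption: default metrics port
METRIC = "nv_inference_first_response_histogram_ms"

def snapshot():
    # Hypothetical helper: read the current _count and _sum values of the metric.
    values = {}
    for line in requests.get(METRICS_URL).text.splitlines():
        for key in ("count", "sum"):
            if line.startswith("%s_%s" % (METRIC, key)):
                values[key] = float(line.rsplit(" ", 1)[1])
    return values

def check_deltas(before, after, expected_new_requests):
    # _count should grow by exactly the number of new requests, and _sum should
    # only ever grow (it accumulates first-response latency in milliseconds).
    assert after["count"] - before["count"] == expected_new_requests
    assert after["sum"] >= before["sum"]

# Usage sketch: on a fresh server both values start at 0; take a snapshot, run
# one inference against the decoupled model (not shown), take another snapshot,
# then call check_deltas(before, after, 1).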

Collaborator

# Prometheus histogram buckets are tested in metrics_api_test.cc::HistogramAPIHelper

Just a nit for clarification: the Metrics API tests (e.g., TRITONSERVER_MetricObserve) don't go through the same code path as the built-in latency metrics. So although there are custom-metrics tests that exercise histograms, they aren't necessarily testing the built-in histogram latency metrics added here.

The built-in metrics today use the Prometheus APIs directly, plus some C++ helper functions around them. Ideally, we would unify these so that both built-in metrics and custom metrics go through the TRITONSERVER_Metric layer in the same way, for easier test coverage and maintenance in the future.

All that being said, I think the current tests included here, with the checks around the _sum value, are fine for this PR. When adding other histogram latency metrics, it would be a good idea to either (a) add some bucket-related tests to these Python unit tests or (b) unify the internal metrics to use the same custom-metrics APIs, or the internal C++ classes around them, for better re-use of the metrics_api_test.cc tests.
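
As one possible shape for option (a), a hedged sketch of a bucket-related unit test, reusing a bucket parser like the one sketched earlier in this thread; the parse_histogram_buckets helper, the model name, and the test-class conventions are assumptions, not code from this PR:

    def test_bucket_structure(self):
        # Assumed helpers: self._get_metrics() returns the Prometheus exposition
        # text, and parse_histogram_buckets() returns {le: cumulative_count}.
        buckets = parse_histogram_buckets(
            self._get_metrics(),
            "nv_inference_first_response_histogram_ms",
            "my_decoupled_model",
        )
        # Every Prometheus histogram ends with a +Inf bucket.
        self.assertIn("+Inf", buckets)
        # Cumulative bucket counts must be non-decreasing as the upper bound
        # grows, with the +Inf bucket as the largest value.
        finite = sorted((float(le), count) for le, count in buckets.items() if le != "+Inf")
        counts = [count for _, count in finite] + [buckets["+Inf"]]
        self.assertEqual(counts, sorted(counts))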

kthui
kthui previously approved these changes Oct 22, 2024
cp ../python_models/${decoupled_model_name}/model.py ${MODELDIR}/${decoupled_model_name}/1/
cp ../python_models/${decoupled_model_name}/config.pbtxt ${MODELDIR}/${decoupled_model_name}/

SERVER_ARGS="${BASE_SERVER_ARGS} --load-model=${decoupled_model_name}"
Collaborator

Nit for future follow-up (not this PR): a lot of this could probably be condensed with a for loop.

@rmccorm4 (Collaborator) left a comment

Nice work! Only a minor comment on metrics.md for this PR, but LGTM otherwise.

@yinggeh yinggeh merged commit ff1c674 into main Oct 23, 2024
3 checks passed
@yinggeh yinggeh deleted the DLIS-7383-yinggeh-metrics-standardization-TTFT branch October 23, 2024 10:57
@yinggeh yinggeh restored the DLIS-7383-yinggeh-metrics-standardization-TTFT branch October 23, 2024 10:57
Labels: PR: docs (Documentation only changes), PR: test (Adding missing tests or correcting existing test)
3 participants