-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add histogram support and TTFT histogram metric #396
Merged
yinggeh
merged 12 commits into
main
from
DLIS-7383-yinggeh-metrics-standardization-TTFT
Oct 23, 2024
Merged
Changes from all commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
b9231d8
Add histogram support and new TTFT metric
yinggeh 8c9fe0c
Minor update
yinggeh db990d3
Reuse and rename response_stats_index_
yinggeh 189cf64
Revert "Reuse and rename response_stats_index_"
yinggeh 51518d6
Update variable namings
yinggeh da632d1
Fix the logic that finds the first response.
yinggeh 7f0612c
Fix incorrect initialization of shared_ptr
yinggeh 72d99d7
Disable histograms by default
yinggeh b6b5af9
Minor fixes
yinggeh fb87d2a
Minor update
yinggeh a0ca111
Merge branch 'main' of https://github.com/triton-inference-server/cor…
yinggeh 525e238
Merge branch 'main' of https://github.com/triton-inference-server/cor…
yinggeh File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an initial merge with a default of histograms being disabled - I think this is fine to go ahead with if we need to cherry-pick. However, please take note of the following:
I think this is relatively on the hot path, possibly impacting latency, compared to our other inference metrics (
TRITONBACKEND_ReportStatistics
) which are generally reported after response sending in backends (impacting throughput but not response latency).You can find some perf numbers of each prometheus-cpp metric type at the bottom of the README here: https://github.com/jupp0r/prometheus-cpp
One individual observation for a single metric and a small number of buckets may not be impactful for one request, but as we scale up high concurrency, more metrics, more buckets, etc - this could present a noticeable latency impact.
It would be good to do some light validation of overall latency before/after the feature via
genai-perf
. Especially for high concurrency and streaming many responses/tokens - as there can be some synchronization in interaction with the prometheus registry with many concurrent responses as well.It would probably be advantageous to do the actual prometheus registry interaction after sending the response if possible, such as by only doing the bare minimum of determining if we should report metrics (check if first response and record latency), then using that information to report the metric after initiating response send.