Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add model_load_time metric #397

Merged
merged 17 commits into from
Oct 21, 2024
Merged

feat: Add model_load_time metric #397

merged 17 commits into from
Oct 21, 2024

Conversation

indrajit96
Copy link
Contributor

@indrajit96 indrajit96 commented Oct 14, 2024

What does the PR do?

Add new metric nv_load_time per model to metrics
Added load time gauge metric per model.
New metric added example

HELP nv_model_load_time Load Time per-model
TYPE nv_model_load_time gauge
nv_model_load_time{model="libtorch_float32_float32_float32",version="1"} 0.311423712

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

triton-inference-server/server#7697

Where should the reviewer start?

ReportModelLoadTime func use-age.

Test plan:

Added in server PR

src/metrics.cc Outdated Show resolved Hide resolved
src/metrics.cc Outdated Show resolved Hide resolved
@yinggeh
Copy link
Contributor

yinggeh commented Oct 14, 2024

Can you also add me to this PR?

src/model_repository_manager/model_lifecycle.cc Outdated Show resolved Hide resolved
Comment on lines 808 to 815
const uint64_t now_ns =
std::chrono::duration_cast<std::chrono::nanoseconds>(
std::chrono::steady_clock::now().time_since_epoch())
.count();
uint64_t time_to_load_ns = now_ns - loaded.second->load_start_ns_;
std::chrono::duration<double> time_to_load =
std::chrono::duration_cast<std::chrono::duration<double>>(
std::chrono::nanoseconds(time_to_load_ns));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you time the load time here and directly storing the load time in model info. You can still put the metric update logic here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason I added in the metric update in
ModelLifeCycle::OnLoadFinal() was because that is where we set the status to READY for the model
loaded.second->state_ = ModelReadyState::READY;
If I set the model load time here there is chance that ModelLifeCycle::OnLoadFinal() exits without loading the model and we might have a incorrect metric.

src/model_repository_manager/model_lifecycle.cc Outdated Show resolved Hide resolved
src/model_repository_manager/model_lifecycle.cc Outdated Show resolved Hide resolved
src/metrics.cc Outdated Show resolved Hide resolved
src/metrics.cc Outdated Show resolved Hide resolved
@yinggeh
Copy link
Contributor

yinggeh commented Oct 17, 2024

Can you build with metrics disabled flags and make sure it works?
-DTRITON_ENABLE_METRICS=OFF -DTRITON_ENABLE_METRICS_CPU=OFF -DTRITON_ENABLE_METRICS_GPU=OFF -DTRITON_ENABLE_STATS=OFF

yinggeh
yinggeh previously approved these changes Oct 18, 2024
Copy link
Contributor

@yinggeh yinggeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Make sure you addressed all the comments before merge.

kthui
kthui previously approved these changes Oct 18, 2024
Copy link
Contributor

@kthui kthui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor comment on model_info->load_start_ns_ lifecycle. LGTM overall!

src/model_repository_manager/model_lifecycle.cc Outdated Show resolved Hide resolved
@indrajit96 indrajit96 dismissed stale reviews from kthui and yinggeh via 121120d October 19, 2024 00:32
@indrajit96 indrajit96 merged commit 1962cce into main Oct 21, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants