[Feature] Metrics Endpoint #2638

eldhosemjoy · 2024-10-23T16:03:24Z

Motivation

Is there an endpoint within the API server from which we can pull metrics such as Running Requests,
Waiting Requests, Swapped Requests, GPU Cache Usage, CPU Cache Usage, Latency, Prompt Tokens, and Generation Tokens?

The goal is to pull the metrics for the LMDeploy-hosted inference engine into Prometheus.
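
As a rough illustration of the kind of response such an endpoint could return, here is a minimal sketch that renders a few of the values listed above in the Prometheus text exposition format. The function name `render_metrics`, the `lmdeploy_example` prefix, the metric names, and the sample values are all hypothetical and are not part of LMDeploy's API.

```python
def render_metrics(values, prefix="lmdeploy_example"):
    """Render a dict of gauge values as Prometheus text exposition lines.

    Both the prefix and the metric names are hypothetical placeholders.
    """
    lines = []
    for name, value in values.items():
        metric = f"{prefix}_{name}"
        # Each metric gets a TYPE line followed by one sample line.
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {float(value)}")
    return "\n".join(lines) + "\n"


# Made-up sample values, only to show the output shape.
sample = {"running_requests": 3, "waiting_requests": 1, "gpu_cache_usage": 0.42}
print(render_metrics(sample), end="")
```

A Prometheus server could scrape text in this shape from an HTTP path such as `/metrics`, which is the conventional default scrape path.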

Related resources

No response

Additional context

No response
