Metrics-exporter setup; How to go about it? #24

Open
suchisur opened this issue Mar 20, 2023 · 1 comment
Labels
question Further information is requested

Comments

@suchisur

I came across the metrics exporter, but I have not been able to set it up.
The errors are:

{"level":"info","ts":1679291005.7844253,"msg":"reading metrics file","metricsFile":""}
{"level":"error","ts":1679291005.7844558,"msg":"failed to read metrics file","error":"open : no such file or directory","stacktrace":"main.main\n\t/workspace/cmd/metricsexporter/metricsexporter.go:62\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}

Can someone please point me to how to set this up? We need per-pod GPU utilization metrics.

@suchisur suchisur changed the title from "Metrics-exporter" to "Metrics-exporter setup; How to go about it?" Mar 20, 2023
@Telemaco019
Member

Hi @suchisur, thanks for your interest in nos! The metrics exporter in nos does not provide GPU utilization metrics and is only used to optionally share basic telemetry data during nos installation as described in this documentation page.

For collecting GPU utilization metrics, I'd suggest using Prometheus with the NVIDIA DCGM Exporter. If you are already using the NVIDIA GPU Operator, you can easily set up the DCGM exporter as described here. Hope this helps!
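
Once Prometheus is scraping the DCGM exporter, per-pod utilization can be read back through the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch rather than a definitive setup. It assumes the exporter exposes the `DCGM_FI_DEV_GPU_UTIL` gauge with `namespace` and `pod` labels; the actual label names depend on the exporter version and on how Prometheus relabels scraped targets. The helper names `build_query_url` and `parse_per_pod_utilization` are hypothetical and exist only for illustration.

```python
import json
from urllib.parse import urlencode

# Assumption: DCGM exporter metric name and Kubernetes labels as commonly
# exposed; adjust to whatever your Prometheus instance actually shows.
GPU_UTIL_QUERY = 'avg by (namespace, pod) (DCGM_FI_DEV_GPU_UTIL{pod!=""})'


def build_query_url(base_url: str, promql: str = GPU_UTIL_QUERY) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_per_pod_utilization(payload: str) -> dict:
    """Map (namespace, pod) to GPU utilization from a query response body."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    utilization = {}
    for sample in body["data"]["result"]:
        labels = sample["metric"]
        # Instant-vector samples are [unix_timestamp, "value-as-string"].
        _, value = sample["value"]
        key = (labels.get("namespace", ""), labels.get("pod", ""))
        utilization[key] = float(value)
    return utilization
```

To use it, fetch the URL returned by `build_query_url` (for example with `urllib.request.urlopen`) against your own Prometheus address and pass the decoded response body to `parse_per_pod_utilization`.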

@Telemaco019 Telemaco019 added the question Further information is requested label Mar 27, 2023