Release gpud-v0.1.0 · leptonai/gpud

GPUd release notes (2024-10-27T09:38:10Z)

Welcome to this new release!

nits(server): debug level log for redundant register attempts by @gyuho in #126
fix(nvidia-smi/parse): do not parse remapped rows N/A by @gyuho in #128
feat(component/network): latency checks to global edge/DERP servers (using tailscale) by @gyuho in #125
fix(containerd): readable query failure error message (when CRI is not set up) by @gyuho in #129
fix(components): do not panic when there's no data collected yet by @gyuho in #130
feat(nvidia): exposing SM core and tensor core metrics in GPUd by @photoszzt in #132
fix(nvidia/query/metrics): remove duplicate metric register call by @gyuho in #133
feat(charts): add gpud run helm chart by @gyuho in #123
fix(infiniband): simplify ibstat existence when evaluating healthy by @gyuho in #124
feat(network/latency): track latency in metrics per region by @gyuho in #134
Update mothership endpoint by @cardyok in #82
fix(nvidia): use NVML + lspci to detect NVIDIA GPUs (without running nvidia-smi) by @gyuho in #127
fix(server): handle "components" URL query, return 404 not found on unknown component queries by @gyuho in #131
nits(nvidia/query): make detect logs debug level by @gyuho in #135
fix(status): fix divide by zero by @cardyok in #136
fix(nvidia/xid): do not error log when no xid happened yet by @gyuho in #138
fix(nvidia): persistence mode check based on NVML, do not rely on "nvidia-persistenced" binary by @gyuho in #137

Full Changelog: v0.0.5...v0.1.0