SLURM_allocated_gres_visualizer

The app for visualizing allocated GPUs by SLURM

If you use Slurm and want to check which GPUs are allocated, you have probably done one of the following:

  • SSH into each compute node and run nvidia-smi, one node after another.
  • Run scontrol show job -d | grep GRES and scan the output by eye.

Both are tedious. This project does it for you.
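To illustrate what reading the scontrol output by hand involves, here is a minimal sketch that extracts the node name and GPU indices from one detail line. The sample line is an assumption about what `scontrol show job -d` prints (the exact layout differs between Slurm versions), and the helper names `expand_indices` and `parse_gres_line` are hypothetical, not part of this project.

```python
import re

# Hypothetical sample of one detail line from `scontrol show job -d`;
# the exact layout varies between Slurm versions.
SAMPLE_LINE = "     Nodes=node01 CPU_IDs=0-7 Mem=0 GRES=gpu:3(IDX:0-1,3)"


def expand_indices(spec):
    """Expand an index spec such as '0-1,3' into [0, 1, 3]."""
    indices = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            indices.extend(range(int(start), int(end) + 1))
        else:
            indices.append(int(part))
    return indices


def parse_gres_line(line):
    """Return (node, gpu_indices), or None if the line has no GPU index info."""
    node = re.search(r"Nodes=(\S+)", line)
    idx = re.search(r"gpu[^()\s]*\(IDX:([\d,\-]+)\)", line)
    if not (node and idx):
        return None
    return node.group(1), expand_indices(idx.group(1))


print(parse_gres_line(SAMPLE_LINE))
```

Doing this for every job on every node is the bookkeeping the tool is meant to spare you.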

Requirements

Packages

  • matplotlib
  • sty
  • prometheus-client
  • requests
  • pandas
  • bs4

Slurm

  • Make sure slurmctld (master) and slurmd (nodes) are active, so that scontrol show nodes and scontrol show job run without errors.
  • Set AutoDetect=nvml on all compute nodes to avoid GPU index mismatches.
  • On every compute node, node-exporter must be reachable on port 9100 and dcgm-exporter on port 9400.
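The exporter requirement can be checked before running the tool. The sketch below only tests whether the two ports accept TCP connections; it does not verify that the services behind them are really the exporters. The helper names and the node names in the usage comment are hypothetical.

```python
import socket

# Ports taken from the requirement above; helper names are hypothetical.
EXPORTER_PORTS = {"node-exporter": 9100, "dcgm-exporter": 9400}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


def check_node(host, timeout=1.0):
    """Map each exporter name to whether its port is reachable on host."""
    return {
        name: port_is_open(host, port, timeout)
        for name, port in EXPORTER_PORTS.items()
    }


# Example usage (node names are placeholders for your own compute nodes):
# for node in ["node01", "node02"]:
#     print(node, check_node(node))
```

A node that reports False for either exporter probably needs that service started, or its firewall opened, before its GPUs can be displayed in full.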

Installation

git clone https://github.com/Haawron/SLURM_allocated_gres_visualizer.git
cd SLURM_allocated_gres_visualizer
/usr/bin/python3 setup.py install  # use the system Python, not a conda environment

Usage

slurm-gres-viz

# GPU options
slurm-gres-viz -i  # replace stars with GPU indices
slurm-gres-viz -gm -gu  # show VRAM usage and GPU utilization
slurm-gres-viz -f  # show full GPU information
slurm-gres-viz -m  # mine: show only my GPUs

# others
slurm-gres-viz -l 1  # refresh every 1 second (like nvidia-smi -l 1)
