Installation for the Slurm job scheduler on a CentOS 7 cluster.
- btools (the bpush and bexec commands used below)
- SYS_UID_MIN in /etc/login.defs is < 980
- The head node is identified as node01
- The compute nodes are identified as node02 and node03
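If you are unsure of the current SYS_UID_MIN value, it can be checked directly:
$ grep SYS_UID_MIN /etc/login.defs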
- On each node, run the installation script (this takes 20-30 minutes to complete)
$ ./install-slurm
- On the head node, create a pseudorandom secret key for MUNGE to use on all of the compute nodes
$ ./create-munge-key
- Copy the MUNGE secret key to all of the compute nodes
$ bpush /etc/munge/munge.key /etc/munge/munge.key
- Change the owner of /etc/munge/munge.key to the munge user on all of the nodes
$ chown munge: /etc/munge/munge.key
$ bexec chown munge: /etc/munge/munge.key
- Enable and start the MUNGE service on all of the nodes
$ systemctl enable munge
$ systemctl start munge
$ bexec systemctl enable munge
$ bexec systemctl start munge
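Before configuring Slurm, it can be worth confirming that a MUNGE credential created on the head node is accepted by a compute node (this check assumes SSH access from node01 to node02):
$ munge -n | unmunge
$ munge -n | ssh node02 unmunge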
- From any computer, complete the Slurm configuration file generator; edit the fields according to the values below (fields not addressed below should be left at their default value, or empty if there is no default value)
  - ControlMachine: node01
  - NodeNames: node[02-03]
  - CPUs, Sockets, CoresPerSocket, and ThreadsPerCore: values can be found by listing the CPU information on your machine with the lscpu command
  - StateSaveLocation: /var/spool/slurm
  - SlurmctldLogFile: /var/log/slurm/slurmctld.log
  - SlurmdLogFile: /var/log/slurm/slurmd.log
- Click submit at the bottom of the page to generate the configuration file
- Copy the configuration file to the head node and save the file to /etc/slurm/slurm.conf
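For reference, the relevant lines of the generated slurm.conf should look roughly like the following; the hardware values here are placeholders for a single-socket, four-core machine and must be replaced with your own lscpu values, and the generator also emits a PartitionName line and other defaults that are not shown.
ControlMachine=node01
StateSaveLocation=/var/spool/slurm
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
NodeName=node[02-03] CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN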
- Copy the configuration file to all of the compute nodes
$ bpush /etc/slurm/slurm.conf /etc/slurm/slurm.conf
- Move the cgroup configuration file to /etc/slurm/cgroup.conf (overwrite the existing file created with the install script)
$ mv files/cgroup.conf /etc/slurm/cgroup.conf
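The contents of files/cgroup.conf are not reproduced here; a minimal cgroup.conf for a cluster like this typically looks something like the lines below (an illustrative sketch only, not necessarily identical to the file shipped with the install files).
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes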
- Copy the cgroup configuration file to all of the compute nodes
$ bpush /etc/slurm/cgroup.conf /etc/slurm/cgroup.conf
- Disable and stop the firewalld service on all of the compute nodes
$ bexec systemctl disable firewalld
$ bexec systemctl stop firewalld
- On the head node, open TCP port 6817 (the default slurmctld port) for Slurm
$ firewall-cmd --permanent --zone=public --add-port=6817/tcp
$ firewall-cmd --reload
- On the head node, enable and start the slurmctld service
$ systemctl enable slurmctld
$ systemctl start slurmctld
- On all of the compute nodes, enable and start the slurmd service (run these on each compute node, or prefix them with bexec from the head node)
$ systemctl enable slurmd
$ systemctl start slurmd
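If either service fails to start, the systemd status output and the log files configured earlier are the first places to look; for example, on the head node:
$ systemctl status slurmctld
$ tail /var/log/slurm/slurmctld.log
and similarly for slurmd and /var/log/slurm/slurmd.log on the compute nodes.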
- Check the Slurm configuration on the head node
$ slurmd -C
- Check the Slurm configuration on all of the compute nodes
$ slurmd -C
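As with the earlier steps, the compute-node check can also be run for every node at once from the head node:
$ bexec slurmd -C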
- Confirm that all of the compute nodes are reporting to the head node
$ scontrol show nodes
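A shorter summary of node states is available with sinfo; the compute nodes should show an idle state once they are reporting correctly:
$ sinfo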
- Run an interactive job
$ srun --pty bash
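As a final check (a minimal sketch, with an arbitrary script name and job options), a small batch job can also be submitted to confirm that jobs run on the compute nodes:
$ cat > test-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=test-%j.out
hostname
EOF
$ sbatch test-job.sh
$ cat test-*.out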