Status: It seems to work, but a systemctl --user restart might be needed at startup time (see Troubleshooting).
Running Nextflow on top of slurm-container-cluster also seems to work. (See the Nextflow pipeline example where slurm-container-cluster runs a two-node Slurm cluster on a laptop and a desktop).
Run a Slurm cluster in containers as a non-root user on multiple hosts, by making use of
- podman for running containers. (Replacing podman with docker might also work, but it is untested.)
- norouter for communication
- sshocker for sharing a local folder to the remote computers (reverse sshfs)
- slurm-docker-cluster. The slurm-container-cluster project reuses the basic architecture of the slurm-docker-cluster project but introduces multi-host functionality with the help of norouter and sshocker. Another difference is that slurm-container-cluster uses Systemd instead of Docker Compose.
Each software component (slurmd, slurmdbd, slurmctld and mysql) runs in a separate container.
Multiple slurmd containers may be used. The slurmd containers act as "compute nodes" in Slurm, so it makes sense to have a number of them. If you have ssh access to remote computers, you may run the slurmd compute node containers there too. See also the section "Boot Fedora CoreOS in live mode from a USB stick" on how to boot up a computer in live mode to let it become a remote ssh-accessible computer.
- podman version >= 2.1.0
(Installing podman might require root permissions, otherwise no root permissions are needed)
Using remote computers is optional as everything can be run locally. If you want some remote computers to act as extra compute nodes they need to be accessible via ssh and need to have
- podman version >= 2.1.0
- sshfs
installed.
(Installing sshfs and podman might require root permissions, otherwise no root permissions are needed)
A tip: The Linux distribution Fedora CoreOS comes with both podman and sshfs pre-installed.
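The podman version requirement can be checked with a short script. This is a minimal sketch, not part of the project: the helper names `version_ge` and `check_podman` are made up for illustration, and the script assumes a `sort` that supports `-V` (as in GNU coreutils) and that `podman --version` prints the version number as its third word.

```shell
#!/bin/sh
# version_ge A B: succeed if version A >= version B.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_podman: report whether the installed podman is new enough.
check_podman() {
    required=2.1.0
    if ! command -v podman >/dev/null 2>&1; then
        echo "podman not found"
        return 1
    fi
    # `podman --version` prints e.g. "podman version 2.1.1"
    installed=$(podman --version | awk '{print $3}')
    if version_ge "$installed" "$required"; then
        echo "podman $installed is OK (>= $required)"
    else
        echo "podman $installed is too old (need >= $required)"
        return 1
    fi
}
```

Running `check_podman` on the local computer and on each remote computer should show quickly whether a host qualifies.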
Systemd service | Description |
---|---|
slurm-computenode@.service | Template unit file for Slurm compute nodes running slurmd in the container localhost/slurm |
slurm-create-datadir.service | Creates some empty directories under ~/.config/slurm-podman/ that will be used by the other services |
slurm-install-norouter.service | Installs the executable norouter to ~/.config/slurm-podman/install-norouter/norouter |
slurm-install-sshocker.service | Installs the executable sshocker to ~/.config/slurm-podman/install-sshocker/sshocker |
slurm-mysql.service | Runs mysqld in the container localhost/mysql-with-norouter |
slurm-slurmctld.service | Runs slurmctld in the container localhost/slurm-with-norouter |
slurm-slurmdbd.service | Runs slurmdbd in the container localhost/slurm-with-norouter |
- Clone this Git repo
$ git clone URL
- cd into the Git repo directory
$ cd slurm-container-cluster
- Build or pull the container images
Build the container images:
podman build -t slurm-container-cluster .
podman build -t mysql-with-norouter container/mysql-with-norouter/
podman image tag localhost/slurm-container-cluster localhost/slurm-with-norouter
or pull the container images:
podman pull docker.io/eriksjolund/slurm-container-cluster:podman-v2.1.1-slurm-slurm-20-11-2-1-norouter-v0.6.1
podman pull docker.io/eriksjolund/mysql-with-norouter:mysql-5.7-norouter-v0.6.1
podman image tag docker.io/eriksjolund/slurm-container-cluster:podman-v2.1.1-slurm-slurm-20-11-2-1-norouter-v0.6.1 localhost/slurm-container-cluster
podman image tag docker.io/eriksjolund/mysql-with-norouter:mysql-5.7-norouter-v0.6.1 localhost/mysql-with-norouter
podman image tag localhost/slurm-container-cluster localhost/slurm-with-norouter
(the identifiers localhost/slurm-with-norouter and localhost/mysql-with-norouter are used in the systemd service files)
- Create an empty directory
mkdir ~/installation_files
installation_files_dir=~/installation_files
(The variable is just used to simplify the instructions in this README.md)
bash prepare-installation-files.sh $installation_files_dir
Add extra container images to the installation files. These container images can be run by podman in your sbatch scripts.
podman pull docker.io/library/alpine:3.12.1
bash add-extra-containerimage.sh $installation_files_dir docker.io/library/alpine:3.12.1
Before running the scripts local-install.sh and remote-install.sh you might want to modify the configuration file $installation_files_dir/slurm/slurm.conf.
(The default $installation_files_dir/slurm/slurm.conf defines the cluster as having the compute nodes c1 and c2.)
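To see which compute nodes a configuration defines before installing, the node definitions can be listed. A minimal sketch, assuming the nodes are declared on lines that start with `NodeName=` (the usual slurm.conf syntax); the function name `list_node_lines` is hypothetical.

```shell
#!/bin/sh
# list_node_lines FILE: print the NodeName definitions from a slurm.conf.
# Commented-out lines (starting with '#') are not matched.
list_node_lines() {
    grep -E '^[[:space:]]*NodeName=' "$1"
}

# Example (assumes $installation_files_dir is set as earlier in this README):
#   list_node_lines "$installation_files_dir/slurm/slurm.conf"
```

If you add compute nodes to slurm.conf, the listing is a quick way to confirm the names before running the install scripts.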
If you want to run any of the slurm-related containers on the local computer, then
- In the git repo directory run
bash ./local-install.sh $installation_files_dir
The script local-install.sh should only modify files and directories under these directories
- ~/.config/slurm-container-cluster (e.g. mysql datadir, Slurm shared jobdir, log files, sshocker executable and norouter executable)
- ~/.local/share/containers/ (the default directory where Podman stores its images and containers)
- ~/.config/systemd/user (installing all the services slurm-*.service)
- For each remote computer, run
bash ./remote-install.sh $installation_files_dir remoteuser@remotehost
on the local computer. It is expected that SSH keys have been set up so that ssh remoteuser@remotehost succeeds without having to type any password.
bash ./remote-install.sh $installation_files_dir remoteuser@remotehost
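Whether key-based login already works can be tested before running remote-install.sh. A small sketch: the function name `can_ssh_without_password` is hypothetical, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
#!/bin/sh
# can_ssh_without_password USER@HOST: succeed if a non-interactive login works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example:
#   can_ssh_without_password remoteuser@remotehost && echo "key login works"
```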
On the computer that you would like to have mysqld, slurmdbd and slurmctld running (i.e. most probably the local computer), run
systemctl --user enable --now slurm-mysql.service slurm-slurmdbd.service slurm-slurmctld.service
(Advanced tip: If your local computer is not running Linux, you might be able to use one of the remote computers instead and only use the local computer for running sshocker and norouter. This is currently untested.)
The default $installation_files_dir/slurm/slurm.conf defines the cluster as having the compute nodes c1 and c2.
To start the compute node c1 on localhost, run
systemctl --user enable --now slurm-computenode@1.service
To start the compute node c2, run
systemctl --user enable --now slurm-computenode@2.service
They can both be running on the same computer but also on different computers. Run the command on the computer where you would like to have the Slurm computenode running.
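With more than two compute nodes, the unit names follow the same pattern. The sketch below only prints the commands for a list of node numbers instead of running them. `print_enable_commands` is a hypothetical helper, and it assumes that node cN corresponds to the instance slurm-computenode@N.service, as in the two examples above.

```shell
#!/bin/sh
# print_enable_commands N...: print one systemctl command per node number.
# Nothing is executed; run each printed line on the computer that should
# host the respective compute node.
print_enable_commands() {
    for n in "$@"; do
        echo "systemctl --user enable --now slurm-computenode@${n}.service"
    done
}

print_enable_commands 1 2
```

Any additional node must also be defined in slurm.conf.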
In case you have
- mysqld, slurmdbd, slurmctld and c1 running on localhost
- and c2 running on a remote computer accessible with remoteuser@192.0.2.10
you can use the provided norouter.yaml as it is:
- copy it to your home directory
cp ./norouter.yaml ~
- start norouter with
norouter ~/norouter.yaml
Otherwise you need to modify the file ~/norouter.yaml to match your setup.
Start sshocker to share ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared with remote computers
sshocker is used for having a local directory accessible on the remote computers.
Assume the remote computer has the IP address 192.0.2.10 and the user is remoteuser. (Using a hostname instead of an IP address is also possible.) To make it easier to copy-paste from this documentation, let us set two shell variables
user=remoteuser
host=192.0.2.10
Share the local directory ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared
~/.config/install-sshocker/sshocker -v ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared:/home/$user/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared $user@$host
(The command does not return; it keeps running in the foreground.)
Now both the local ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared and the remote ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared should contain the same files.
If you have other remote computers, you need to run sshocker commands for them as well.
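For several remote computers, the volume argument can be built once per target. This is a sketch under stated assumptions: the helper `sshocker_volume_arg` is hypothetical, the remote home directory is assumed to be /home/USER as in the command above, and the second address 192.0.2.11 is only a placeholder. The loop prints the commands rather than running them, and `sshocker` stands for the path of the installed sshocker executable.

```shell
#!/bin/sh
# Directory shared by sshocker, relative to the home directory.
shared_rel=.config/slurm-container-cluster/slurm_jobdir/sshocker_shared

# sshocker_volume_arg USER: print the LOCAL:REMOTE value for sshocker's -v.
# Assumes the remote home directory is /home/USER.
sshocker_volume_arg() {
    echo "$HOME/$shared_rel:/home/$1/$shared_rel"
}

# Print one sshocker command per target (user@host).
# 192.0.2.11 is a placeholder for a second remote computer.
for target in remoteuser@192.0.2.10 remoteuser@192.0.2.11; do
    user=${target%@*}
    echo "sshocker -v $(sshocker_volume_arg "$user") $target"
done
```

Since each sshocker command keeps running, each printed command would need its own terminal (or to be started in the background).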
Register the cluster
podman exec -it slurmctld bash -c "sacctmgr --immediate add cluster name=linux"
Show cluster status
podman exec -it slurmctld bash -c "sinfo"
Create a shell script in the directory ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared
vim ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared/test.sh
with this content
#!/bin/sh
echo -n "hostname : "
hostname
sleep 10
and make it executable
chmod 755 ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared/test.sh
Submit a compute job
podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Example session:
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && ls -l test.sh"
-rwxr-xr-x 1 root root 53 Nov 14 13:42 test.sh
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && cat test.sh"
#!/bin/sh
echo -n "hostname : "
hostname
sleep 10
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 24
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 25
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 26
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 27
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 28
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 29
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 30
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./test.sh"
Submitted batch job 31
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && squeue"
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
26 normal test.sh root PD 0:00 1 (Resources)
27 normal test.sh root PD 0:00 1 (Priority)
28 normal test.sh root PD 0:00 1 (Priority)
29 normal test.sh root PD 0:00 1 (Priority)
30 normal test.sh root PD 0:00 1 (Priority)
31 normal test.sh root PD 0:00 1 (Priority)
24 normal test.sh root R 0:08 1 c1
25 normal test.sh root R 0:08 1 c2
user@laptop:~$
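Instead of checking squeue by hand, a script can poll until the queue is empty. A minimal sketch: `wait_for_empty_queue` is a hypothetical helper, and it relies on `squeue -h` printing no header line. If the podman command itself fails (for instance because the slurmctld container is not running), the output is also empty and the loop ends.

```shell
#!/bin/sh
# wait_for_empty_queue [INTERVAL]: poll squeue until no jobs are listed.
# `squeue -h` omits the header line, so empty output means an empty queue.
# The -it flags are left out because no terminal is needed in a script.
wait_for_empty_queue() {
    interval=${1:-5}
    while [ -n "$(podman exec slurmctld squeue -h)" ]; do
        sleep "$interval"
    done
}

# Example:
#   wait_for_empty_queue 10 && echo "all jobs have finished"
```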
When the jobs have finished, run
user@laptop:~$ ls -l ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared/slurm-*.out
slurm-24.out
slurm-25.out
slurm-26.out
slurm-27.out
slurm-28.out
slurm-29.out
slurm-30.out
slurm-31.out
user@laptop:~$ cat ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared/slurm-*.out
hostname : c1
hostname : c2
hostname : c1
hostname : c2
hostname : c1
hostname : c1
hostname : c2
hostname : c1
user@laptop:~$
Here is an example of how to run a container with podman. (The container docker.io/library/alpine:3.12.1 was previously added to the installation files with the script add-extra-containerimage.sh.)
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && cat podman-example.sh"
#!/bin/sh
podman run --user 0 --cgroups disabled --runtime crun --volume /data:/data:rw --events-backend=file --rm docker.io/library/alpine:3.12.1 cat /etc/os-release
user@laptop:~$ podman exec -it slurmctld bash -c "cd /data/sshocker_shared && sbatch ./podman-example.sh"
Submitted batch job 32
When the job has finished, run
user@laptop:~$ cat ~/.config/slurm-container-cluster/slurm_jobdir/sshocker_shared/slurm-32.out
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.12.1
PRETTY_NAME="Alpine Linux v3.12"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"
Interesting logs can be seen by running
podman logs c1
podman logs slurmdbd
podman logs slurmctld
podman logs mysql
(The container must still be running in order for the podman logs command to succeed.)
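Because the logs are only available while the containers exist, it can be useful to save them to files first. A small sketch, with the hypothetical helper name `save_logs`; it uses only the `podman logs` command shown above.

```shell
#!/bin/sh
# save_logs DIR NAME...: write the output of `podman logs NAME` to DIR/NAME.log.
# If podman logs fails for a container, a message is printed to stderr
# and no log file is kept for it.
save_logs() {
    dir=$1
    shift
    mkdir -p "$dir"
    for name in "$@"; do
        if podman logs "$name" > "$dir/$name.log" 2>&1; then
            echo "saved $dir/$name.log"
        else
            echo "could not read logs for $name" >&2
            rm -f "$dir/$name.log"
        fi
    done
}

# Example:
#   save_logs ~/slurm-logs mysql slurmdbd slurmctld c1
```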
At startup time there might be a few warnings for just a short while:
me@laptop:~$ ~/.config/slurm-container-cluster/install-norouter/norouter ~/norouter.yaml
laptop: INFO[0000] Ready: 127.0.29.100
laptop: INFO[0000] Ready: 127.0.29.3
laptop: INFO[0000] Ready: 127.0.30.1
laptop: INFO[0000] Ready: 127.0.29.2
laptop: INFO[0000] Ready: 127.0.30.2
laptop: WARN[0002] stderr[slurmctld-norouter(127.0.29.3)]: slurmctld: time="2020-12-05T09:48:29Z" level=error msg="failed to dial to \"127.0.0.1:7817\" (\"tcp\")" error="dial tcp 127.0.0.1:7817: connect: connection refused"
laptop: WARN[0002] stderr[slurmctld-norouter(127.0.29.3)]: slurmctld: time="2020-12-05T09:48:29Z" level=error msg="failed to dial to \"127.0.0.1:7817\" (\"tcp\")" error="dial tcp 127.0.0.1:7817: connect: connection refused"
laptop: WARN[0003] stderr[slurmctld-norouter(127.0.29.3)]: slurmctld: time="2020-12-05T09:48:30Z" level=error msg="failed to dial to \"127.0.0.1:7817\" (\"tcp\")" error="dial tcp 127.0.0.1:7817: connect: connection refused"
slurm-container-cluster seems to work though, so they can probably be ignored.
But the warning laptop: WARN[0004] error while handling L3 packet error="write |1: broken pipe" seems to be more severe.
laptop: WARN[0003] stderr[slurmdbd(127.0.29.2)]: d6ade94bd628: time="2020-12-05T08:50:33Z" level=error msg="failed to dial to \"127.0.0.1:7819\" (\"tcp\")" error="dial tcp 127.0.0.1:7819: connect: connection refused"
laptop: WARN[0003] stderr[slurmdbd(127.0.29.2)]: d6ade94bd628: time="2020-12-05T08:50:33Z" level=error msg="failed to dial to \"127.0.0.1:7819\" (\"tcp\")" error="dial tcp 127.0.0.1:7819: connect: connection refused"
laptop: WARN[0004] error while handling L3 packet error="write |1: broken pipe"
laptop: WARN[0004] error while handling L3 packet error="write |1: broken pipe"
laptop: WARN[0004] error while handling L3 packet error="write |1: broken pipe"
For those warnings, it seems that a restart of all the slurm-* services is needed.
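The more severe warning can be detected automatically if the norouter output is saved to a file. A minimal sketch: `needs_restart` is a hypothetical helper that only searches for the message text quoted above.

```shell
#!/bin/sh
# needs_restart: read norouter output on stdin and succeed if it contains
# the "error while handling L3 packet" warning.
needs_restart() {
    grep -q 'error while handling L3 packet'
}

# Example, assuming the output was saved with
#   norouter ~/norouter.yaml 2>&1 | tee norouter.log
# then:
#   needs_restart < norouter.log && echo "restart the slurm-* services"
```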
If you experience problems, try this
- Stop norouter (by pressing Ctrl-c)
- Restart all services
systemctl --user restart slurm-mysql slurm-slurmdbd slurm-slurmctld slurm-create-datadir
systemctl --user restart slurm-computenode@1.service
systemctl --user restart slurm-computenode@2.service
(Note: the restart command should be run on the computer where the service was once enabled).
- Run podman logs
For the different containers
- mysql
- slurmdbd
- slurmctld
- c1
- c2
run podman logs containername, for instance
$ podman logs c1
---> Starting the MUNGE Authentication service (munged) ...
-- Waiting for norouter to start. Sleeping 2 seconds ...
-- Waiting for norouter to start. Sleeping 2 seconds ...
-- Waiting for norouter to start. Sleeping 2 seconds ...
-- Waiting for norouter to start. Sleeping 2 seconds ...
Except for mysql, the containers should all be waiting for norouter to start.
- Start norouter
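The check in the "Run podman logs" step can be scripted. This is a sketch only: `waiting_for_norouter` is a hypothetical helper that tests whether the last log line of a container is the "Waiting for norouter to start" message shown in the example output.

```shell
#!/bin/sh
# waiting_for_norouter NAME: succeed if the last log line of container NAME
# is the "Waiting for norouter to start" message.
waiting_for_norouter() {
    podman logs "$1" 2>&1 | tail -n 1 | grep -q 'Waiting for norouter to start'
}

# Example: check the local containers (all except mysql) before starting norouter.
#   for c in slurmdbd slurmctld c1; do
#       waiting_for_norouter "$c" || echo "$c is not waiting yet"
#   done
```

A compute node that runs on a remote computer has to be checked on that computer, since podman logs only sees local containers.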
The Linux distribution Fedora CoreOS comes with both podman and sshfs pre-installed. If you have some extra computers that are not in use, you could boot them up with a Fedora CoreOS USB stick to get extra Slurm compute nodes.
Assuming that
- your public ssh key is located in the file ~/.ssh/id_rsa.pub
- the command podman is installed
- the architecture for the iso is x86_64
- your preferred choice of username is myuser
then run this command
bash create-fcos-iso-with-ssh-key.sh podman x86_64 stable ~/.ssh/id_rsa.pub myuser
to create the customized iso file. The path is written to stdout. The bash script and more documentation are available here
https://github.com/eriksjolund/create-fcos-iso-with-ssh-key
If you would like to have sudo permissions, you need to choose the username core.