Using Docker under Slurm

From UCSC Genomics Institute Computing Infrastructure Information

Sometimes it is convenient to ask Slurm to run your job in a Docker container. This is fine, but you will need to fully test your job in a Docker container beforehand (on mustard or emerald, for example) to see how much RAM and CPU it requires, so you can accurately describe in your Slurm job submission file how many resources it needs.

Testing

You can run your container on mustard and then watch 'top' to see how much RAM and CPU it uses.
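If you want per-container numbers rather than picking your process out of 'top', Docker's built-in stats command reports CPU and memory usage for a running container. A minimal sketch; the container name "mytest" and the image are just examples:

```shell
# Start the container in the background so we can watch it
docker run --rm -d --name mytest docker/welcome-to-docker

# Snapshot CPU and memory usage for that container (no streaming)
docker stats --no-stream mytest
```

Note that docker stats shows usage at that moment, not a peak, so sample it a few times during the heaviest phase of your job before settling on numbers for your submission file.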

You also need to be aware that you will have to pull your Docker image from a registry, such as Docker Hub or Quay. You should also run your Docker container with the '--rm' flag, so the container cleans itself up after running. So your workflow would look something like this:

1: docker pull docker/welcome-to-docker
2: docker run --rm docker/welcome-to-docker
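Put into a Slurm submission file, the two steps above might look like the sketch below. The job name, resource numbers, and image are all examples; substitute the values you measured during testing.

```shell
#!/bin/bash
#SBATCH --job-name=docker-test    # example job name
#SBATCH --cpus-per-task=4         # CPUs you measured during testing
#SBATCH --mem=2048                # RAM in MB, measured during testing

# Step 1: pull the image from a registry (Docker Hub in this example)
docker pull docker/welcome-to-docker

# Step 2: run with --rm so the container cleans itself up afterwards
docker run --rm docker/welcome-to-docker
```

You would then submit this with 'sbatch' as usual.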

Optionally, you can clean up your image as well, but only if no other jobs on the same node are using that image. For example, if I wanted to remove the image labelled "weiler/mytools":

$ docker image ls
REPOSITORY                          TAG                    IMAGE ID       CREATED         SIZE
weiler/mytools                     latest                 be6777ad00cf   19 hours ago    396MB
somedude/tools                     latest                 9b1d1f6fbf6f   3 weeks ago     607MB

$ docker image rm be6777ad00cf
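If you would rather not look up the image ID first, you can remove the image by its repository name and tag instead:

```shell
# Equivalent to removing by image ID
docker image rm weiler/mytools:latest
```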

Resource Limits

When running Docker containers on Slurm, Slurm cannot limit the resources that Docker uses. Therefore, when you launch a container, you need to know beforehand how much RAM and CPU it uses, determined by your testing. Then launch your job with the following --cpus and --memory parameters so Docker itself will limit what it uses:

docker run --rm --cpus=16 --memory=1024m docker/welcome-to-docker

The 'm' suffix on the --memory argument means megabytes (Docker also accepts 'k' and 'g'). So the above example will set a memory limit of 1024 MB, i.e. 1 GB.
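Since Slurm cannot enforce these limits itself, it is worth requesting the same amounts from Slurm that you pass to Docker, so the scheduler's accounting matches what the container can actually use. A sketch of a submission file, assuming the 16-CPU / 1 GB example above:

```shell
#!/bin/bash
#SBATCH --cpus-per-task=16   # match docker --cpus
#SBATCH --mem=1024           # in MB; match docker --memory=1024m

docker run --rm --cpus=16 --memory=1024m docker/welcome-to-docker
```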

Cleaning Scripts

We also have auto-cleaning scripts running that will delete any containers and images that were created/pulled more than 7 days ago. This includes the cluster nodes and also the phoenix head node itself. If you need a place to have your images/containers remain longer than that, please put them on mustard, emerald, crimson or razzmatazz.

There are also cleaning scripts in place that will destroy any containers that have been running for over 7 days. We assume that such a container was not launched with --rm and needs to be cleaned up.