Slurm Queues (Partitions) and Resource Management

From UCSC Genomics Institute Computing Infrastructure Information

Revision as of 22:55, 14 November 2023 by Weiler (talk | contribs)

Due to heterogeneous workloads and different batch requirements, we have implemented partitions in slurm, which are similar to queues.

Each partition has different default and maximum walltime limits (aka "runtime" limits). You will need to select a partition to launch your jobs in based on what kind of jobs they are and how long they are expected to run.

Partition Name Default Walltime Limit Maximum Walltime Limit Default Partition? Job Priority Maximum Nodes Utilized
short 10 minutes 1 hour Yes Normal All
medium 1 hour 12 hours No Normal 15
long 12 hours 7 days No Normal 10
high_priority 10 minutes 7 days No High All
gpu 10 minutes 7 days No Normal 6

If you do not specify a partition to run your job in, it will automatically be assigned the "short" partition by deafult. If you do not specify a walltime value in your job submission script, it will inherit the "Default Walltime Limit" of the partition it is assigned. Therefore, it is a very good idea to specify which partition your job will go in, and you should also specify a walltime limit, otherwise your jobs will inherit the default walltime limit in the chart above.

This all means that it is very important to TEST your jobs before running many of them! Submit one jobs and note how much resources it takes (RAM, CPU) and how long it takes to run. Then when you submit many of those jobs, you can correctly specify the number of CPU cores your job needs, how much RAM it needs (pad it by about 20% just in case), and how much time it needs to run (pad it by 40% to account for environmental variables like disk IO load and CPU context switching load).

You can test your jobs by running one job via srun with fairly high CPU, RAM and walltime limits (just so it isn't killed due to default limits), then noting how much in resources it consumed while running (after it finishes).

Example

seff 769059

Output

Job ID: 769059
Cluster: phoenix
User/Group: <user-name>/<group-name>
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 16
CPU Utilized: 00:00:01
CPU Efficiency: 0.11% of 00:15:28 core-walltime
Job Wall-clock time: 00:00:58
Memory Utilized: 4.79 MB
Memory Efficiency: 4.79% of 100.00 MB

So if I needed to run like 1000 of these jobs, and they were all similar, I would select the "short" partition, 1 CPU core, maybe specify 8MB RAM, and maybe 90 seconds walltime limit. Note how I padded the RAM and walltime a bit to account for unexpected variable cluster conditions.