Overview of using Slurm
When using Slurm, you will need to log into the Slurm head node (currently phoenix-01.gi.ucsc.edu, a one-node cluster at the moment). Once you have ssh'd in there, you can execute Slurm batch or interactive commands.
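For example, a minimal sketch of requesting an interactive shell on a compute node with srun (the partition name is an assumption here, matching the "batch" partition used in the example below):

% srun --partition=batch --pty bash -i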
Submit a Slurm Batch Job
In order to submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir /public/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /public/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary. It will look something like this:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Account:
#SBATCH --account=weiler
#
# Partition - This is the queue it goes in:
#SBATCH --partition=batch
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the job. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each GPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs; this can be in the format of "gpu:[1-4]", or "gpu:K80:[1-4]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
module load python
echo "Running test script on a single CPU core"
python /public/groups/clusteradmin/weiler/slurm-jobs/experiment-1/mytest.py
date
Keep the "SBATCH" lines commented, the scheduler will read them anyway. If you don't need a particular option, just don't include it in the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
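If you want more detail than the queue listing below provides, scontrol can show the full record for a submitted job (using the job ID reported by sbatch; 7 in this example):

% scontrol show job 7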
The job(s) will then be scheduled. You can see the state of the queue like so:
% squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 7     batch weiler_t   weiler  R       0:07      1 phoenix-01
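To narrow the listing to your own jobs, or to cancel a job you no longer want, squeue and scancel take the username and job ID from the example above (a sketch):

% squeue -u weiler
% scancel 7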
The job will write any STDOUT or STDERR to the log file named by the --output option, in the directory you launched the job from. Other than that, it will do whatever the job's commands do, even if they produce no STDOUT.
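As a sketch, for the example job above (job ID 7 with the serial_test_%j.log output pattern), the log could be viewed from the submission directory with:

% cat serial_test_7.log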