Overview of using Slurm
When using Slurm, you will need to log into the Slurm head node (currently phoenix-01.gi.ucsc.edu, a one-node cluster at the moment). Once you have ssh'd in there, you can execute Slurm batch or interactive commands.
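For example, a minimal sketch of requesting an interactive shell on a compute node with srun (the partition name is an assumption here, matching the "batch" partition used in the example below):

% srun --partition=batch --pty bash -i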
Submit a Slurm Batch Job
In order to submit a Slurm batch job, you will need to create a directory that you have read and write access to on all the nodes (which will often be a shared space). Let's say I have a batch named "experiment-1". I would create that directory in my group's area:
% mkdir /public/groups/clusteradmin/weiler/slurm-jobs/experiment-1
% cd /public/groups/clusteradmin/weiler/slurm-jobs/experiment-1
Then you will need to create your job submission batch file. My file is called 'slurm-test.sh':
% vim slurm-test.sh
Then populate the file as necessary. It will look something like this:
#!/bin/bash
# Job name:
#SBATCH --job-name=weiler_test
#
# Account:
#SBATCH --account=weiler
#
# Partition - This is the queue it goes in:
#SBATCH --partition=batch
#
# Where to send email (optional)
#SBATCH --mail-user=weiler@ucsc.edu
#
# Number of nodes you need per job:
#SBATCH --nodes=1
#
# Memory needed for the job. Try very hard to make this accurate. DEFAULT = 4gb
#SBATCH --mem=4gb
#
# Number of tasks (one for each GPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task:
# At least eight times the number of GPUs needed for nVidia RTX A5500
#SBATCH --cpus-per-task=1
#
# Number of GPUs; this can be in the format of "gpu:[1-4]", or "gpu:K80:[1-4]" with the type included (optional)
#SBATCH --gres=gpu:1
#
# Standard output and error log
#SBATCH --output=serial_test_%j.log
#
# Wall clock limit in hrs:min:sec:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
pwd; hostname; date
module load python
echo "Running test script on a single CPU core"
python /public/groups/clusteradmin/weiler/slurm-jobs/experiment-1/mytest.py
date
Keep the "SBATCH" lines commented, the scheduler will read them anyway. If you don't need a particular option, just don't include it in the file.
To submit the batch job:
% sbatch slurm-test.sh
Submitted batch job 7
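If you want more detail than the queue listing below provides, scontrol can show the full record for a submitted job (using the job ID reported by sbatch; 7 in this example):

% scontrol show job 7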
The job(s) will then be scheduled. You can see the state of the queue like so:
% squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 7     batch weiler_t   weiler  R       0:07      1 phoenix-01
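To narrow the listing to your own jobs, or to cancel a job you no longer want, squeue and scancel take the username and job ID from the example above (a sketch):

% squeue -u weiler
% scancel 7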
The job will write any STDOUT or STDERR to the log file named by the --output option, in the directory you launched the job from. Other than that, it will do whatever the job's commands do, even if they produce no STDOUT.
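As a sketch, for the example job above (job ID 7 with the serial_test_%j.log output pattern), the log could be viewed from the submission directory with:

% cat serial_test_7.log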