Slurm Tips for Toil
From UCSC Genomics Institute Computing Infrastructure Information
Here are some tips for running Toil workflows on the Phoenix Slurm cluster. They are aimed mostly at WDL workflows, but some of them also apply to other Toil workflows such as Cactus. You can also consult the Toil documentation on WDL workflows.
- Install Toil with WDL support with:
pipx install 'toil[wdl]'
To use a development version of Toil, you can install from source instead:
pipx install 'toil[wdl]@git+https://github.com/DataBiosphere/toil.git'
Or for a particular branch:
pipx install 'toil[wdl]@git+https://github.com/DataBiosphere/toil.git@issues/123-abc'
If you don't have pipx, you would first need to:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
You may then need to log out and back in for the updated PATH to take effect.
- For Toil options, you will want --batchSystem slurm to make it use Slurm, and --batchLogsDir ./logs (or some other location on a shared filesystem) so that the Slurm logs do not get lost.
- You may be able to speed up your workflow with --caching true, to cache data on nodes to be shared among multiple simultaneous tasks.
- If using toil-wdl-runner, you might want to add --jobStore ./jobStore to make sure the job store is in a defined, shared location so that you can use --restart later.
- If using toil-wdl-runner, you will want to set the SINGULARITY_CACHEDIR and MINIWDL__SINGULARITY__IMAGE_CACHE environment variables for your workflow to locations on shared storage, for example the default cache locations in your home directory. Otherwise Toil will set them to temporary, per-workflow, node-local directories, and will therefore re-download images for each workflow run and on each cluster node. To avoid this, you could, for example, run the following before your run or add it to your ~/.bashrc:
export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
export MINIWDL__SINGULARITY__IMAGE_CACHE=$HOME/.cache/miniwdl
If you run a lot of workflows, or a workflow with a lot of containers, you will run out of space in your home directory. In that case, you can try using semi-persistent per-node storage for your image caches instead:
export SINGULARITY_CACHEDIR="/data/tmp/$(whoami)/cache/singularity"
export MINIWDL__SINGULARITY__IMAGE_CACHE="/data/tmp/$(whoami)/cache/miniwdl"
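Putting the options above together, a run might look roughly like the following sketch. The file names workflow.wdl and inputs.json are hypothetical placeholders, and passing the inputs file as a second positional argument is an assumption about how you invoke toil-wdl-runner; check toil-wdl-runner --help for your installed version. The script only prints the command unless RUN=1 is set, so you can inspect it first.

```shell
#!/usr/bin/env bash
# Sketch: assemble a toil-wdl-runner command from the tips above.
set -eo pipefail

# Hypothetical placeholders; substitute your own workflow and inputs.
WORKFLOW="${WORKFLOW:-workflow.wdl}"
INPUTS="${INPUTS:-inputs.json}"

# Keep container image caches somewhere persistent so images are reused.
# Values already set in the environment (e.g. from ~/.bashrc) are kept.
export SINGULARITY_CACHEDIR="${SINGULARITY_CACHEDIR:-$HOME/.singularity/cache}"
export MINIWDL__SINGULARITY__IMAGE_CACHE="${MINIWDL__SINGULARITY__IMAGE_CACHE:-$HOME/.cache/miniwdl}"

# Options taken from the tips above; logs and job store go in the
# current directory, which is assumed to be on a shared filesystem.
toil_cmd=(
    toil-wdl-runner
    --batchSystem slurm
    --batchLogsDir ./logs
    --caching true
    --jobStore ./jobStore
    "$WORKFLOW"
    "$INPUTS"
)

# Always show the command; only execute it when RUN=1.
printf '%s ' "${toil_cmd[@]}"; echo
if [ "${RUN:-0}" = "1" ]; then
    mkdir -p ./logs "$SINGULARITY_CACHEDIR" "$MINIWDL__SINGULARITY__IMAGE_CACHE"
    "${toil_cmd[@]}"
fi
```

If the run is interrupted, running the same command again with --restart added and the same --jobStore should let Toil pick up from the existing job store.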