Slurm Tips for Toil

Here are some tips for running Toil workflows on the Phoenix Slurm cluster. You will mostly want to run WDL workflows, but some of these tips also apply to other Toil workflows, such as Cactus. You can also consult the Toil documentation on WDL workflows.

Install Toil with WDL support with:

pip3 install --upgrade 'toil[wdl]'

To use a development version of Toil, you can install from source instead:

pip3 install 'git+https://github.com/DataBiosphere/toil.git#egg=toil[wdl]'

Or for a particular branch:

pip3 install 'git+https://github.com/DataBiosphere/toil.git@issues/123-abc#egg=toil[wdl]'

You will then need to make sure your ~/.local/bin directory is on your PATH. Open up your ~/.bashrc file and add:

export PATH=$PATH:$HOME/.local/bin

Then make sure to log out and back in again.
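
If you would rather not log out, or if your ~/.bashrc gets sourced more than once, a slightly more defensive variant of the line above may help. This is only a sketch: it appends ~/.local/bin to PATH when it is not already present, then checks whether the toil-wdl-runner command can be found. It assumes pip placed the Toil scripts in ~/.local/bin, which is the usual location for per-user installs but may differ on your setup.

```shell
# Append ~/.local/bin to PATH only if it is not already listed,
# so repeated sourcing of ~/.bashrc does not keep growing PATH.
case ":$PATH:" in
    *":$HOME/.local/bin:"*) ;;                  # already present, nothing to do
    *) export PATH="$PATH:$HOME/.local/bin" ;;  # same entry as the line above
esac

# Check that the Toil WDL runner is now reachable.
# Prints its location if found, or a warning if not.
command -v toil-wdl-runner || echo "toil-wdl-runner not found on PATH" >&2
```

You can paste this into your current shell to pick up the change without logging out.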

For Toil options, you will want --batchSystem slurm to make Toil submit its jobs through Slurm, and --batchLogsDir ./logs (or some other location on a shared filesystem) so that the Slurm logs do not get lost.

You may be able to speed up your workflow with --caching true, which caches data on each node so that it can be shared among multiple simultaneous tasks.

If using toil-wdl-runner, you might want to add --jobStore ./jobStore to make sure the job store is in a defined, shared location so that you can use --restart later.
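
Putting the options above together, a toil-wdl-runner invocation might look roughly like the sketch below. The file names workflow.wdl and inputs.json are placeholders, not real files, and the shell variable names are only for illustration. The script creates the log directory up front (which should be harmless if it already exists) and prints the command as a dry run rather than launching it.

```shell
# Placeholder names: substitute your own workflow and inputs file.
WORKFLOW=workflow.wdl
INPUTS=inputs.json

# Both of these should live on a filesystem shared with the compute nodes.
LOG_DIR=./logs
JOB_STORE=./jobStore

# Make sure the Slurm log directory exists before submitting anything.
mkdir -p "$LOG_DIR"

# Dry run: print the full command for review.
# Remove the leading "echo" to actually start the workflow.
echo toil-wdl-runner "$WORKFLOW" "$INPUTS" \
    --batchSystem slurm \
    --batchLogsDir "$LOG_DIR" \
    --jobStore "$JOB_STORE" \
    --caching true
```

If the run is interrupted, repeating the same command with --restart added should pick the workflow up from the existing job store.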

If using toil-wdl-runner, you will want to set the SINGULARITY_CACHEDIR and MINIWDL__SINGULARITY__IMAGE_CACHE environment variables to locations on shared storage, possibly the default cache locations in your home directory. Otherwise Toil will set them to node-local directories on each node, and images will be re-downloaded for each workflow run and on each cluster node. To avoid this, you could, for example, run the following before your workflow, or add it to your ~/.bashrc:

export SINGULARITY_CACHEDIR=$HOME/.singularity/cache
export MINIWDL__SINGULARITY__IMAGE_CACHE=$HOME/.cache/miniwdl
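
As a small extension of the two lines above, the sketch below also creates the cache directories and prints the resulting settings, so you can confirm them before submitting a workflow. Whether the directories need to exist in advance is an assumption here; creating them ahead of time should do no harm either way. It also assumes your home directory is on storage visible to the compute nodes.

```shell
# Same cache locations as above, quoted in case $HOME contains spaces.
export SINGULARITY_CACHEDIR="$HOME/.singularity/cache"
export MINIWDL__SINGULARITY__IMAGE_CACHE="$HOME/.cache/miniwdl"

# Create both directories if they are missing (no error if they exist).
mkdir -p "$SINGULARITY_CACHEDIR" "$MINIWDL__SINGULARITY__IMAGE_CACHE"

# Show what workflows launched from this shell will inherit.
echo "SINGULARITY_CACHEDIR=$SINGULARITY_CACHEDIR"
echo "MINIWDL__SINGULARITY__IMAGE_CACHE=$MINIWDL__SINGULARITY__IMAGE_CACHE"
```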