Using the GPU nodes on Snowy

Snowy has a moderate number of nodes, each fitted with a single Nvidia T4 GPU. This page explains how to use them.

Who has access?

Anyone who can run jobs on Snowy has access. Members of projects with core-hour allocations on Rackham can run jobs on Snowy in the bonus (lower-priority) queue. Members of projects with core-hour allocations on Snowy run at ordinary priority. Members of projects that invested in the GPU resource (e.g. uppmax2020-2-2) receive a higher priority in the queue.

How do we access them?

There are two ways to ask Slurm for GPU resources: the older "gres" syntax, for example "sbatch --gres=gpu:1", and the newer --gpu* options, for example "sbatch --gpus=1". Both work.

Either option makes the job require a GPU node and lets it use the GPU when it runs.
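For example, both of these submissions ask for one GPU; "myjob.sh" is just a placeholder for your own jobscript:
sbatch -M snowy --gres=gpu:1 myjob.sh
sbatch -M snowy --gpus=1 myjob.sh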
Here is an example jobscript for the Snowy hybrid nodes that uses the "--gres" option:
#!/bin/bash -l
#SBATCH -J jobname               # job name
#SBATCH -A snicxxxx-x-xx         # your project account
#SBATCH -t 03-00:00:00           # wall-time limit: 3 days
#SBATCH --exclusive              # do not share the node with other jobs
#SBATCH -p node                  # node partition
#SBATCH -N 1                     # one node
#SBATCH -M snowy                 # submit to the Snowy cluster
#SBATCH --gres=gpu:1             # request one GPU ("gres" syntax)
#SBATCH --gpus-per-node=1        # same request in the newer syntax
## For jobs shorter than 15 minutes (max 4 nodes), uncomment the next line:
##SBATCH --qos=short
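The directives above only reserve resources; the commands to run follow them in the script body. A minimal body that confirms the job can see the GPU (nvidia-smi ships with the Nvidia driver):
nvidia-smi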
You can ask for an interactive node this way (replace "staff" with your own project):
salloc -A staff -N 1 -M snowy --gres=gpu:1 --gpus-per-node=1
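Once the allocation is granted, you can launch commands on the allocated node with srun; a quick sanity check (a sketch, mirroring the -M flag from the salloc call above):
srun -M snowy nvidia-smi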

How do we use CUDA and related software?

There is a system-wide installation of CUDA 11. Modules providing CUDA 9 and CUDA 10 are also installed.
For example:
module use /sw/EasyBuild/snowy/modules/all
module load intelcuda/2019b
will give you CUDA version 10.1.
If you prefer the GCC-based toolchain over the Intel one, load one of the fosscuda modules instead:
module use /sw/EasyBuild/snowy/modules/all
module load fosscuda/2019b
or
module load fosscuda/2018b
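Whichever module you load, you can confirm which CUDA toolkit ended up on your PATH (nvcc is the compiler bundled with the toolkit):
nvcc --version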

Where can I read more?

We have followed current Slurm best practices when configuring the
GPUs. Technically, the GPUs are configured as what is known as a
"gres" (generic resource), which is then tracked using "tres"
(trackable resources). This means that most GPU-related options you
can find in Slurm's documentation are expected to work.

Slurm's GPU-related documentation is here:

  https://slurm.schedmd.com/gres.html#Running_Jobs

You can also search for "GPU" in the sbatch man-page on Snowy.
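For example:
man sbatch
then type /GPU and press Enter to search within the page.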