GPUs on Monsoon

For some of your jobs on Monsoon, you might require one or more Graphics Processing Units (GPUs) to accelerate your work. With Slurm, you can easily request a GPU for your job.

Note: At the time of writing, Monsoon contains only NVIDIA GPUs.

Checking Available GPUs

To list all GPUs on Monsoon, along with how many are currently available, use the gpu_status command:

$ gpu_status
Available GPUs: 10/24
k80: 7/12
p100: 0/4
v100: 3/4
a100: 0/4
Pending GPU jobs: 57
Running GPU jobs: 5

Note: If you are running deep learning software, we recommend using the a100 GPUs. If those are unavailable, try using either the p100 or the v100 GPUs instead.

Note: Some GPUs may be shown as available but cannot be requested, because some nodes are dedicated to specific research groups.
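
If you want to see which nodes carry which GPU models, a standard Slurm query can help. Below is a minimal sketch, assuming Monsoon uses Slurm's stock sinfo format specifiers; GPU types appear in the GRES column (e.g. gpu:k80:4):

# list each node's features and generic resources (GRES)
$ sinfo --Node --format="%N %f %G"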

Submitting a GPU Job

To quickly request a GPU of any model, you can use either the -G or --gpus= flag with your srun or salloc command:

# Both of these do the same thing!
$ srun -G 1 nvidia-smi
$ srun --gpus=1 nvidia-smi

# Or allocate a GPU first, then launch the program inside the allocation:
$ salloc --gpus=1
$ srun nvidia-smi

Or, if you want to add it to your sbatch script, include the following line alongside the rest of your #SBATCH parameters:

#SBATCH --gpus=1
nvidia-smi
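
Putting it together, a complete batch script might look like the following minimal sketch; the job name, time limit, and output file below are placeholders, not site recommendations:

#!/bin/bash
#SBATCH --job-name=gpu_test     # placeholder job name
#SBATCH --time=00:10:00         # placeholder time limit
#SBATCH --output=gpu_test.out   # placeholder output file
#SBATCH --gpus=1                # request one GPU of any model

nvidia-smi                      # the program you want to run

You would then submit this script with sbatch gpu_test.sh.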

If you want to request a specific GPU model, you can specify it in the same flag:

$ srun -G k80 nvidia-smi      # requests one k80 GPU
$ srun -G k80:4 nvidia-smi    # requests four k80 GPUs
#SBATCH --gpus=a100           # the same idea in an sbatch script
nvidia-smi                    # the program you want to run

Another way to request a specific GPU model is with the -C (constraint) flag:

$ srun -C k80 -G 4 nvidia-smi
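
Once a GPU job starts, Slurm exposes the allocated device indices through the CUDA_VISIBLE_DEVICES environment variable, which gives a quick way to confirm what you were given. A minimal sketch, assuming Monsoon's Slurm sets this variable for GPU jobs (the default behavior for GPU GRES):

# print the GPU index or indices assigned to the job step
$ srun -G 1 bash -c 'echo $CUDA_VISIBLE_DEVICES'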

Relevant Flags

There are additional flags you can pass to Slurm to fine-tune your GPU jobs, though they may not be useful for everyone. The last four options below are intended for advanced users; a combined example follows the list.

  • --cpus-per-gpu=[int]: Specify how many CPUs to allocate for every GPU requested.
  • --mem-per-gpu=[memory]: Specify how much memory to allocate for every GPU requested. This is most often used when requesting multiple GPUs.
  • --ntasks-per-gpu=[int]: Specify how many tasks to run for every GPU requested.
  • --gpus-per-task=[model]:[int]: Specify how many GPUs to allocate for every task being run.
  • --gpus-per-node=[model]:[int]: Specify how many GPUs to allocate for each node requested.
  • --gpus-per-socket=[model]:[int]: Specify how many GPUs to allocate for each CPU socket used.
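
As a sketch of how these options compose, the following requests two GPUs with four CPUs and 16 GB of memory per GPU; the numbers are placeholders, not recommendations:

$ srun --gpus=2 --cpus-per-gpu=4 --mem-per-gpu=16G nvidia-smi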

Checking GPU Usage

To ensure that you are getting the most out of your allocated GPU(s), it is useful to check GPU utilization. You can monitor it on our Monsoon Metrics page (requires a connection to NAU WiFi or VPN), or from the command line with nvidia-smi dmon while you have an SSH connection to the GPU node:

$ nvidia-smi dmon

Note: This command continuously reports GPU usage statistics and will not stop until you press the keyboard shortcut Control + C.
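
To run nvidia-smi dmon you first need a shell on the node where your job is running. One way to get there, assuming you have a running GPU job and are permitted to SSH to its node, is to look up the node with squeue and connect to it; cn42 below is a placeholder hostname:

# find which node your running job landed on
$ squeue -u $USER -o "%i %N %T"
# open a shell on that node (placeholder hostname)
$ ssh cn42
# then monitor the GPU
$ nvidia-smi dmon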

Using CUDA

Some of your jobs on Monsoon may require NVIDIA’s Compute Unified Device Architecture (CUDA) toolkit. To use the CUDA toolkit libraries, you need to load the cuda module:

$ module load cuda
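
After the module is loaded, the CUDA compiler (nvcc) and libraries should be on your path. Below is a minimal sketch of compiling and running a CUDA program under Slurm; hello.cu is a placeholder for your own source file:

$ module load cuda
$ nvcc --version              # confirm the toolkit is available
$ nvcc hello.cu -o hello      # compile your CUDA source (placeholder)
$ srun -G 1 ./hello           # run it on a node with a GPU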