
Running Jobs

How to use Slurm to submit a job

note

Please note that the job submission procedure is evolving and the directions below may change over time. Please reach out if you run into any issues.

Slurm Configuration

Version: v2 - enabled 2026-04-21

Guiding principles

  • Iterative: This will be versioned and improved over time.
  • Maximized Utilization: We aim for maximum cluster utilization and high-impact outcomes, guided by the governance committee.
  • Minimal Friction: We want to minimize wait times and avoid unnecessary barriers.
  • Feedback-Driven: We know we won't get it right the first time; we need your input.
  • User Experience: We want everyone to have a good experience, though we understand not everyone will agree with every configuration choice.

Current configuration

FairShare Scheduling

Scheduling priority is governed by your FairShare score, which Slurm calculates based on the following factors (commands to check your standing are shown after this list):

  • QoS levels
  • Resources requested
  • Time requested
  • Past utilization
  • Decay factors
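
A quick way to inspect your current FairShare standing is sketched below, assuming the standard Slurm accounting tools (sshare and sprio) are available on the login node:

# Long listing includes raw usage and the FairShare factor for your account
sshare -u $USER -l

# For a pending job, sprio breaks down the factors behind its priority
sprio -j <jobid>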

QoS Recommendations

| QoS       | Description                                                       | Default |
|-----------|-------------------------------------------------------------------|---------|
| high      | Higher initial priority, faster FairShare decay                   | No      |
| standard  | Balanced (recommended)                                            | Yes     |
| low       | Boosts FairShare by slowing decay                                 | No      |
| scavenger | Lowest initial priority, greatest FairShare boost (slowest decay) | No      |
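
To confirm how these QoS levels are configured on the cluster (assuming your account can query the Slurm accounting database), you can list them with sacctmgr:

# Show each QoS with its priority weight and wall-time limit
sacctmgr show qos format=Name,Priority,MaxWall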

Key Limits

  • Max wall time: 48 hours (all QoS levels)

  • Max resources: 90 nodes / 720 GPUs (all QoS levels)

  • Time field: Required on all submissions

  • Project/Account: Specify if you belong to multiple projects

Slurm Submission Flags

| Specification | Option | Example | Example Purpose | Required |
|---------------|--------|---------|-----------------|----------|
| Wall clock limit | --time=[hh:mm:ss] | --time=05:00:00 | Set the wall clock limit to 5 hours 0 minutes | Yes |
| Job name | --job-name=[SomeText] | --job-name=myJob | Set the job name to "myJob" | No |
| Quality of Service | --qos=[QoS name] | --qos=standard | Choose the "standard" QoS (select from the values in the table above) | No |
| Total nodes | --nodes=[#] | --nodes=1 | Request 1 node | No |
| Total task count | --ntasks=[#] | --ntasks=2 | Request 2 tasks total | No |
| CPUs per task | --cpus-per-task=[#] | --cpus-per-task=4 | Request 4 CPUs per task | No |
| GPUs per node | --gres=gpu:[#] | --gres=gpu:4 | Request 4 GPUs per node | No |
| GPUs per task | --gpus-per-task=[#] | --gpus-per-task=2 | Request 2 GPUs per task | No |
| Total GPUs for job | --gpus=[#] | --gpus=10 | Request 10 GPUs across the job | No |
| Tasks per node | --ntasks-per-node=[#] | --ntasks-per-node=48 | Request 48 tasks per node (an exact count, or a maximum when combined with --ntasks) | No |
| Memory per node | --mem=<size>[K,M,G,T] | --mem=360G | Request 360 GB of memory per node | No |
| Combined stdout/stderr | --output=[OutputName].%j | --output=myJobOut.%j | Collect stdout/stderr in myJobOut.[JobID] | No |
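
Any of these flags can also be passed on the sbatch command line, where they override the matching #SBATCH directives inside the script. For example, to resubmit an existing script with a different QoS and time limit:

sbatch --qos=low --time=12:00:00 ./job_hello_world.job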

Examples

  • Job request for 1 node, 1 CPU, 1 GPU, and 10 minutes of runtime.

Create a script called job_hello_world.job:

#!/bin/bash  
#SBATCH --job-name=hello_world
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --output=hello_%j.out
#SBATCH --error=hello_%j.err
#SBATCH --time=00:10:00
#SBATCH --qos=standard
#SBATCH --gres=gpu:1

srun sh -c 'echo "hello world ($(hostname)) ($XDG_RUNTIME_DIR) ($XDG_SESSION_ID) ($XDG_SESSION_TYPE) ($XDG_SESSION_CLASS)" | tee /scratch/user/$USER/hello_world_$(hostname)'

Run the job with:

sbatch ./job_hello_world.job

  • Job request for 100 GPUs, 150 GB of RAM per node, 500 CPUs, and 30 hours of runtime. srun will launch 500 instances of the command, each with 1 CPU.

Create a script called job_hello_world.job:

#!/bin/bash  
#SBATCH --job-name=hello_world
#SBATCH --gpus=100
#SBATCH --mem=150G
#SBATCH --ntasks=500
#SBATCH --output=hello_%j.out
#SBATCH --error=hello_%j.err
#SBATCH --time=30:00:00
#SBATCH --qos=standard

srun sh -c 'echo "hello world ($(hostname)) ($XDG_RUNTIME_DIR) ($XDG_SESSION_ID) ($XDG_SESSION_TYPE) ($XDG_SESSION_CLASS)" | tee /scratch/user/$USER/hello_world_$(hostname)'

Run the job with:

sbatch ./job_hello_world.job

  • Job request for 100 GPUs, 150 GB of RAM per node, 2,000 CPUs (500 tasks × 4 CPUs per task), and 30 hours of runtime. srun will launch 500 instances of the command, each with 4 CPUs.

Create a script called job_hello_world.job:

#!/bin/bash  
#SBATCH --job-name=hello_world
#SBATCH --gpus=100
#SBATCH --mem=150G
#SBATCH --ntasks=500
#SBATCH --cpus-per-task=4
#SBATCH --output=hello_%j.out
#SBATCH --error=hello_%j.err
#SBATCH --time=30:00:00
#SBATCH --qos=standard

srun sh -c 'echo "hello world ($(hostname)) ($XDG_RUNTIME_DIR) ($XDG_SESSION_ID) ($XDG_SESSION_TYPE) ($XDG_SESSION_CLASS)" | tee /scratch/user/$USER/hello_world_$(hostname)'

Run the job with:

sbatch ./job_hello_world.job
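
After submitting any of the jobs above, the standard Slurm tools can be used to track them (the job ID below is a placeholder for the ID that sbatch prints):

# Show your queued and running jobs
squeue -u $USER

# Inspect a pending or running job in detail
scontrol show job <jobid>

# Review accounting data after the job finishes
sacct -j <jobid> --format=JobID,JobName,State,Elapsed

# Cancel a job that is no longer needed
scancel <jobid>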

Glossary

FairShare Scheduling:
Scheduling priority is governed by a user’s FairShare score. The FairShare score is calculated by Slurm based on QoS levels, resources requested, time requested, past utilization, decay and other factors.
Using the “high” QoS will initially give you a higher priority but will cause a faster decay of your FairShare score. This will have the effect of delaying the start of future jobs. It is generally recommended to use the “standard” QoS. However, if your job can wait a bit, use of the “low” and “scavenger” QoS will boost your FairShare score by slowing its decay. This will enable future jobs to be scheduled more quickly.

  • All levels of QoS can potentially use all 90 nodes and all associated GPUs (720 GPUs)
  • Time is a required field on all submissions
  • Your default project is the first project to which you were assigned. However, if you are on multiple projects, you will need to specify which project (account) to use when scheduling a job (see the example after this list).
  • All QoS levels have a max of 48 hours of wall clock time
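
For example, to charge a job to a specific project (the project name here is a placeholder):

#SBATCH --account=<project_name>

or equivalently on the command line:

sbatch --account=<project_name> ./job_hello_world.job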

Project:
The basic unit of system allocation

Slurm Account:
There is a one-to-one correspondence between a VISION project and a Slurm account

Slurm Association:
The assignment of a user to a project

  • All members of the project will be associated with the relevant Slurm account
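
To see which Slurm accounts (projects) you are associated with (again assuming you can query the Slurm accounting database), you can run:

# List your associations, including the account and any QoS granted
sacctmgr show associations user=$USER format=Account,User,QOS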

Slurm Quality of Service (QoS):
The requested priority of a job submission