How to use SLURM Workload Manager

The Bishop Cluster uses the SLURM workload manager. SLURM stands for Simple Linux Utility for Resource Management and is a queue management system for scheduling and running jobs on the cluster.

Documentation

Documentation on SLURM usage and commands can be found on the SLURM site at https://slurm.schedmd.com/.



SLURM Command Cheat Sheet

Basic SLURM Commands

Show Available Job Queues: 

sinfo
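
The output can also be limited to a single queue; for example, to check only the quickq queue used elsewhere on this page (assuming that queue exists on your cluster):

sinfo --partition=quickq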

Submit a Job:
sbatch myscript.sh

Submit a Job to a Specific Queue:
sbatch --partition=quickq myscript.sh

List all current jobs for a user:
squeue -u <username>

List all running jobs for a user:
squeue -u <username> -t RUNNING

List all pending jobs for a user:
squeue -u <username> -t PENDING
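
For example, assuming a hypothetical username jdoe, the following lists only that user's running jobs:

squeue -u jdoe -t RUNNING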

List detailed information for a job (for troubleshooting):
scontrol show jobid -dd <jobid>
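
For example, for a hypothetical job with ID 12345:

scontrol show jobid -dd 12345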

List status info for a currently running job:
sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps
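
For example, again with the hypothetical job ID 12345:

sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j 12345 --allsteps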

To get statistics on completed jobs by jobID:
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed
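
For example:

sacct -j 12345 --format=JobID,JobName,MaxRSS,Elapsed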

Controlling Jobs

To cancel one job:
scancel <jobid>

To cancel all jobs for a user:
scancel -u <username>

To cancel all the pending jobs for a user:
scancel -t PENDING -u <username>

To cancel one or more jobs by name:
scancel --name myJobName

To pause a particular job:
scontrol hold <jobid>

To resume (release) a held job:
scontrol release <jobid>

To requeue (cancel and rerun) a particular job:
scontrol requeue <jobid>
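
For example, a hypothetical job 12345 could be paused and later released with:

scontrol hold 12345
scontrol release 12345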

Sbatch Parameters (see the sbatch man page for the full list)

#!/bin/bash
#SBATCH -J jobname            # Specify a job name
#SBATCH -n 1                  # Number of cores (tasks)
#SBATCH -N 1                  # Number of nodes
#SBATCH --begin=15:00         # Time of day to start (3 PM); now+ syntax delays the start
#SBATCH -t 0-00:00:00         # Runtime in D-HH:MM:SS
#SBATCH -p queuename          # Submit to a specific queue
#SBATCH --mem=1G              # Memory per node (can also use --mem-per-cpu)
#SBATCH -o hostname_%j.out    # File to which output will be written
#SBATCH -e hostname_%j.err    # File to which errors will be written
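
In the output and error file names, %j expands to the job ID assigned at submission, so a job given the hypothetical ID 12345 would write hostname_12345.out and hostname_12345.err.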

Sbatch Example

#!/bin/bash
#SBATCH -N 2                  # 2 nodes
#SBATCH --begin=now+2hours    # Start in 2 hours
#SBATCH -t 0-12:00:00         # 12 hours runtime
#SBATCH -p quickq             # Submit to quickq
#SBATCH --mem=4G              # 4 GB memory per node, 8 GB total
#SBATCH -o hostname_%j.out    # Output file
#SBATCH -e hostname_%j.err    # Error file

# Load the Matlab module
module load matlab

# Launch the Matlab script
matlab -nodisplay < matlab_test.m
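
Assuming the example above is saved as myscript.sh, it can be submitted and then monitored with the commands from the cheat sheet (jdoe is a hypothetical username):

sbatch myscript.sh
squeue -u jdoe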