3. sbatch


To run a job with sbatch, you will need to create an sbatch script.

This script consists of three main parts, which must appear in the following order:

1. Indicate the interpreter your script uses

This must be the first line of your script and tells the system which interpreter your script uses:

  • #!/bin/bash

2. #SBATCH lines

Slurm reads these lines to determine what resources your job is requesting. It stops reading at the first line that does not begin with #SBATCH, so these lines must come before the rest of your code.

#SBATCH lines typically look something like:

  • #SBATCH -n 4 This line indicates you would like to request 4 tasks, also called CPU cores.

  • #SBATCH -N 1 This line indicates you would like to request 1 compute node for these 4 cores to be spread across.

  • #SBATCH -t 0-00:30 This line indicates you would like your job to run for 30 minutes; if it does not complete within that time, it will be killed.

  • #SBATCH -C centos7 This line indicates that the nodes you request must be running the CentOS 7 operating system.

  • #SBATCH -p sched_mit_hill This line indicates which partition Slurm will select the requested number of nodes from.

  • #SBATCH --mem-per-cpu=4000 This line indicates your job will request 4 GB of memory per task/CPU core requested.

  • #SBATCH -o output_%j.txt This line indicates that your job’s output will be directed to the file output_JOBID.txt.

  • #SBATCH -e error_%j.txt This line indicates that your job’s error output will be directed to the file error_JOBID.txt.

  • #SBATCH --mail-type=BEGIN,END This line indicates that your job will send an email when it starts and when it ends.

  • #SBATCH --mail-user=test@test.com This line indicates the email address you would like the start and end emails to be sent to.

3. The code you are actually running

This is the code that you want Slurm to run on the compute nodes. It could be a line calling an already-written piece of code, such as a Python script called ‘test.py’, or it could be direct commands, such as echo "Hello World".

Example sbatch script

An example sbatch script may look something like this (note that %j will be replaced with your job ID):

#!/bin/bash 
#SBATCH -n 4 #Request 4 tasks (cores)
#SBATCH -N 1 #Request 1 node
#SBATCH -t 0-00:30 #Request runtime of 30 minutes
#SBATCH -C centos7 #Request only Centos7 nodes
#SBATCH -p sched_mit_hill #Run on the sched_mit_hill partition
#SBATCH --mem-per-cpu=4000 #Request 4G of memory per CPU
#SBATCH -o output_%j.txt #redirect output to output_JOBID.txt
#SBATCH -e error_%j.txt #redirect errors to error_JOBID.txt
#SBATCH --mail-type=BEGIN,END #Mail when job starts and ends
#SBATCH --mail-user=test@test.com #email recipient
echo "Hello World"
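Once the script above is saved to a file (the filename "submit.sh" here is just an example), it is submitted with sbatch and can be monitored with squeue:

```shell
# Submit the batch script; Slurm replies with the assigned job ID,
# e.g. "Submitted batch job 12345"
sbatch submit.sh

# List your own pending and running jobs
squeue -u $USER
```

These commands must be run on a login node of the cluster, since they talk to the Slurm controller.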

Important Notes

There are a few things to note when making batch scripts.

Time Limits

If no time limit is specified, the default time limit of the partition will be used instead (you can see this with “sinfo”). If you request more time than the partition’s time limit, your job will never run!
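As a quick sketch, sinfo's output-format option can list each partition alongside its maximum time limit:

```shell
# %P = partition name, %l = partition time limit
sinfo -o "%P %l"
```

Compare your requested -t value against the limit shown for your partition before submitting.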

Node Operating Systems

Most legacy Engaging partitions have nodes with CentOS 7. If you need a CentOS 7 node or nodes, you should use “-C centos7” in your job submissions. New Engaging partitions are installed with Rocky 8 and usually contain “r8” somewhere in their name. These can also be explicitly requested by adding “-C rocky8” to submissions.

Number Of Jobs

Users are allotted up to 500 jobs at one time on the Engaging cluster. If you need to run more than 500 simultaneous jobs, we recommend splitting them into multiple job arrays.
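A job array can be sketched as below; the input-file naming scheme is hypothetical, but the --array flag and the %A/%a filename patterns (array job ID and array task index) are standard Slurm:

```shell
#!/bin/bash
#SBATCH -n 1                 #Request 1 task per array element
#SBATCH -t 0-00:30           #Request runtime of 30 minutes
#SBATCH --array=0-99         #Run 100 array tasks, indexed 0 through 99
#SBATCH -o output_%A_%a.txt  #%A = array job ID, %a = array task index

# Each array task sees its own index in SLURM_ARRAY_TASK_ID
echo "Processing input_${SLURM_ARRAY_TASK_ID}.txt"
```

Submitting this single script queues all 100 tasks at once, each counting as one job toward the limit.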

More information on sbatch is available from SchedMD.

If you have any questions about using sbatch, please email orcd-help-engaging@mit.edu.