Slurm Job Arrays
A Slurm array batch job is similar to running a 'for' loop over an sbatch script, but instead of each job having a unique job ID, they all share the same job ID with a predictable task ID appended as a suffix.
Example
Let's say you have a Python script like this (addone.py):
import sys
var = int(sys.argv[1]) + 1 # add 1 to a number I pass in as a command line argument
print(var)
When I run python addone.py 1 on the command line, I will get 2.
I usually submit this script with the command sbatch addone.sbatch, using this sbatch script (addone.sbatch):
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1
python addone.py 1
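When this job finishes, Slurm writes the script's output to a file named slurm-<jobID>.out in the submission directory by default. As a sketch, using a hypothetical job ID of 12345:
$ sbatch addone.sbatch
Submitted batch job 12345
$ cat slurm-12345.out
2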
If I wanted to get the output of adding one to each of the numbers 1-10, I could make the sbatch job an array job and use each array task's unique $SLURM_ARRAY_TASK_ID variable to fill in each value.
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --array=1-10
python addone.py $SLURM_ARRAY_TASK_ID
then submit with just sbatch addone.sbatch. This will spawn 10 jobs, and each one will have a unique $SLURM_ARRAY_TASK_ID value (1-10) identifying which array task it is.
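Because all 10 tasks share the same job ID, Slurm distinguishes them by the task suffix: by default each array task writes its output to a file named slurm-<jobID>_<taskID>.out. As a sketch, using another hypothetical job ID of 12346:
$ sbatch addone.sbatch
Submitted batch job 12346
$ ls slurm-12346_*.out
slurm-12346_1.out  slurm-12346_2.out  ...  slurm-12346_10.out
$ cat slurm-12346_3.out
4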
What we have done is take the iteration out of the Python script and create a quasi-parallel run.
This all being said, while the job array approach does work, you may also be able to use a Python parallel-processing package, such as the standard library's multiprocessing module, to run each iteration as a separate task (usually one per CPU core). This would likely be cleaner, since you would stay within the Python environment to handle all the parallel runs of the same code rather than handing them to Slurm, and it would likely be more efficient as well.
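For illustration, here is a minimal sketch of that approach using the standard library's multiprocessing module; the add_one function here simply reuses the logic from addone.py, and the pool size (one worker per CPU core) is the module's default:
import multiprocessing

def add_one(n):
    # Same logic as addone.py: add 1 to the input number
    return n + 1

if __name__ == "__main__":
    # Pool() defaults to one worker process per available CPU core
    with multiprocessing.Pool() as pool:
        results = pool.map(add_one, range(1, 11))  # the numbers 1-10
    print(results)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
With this version you would submit a single ordinary sbatch job that requests multiple CPU cores (for example #SBATCH -c 10) instead of a job array.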
If you have any questions about using Slurm job arrays, please email orcd-help-engaging@mit.edu.