Fine-tuning of Job Arrays

You may already know about job arrays. This SLURM feature lets you submit and manage many similar jobs quickly and easily, as an earlier tip pointed out some time ago.

Let's imagine you submit a job array of 100 jobs with the following command:

cdl:~> sbatch --array=1-100 my_slurm_script.sh
Submitted batch job 14688946
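
For reference, each task of the array runs the same script and can tell itself apart through the SLURM_ARRAY_TASK_ID environment variable. Here is a minimal sketch of what my_slurm_script.sh might look like. The job name and the input/output file names are assumptions made for illustration, and your site may require additional #SBATCH options (account, partition, time limit):

```shell
#!/bin/bash
#SBATCH --job-name=array_demo
#SBATCH --output=array_%A_%a.out   # %A = array job ID, %a = task index

# SLURM sets SLURM_ARRAY_TASK_ID for every task of a job array.
# Default to 1 so the script can also be tried outside the scheduler.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical naming scheme: one input and one output file per task.
input_file="input_${task_id}.dat"
output_file="result_${task_id}.out"

echo "Task ${task_id}: would process ${input_file} -> ${output_file}"
```

With --array=1-100, the 100 tasks would each see a different value of task_id, from 1 to 100.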

Did you know how flexible the definition of the array can be?

cdl:~> sbatch --array=1-10,88,97-100 my_slurm_script.sh
Submitted batch job 14688947

or

cdl:~> sbatch --array=1,6,87 my_slurm_script.sh
Submitted batch job 14688948

are both valid commands, handy for example when you want to resubmit only the tasks that failed.
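
When many tasks have to be resubmitted, typing the specification by hand gets tedious. The following sketch collapses a list of task indices into the compact form accepted by --array. Note that compress_indices and format_range are hypothetical helpers written for this tip, not SLURM commands, and that the sketch assumes plain non-negative integers without leading zeros:

```shell
#!/bin/bash

# Print "a" for a single index, or "a-b" for a run of consecutive indices.
format_range() {
    if [ "$1" -eq "$2" ]; then
        printf '%s' "$1"
    else
        printf '%s-%s' "$1" "$2"
    fi
}

# Collapse task indices into an --array specification.
# Example: compress_indices 1 2 3 6 87   prints 1-3,6,87
compress_indices() {
    local start="" prev="" idx spec=""
    # Sort numerically and drop duplicates before scanning for runs.
    for idx in $(printf '%s\n' "$@" | sort -n -u); do
        if [ -z "$start" ]; then
            start=$idx
        elif [ "$idx" -ne $((prev + 1)) ]; then
            # The run is broken: emit it and start a new one.
            spec+="$(format_range "$start" "$prev"),"
            start=$idx
        fi
        prev=$idx
    done
    # Emit the last pending run, if any index was given.
    if [ -n "$start" ]; then
        spec+="$(format_range "$start" "$prev")"
    fi
    printf '%s\n' "$spec"
}
```

The result can be passed straight to sbatch, for instance with sbatch --array="$(compress_indices 1 2 3 6 87)" my_slurm_script.sh.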

With a slightly different syntax, the same convention holds when you cancel part of the array:

cdl:~> scancel 14688946_[34-66] 

or

cdl:~> scancel 14688947_[1-8,88,99] 

By default, SLURM will run as many tasks of the array at the same time as the available resources allow. You can cap this number with the '%' operator. For example:

cdl:~> sbatch --array=1-100%10 my_slurm_script.sh

creates a job array of 100 tasks but limits the number of tasks from this array running simultaneously to 10. Currently, on Shaheen, the maximum possible job array size is 800.
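
Before submitting a complicated specification, it can help to check how many tasks it really describes. The sketch below counts them and compares the result with the limit of 800 quoted above. count_array_tasks is a hypothetical helper, not a SLURM command. It ignores the step syntax (such as 0-15:4) and assumes the limit applies to the number of tasks. Be aware that SLURM's MaxArraySize setting also bounds the highest index you can use, so large indices may be rejected even when the count is small:

```shell
#!/bin/bash

# Count the tasks described by an --array specification such as
# "1-10,88,97-100" or "1-100%10". Step syntax ("0-15:4") is not handled.
count_array_tasks() {
    local spec="${1%%\%*}"   # drop an optional %N concurrency limit
    local part first last total=0
    local IFS=','            # split the specification on commas
    for part in $spec; do
        first="${part%-*}"   # text before the dash (or the whole item)
        last="${part#*-}"    # text after the dash (or the whole item)
        total=$((total + last - first + 1))
    done
    printf '%s\n' "$total"
}

max_array_size=800   # value quoted in this tip for Shaheen; it may change

n=$(count_array_tasks "1-10,88,97-100")
if [ "$n" -le "$max_array_size" ]; then
    echo "$n tasks: within the limit of $max_array_size"
else
    echo "$n tasks: exceeds the limit of $max_array_size"
fi
```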