SLURM

Some useful SLURM commands

===========================================
Inside the job script submitted with "sbatch":
---------------------------------------------------------------------------
To receive email notification about your job status:
#SBATCH --mail-type=ALL
#SBATCH --mail-user=YourEmailAddress
---------------------------------------------------------------------------
To use the 72hours partition (queue):
#SBATCH --partition=72hours
#SBATCH --qos=72hours
===========================================
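Putting these directives together, a complete job script for the 72hours partition with email notification might look like the sketch below. The job name, email address, time limit and program name are assumptions for illustration only. The 72-hour limit is inferred from the partition name, so check the actual limit at your site.

```bash
#!/bin/bash
#SBATCH --job-name=long_run             # hypothetical job name
#SBATCH --partition=72hours             # the 72-hour partition ...
#SBATCH --qos=72hours                   # ... used together with its matching QOS
#SBATCH --time=72:00:00                 # assumed limit; must not exceed the partition's
#SBATCH --mail-type=ALL                 # mail on begin, end, failure and other events
#SBATCH --mail-user=you@example.org     # hypothetical address: replace with your own

# Inside a job, Slurm sets SLURM_JOB_ID and SLURM_JOB_NODELIST; the values
# after ":-" are only fallbacks for running the script outside Slurm.
summary="Job ${SLURM_JOB_ID:-none} running on ${SLURM_JOB_NODELIST:-localhost}"
echo "$summary"

# ./my_program                          # hypothetical executable
```

Submit it with sbatch as usual. With --mail-type=ALL, a message is sent to the --mail-user address when the job starts, ends or fails, among other events.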

Delay the start of a job in Slurm

It may sometimes be useful to submit a job but tell Slurm to defer scheduling it until later. This is possible with the --begin option, which works with both sbatch and srun: the job is submitted immediately, but is only considered for running from the specified time onwards. Some examples:

--begin=<time>

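                 --begin=16:00                  (4 PM; the next day if that time has already passed)
                 --begin=now+1hour              (one hour after submission)
                 --begin=2016-06-30T12:34:00    (a specific date and time)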
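The same option can be given as a directive inside the job script (for example #SBATCH --begin=now+1hour) or on the command line. The sketch below wraps the command-line form in a small shell function. submit_later is a hypothetical helper, not part of Slurm; it relies on sbatch's --parsable option to capture the job ID.

```bash
#!/bin/bash
# submit_later TIME SCRIPT (hypothetical helper): submit SCRIPT now, but ask
# Slurm not to start it before TIME (any value accepted by --begin).
# Prints the ID of the new job.
submit_later() {
    local when=$1 script=$2
    # --parsable makes sbatch print "jobid" or "jobid;cluster" instead of
    # "Submitted batch job <jobid>", so the ID is easy to capture.
    sbatch --parsable --begin="$when" "$script" | cut -d';' -f1
}

# Example (commented out so the file can be sourced safely):
# jobid=$(submit_later now+1hour my_slurm_script.sh)
# squeue --job "$jobid"
```

While the start time has not been reached, squeue lists the job as PENDING with the reason BeginTime. For a job that is still pending, the start time can typically be changed with scontrol update JobId=<jobid> StartTime=<time>.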

Running Multiple Parallel Jobs Simultaneously

On Shaheen, the compute nodes are exclusive: even when a job does not use all the resources of a node, no other job can use the remaining ones. By default, multiple concurrent srun executions cannot share compute nodes under SLURM in the regular partition, so make sure that the total number of cores required by all concurrent srun executions fits on the number of nodes requested. In the following example, a total of 9 nodes is required.
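A minimal sketch of the general pattern, assuming two hypothetical programs (program_a, program_b) that each need 2 whole nodes; these numbers are illustrative and are not the 9-node example mentioned above. The job requests the sum of the nodes used by all concurrent steps, starts each srun in the background with &, and waits for all of them. Depending on the Slurm version and site configuration, additional srun options may be needed for steps to run side by side, so check man srun on the system.

```bash
#!/bin/bash
#SBATCH --nodes=4                 # hypothetical: 2 + 2 nodes for two concurrent steps
#SBATCH --time=01:00:00           # hypothetical wall-clock limit

# Each srun runs in the background (&) so both steps execute at the same time.
# Nodes are exclusive, so the steps' node counts must add up to at most 4.
# Add --ntasks / --ntasks-per-node to match your programs.
srun --nodes=2 ./program_a > program_a.log 2>&1 &
srun --nodes=2 ./program_b > program_b.log 2>&1 &

# Without 'wait' the batch script would exit immediately, ending the job
# and terminating both background steps.
wait
```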

Why is my job not running?

When no estimated start time is available for your pending job, you can still get more details, including the reason it is not running.

Typing squeue --job <jobid> -l gives output like the following; for a pending job, the last column, NODELIST(REASON), shows the reason it is not running:

           JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
           110000   workq 8-tuned_   user1  PENDING       0:00 3-00:00:00      1 (AssocGrpCPUMinutesLimit)
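For scripting, squeue can print just the state and reason of a job with its --format option (%T for the state, %r for the reason). The sketch below is hypothetical: why_pending and explain_reason are illustrative helpers, and the hints cover only a few common reason codes. The full list is under JOB REASON CODES in man squeue.

```bash
#!/bin/bash
# Hypothetical helpers (not part of Slurm) for inspecting a pending job.

# explain_reason REASON: print a short hint for a few common reason codes.
explain_reason() {
    case "$1" in
        Resources)  echo "waiting for resources (nodes) to become available" ;;
        Priority)   echo "jobs with higher priority are queued ahead of this one" ;;
        BeginTime)  echo "the --begin time has not been reached yet" ;;
        Dependency) echo "waiting for a job it depends on" ;;
        AssocGrp*)  echo "an aggregate limit of the job's account/association has been reached" ;;
        *)          echo "see JOB REASON CODES in man squeue" ;;
    esac
}

# why_pending JOBID: print the job ID, state, reason code and a hint.
why_pending() {
    local jobid=$1 line state reason
    # -h drops the header line; %T is the full state name, %r the reason code
    line=$(squeue -h -j "$jobid" -o "%T %r")
    state=${line%% *}     # text before the first space
    reason=${line#* }     # text after the first space
    echo "$jobid $state $reason: $(explain_reason "$reason")"
}

# Usage:
#   why_pending 110000
#   squeue --start --job 110000   # estimated start time, when available
```

For the job shown above, why_pending 110000 would print the state PENDING, the reason AssocGrpCPUMinutesLimit and the matching hint.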

 

Job Arrays

SLURM allows you to submit and manage multiple similar jobs quickly and easily thanks to job arrays. Job arrays can be specified using two techniques:

  • in a batch directive:
    • #SBATCH --array=1-10
  • on the command line:
    • cdl:~>sbatch --array=0-10 my_slurm_script.sh
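Inside each array task, Slurm sets SLURM_ARRAY_TASK_ID to that task's index, which a script can use to select its input. In the sketch below, the input file naming scheme (input_<index>.dat) and the program name are hypothetical; the %A and %a patterns in --output give each task its own log file.

```bash
#!/bin/bash
#SBATCH --job-name=array_demo       # hypothetical job name
#SBATCH --array=1-10                # 10 array tasks with indices 1..10
#SBATCH --output=array_%A_%a.out    # %A = array job ID, %a = task index

# Slurm sets SLURM_ARRAY_TASK_ID (and SLURM_ARRAY_JOB_ID) separately in each
# task; the fallbacks after ":-" only apply when run outside Slurm.
task_id=${SLURM_ARRAY_TASK_ID:-1}
input="input_${task_id}.dat"        # hypothetical per-task input file name

echo "Array job ${SLURM_ARRAY_JOB_ID:-unknown}, task ${task_id}: processing ${input}"
# ./my_program "$input"             # hypothetical executable
```

A throttle can also be appended to the range, for example --array=1-10%2, to limit how many tasks of the array run at the same time.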

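The directive --array=1-10 generates a job array of 10 jobs (indices 1 to 10); note that --array=0-10, as in the command-line example, generates 11 jobs (indices 0 to 10). If the sbatch command responds "Submitted batch job 100", the environment variables are set as follows: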