Running Multiple Parallel Jobs Simultaneously

On Shaheen, compute nodes are exclusive: even when a job does not use all the resources within a node, no other job can access them. By default, multiple concurrent srun executions cannot share compute nodes under SLURM in the regular partition, so make sure that the total number of cores required fits on the number of nodes requested. In the following example, a total of 9 nodes are required (2 + 3 + 4, one group of nodes per srun). Notice the "&" at the end of each srun command, which runs it in the background so that the next one can start immediately. The "wait" command at the end of the script is also very important: it makes sure that the batch job won't exit before all the simultaneous sruns have completed.

#!/bin/bash
#SBATCH -N 9
#SBATCH -t 0:15:00

srun --hint=nomultithread -N 2 --ntasks=64 --ntasks-per-node=32 --ntasks-per-socket=16 ./my_exe_1 &
srun --hint=nomultithread -N 3 --ntasks=96 --ntasks-per-node=32 --ntasks-per-socket=16 ./my_exe_2 &
srun --hint=nomultithread -N 4 --ntasks=128 --ntasks-per-node=32 --ntasks-per-socket=16 ./my_exe_3 &
# Keep the batch job alive until all background sruns have finished.
wait

To run the srun commands sequentially instead, remove the "&" from the end of each command.
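
The node count used in the example above can be double-checked with a little shell arithmetic before submitting. The following is only a sketch: it assumes every srun uses --ntasks-per-node=32, as in the example, and nodes_needed is a hypothetical helper function defined here, not a SLURM command.

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM): number of nodes needed to hold
# $1 tasks at $2 tasks per node, rounded up to a whole node.
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Assumption: all job steps use the same tasks-per-node value as the example.
TASKS_PER_NODE=32

total=0
for tasks in 64 96 128; do
    n=$(nodes_needed "$tasks" "$TASKS_PER_NODE")
    echo "ntasks=$tasks -> nodes=$n"
    total=$(( total + n ))
done

# Because concurrent sruns cannot share nodes, the per-step node counts add up.
echo "total nodes to request: $total"
```

The per-step node counts are summed rather than the task counts, because each concurrent srun occupies whole nodes; a step with, say, 65 tasks would still take 3 full nodes.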