Types of jobs

Going parallel

But, still, the real question is: How do you create a parallel job?

A parallel job, i.e. one whose tasks are run simultaneously, can be created in several ways:

  • by running a multi-process program (SPMD paradigm, e.g. with MPI)
  • by running a multithreaded program (shared memory paradigm, e.g. with OpenMP or pthreads)
  • by running several instances of a single-threaded program (so-called embarrassingly parallel paradigm)
  • by running one master program controlling several slave programs (master/slave paradigm)

In the Slurm context, a task is to be understood as a process. So a multi-process program is made of several tasks. By contrast, a multithreaded program is composed of only one task, which uses several CPUs.

Tasks are requested/created with the --ntasks option, while CPUs, for multithreaded programs, are requested with the --cpus-per-task option. Tasks cannot be split across several compute nodes, so requesting several CPUs with the --cpus-per-task option ensures all CPUs are allocated on the same compute node. By contrast, requesting the same number of CPUs with the --ntasks option may lead to CPUs being allocated on several, distinct compute nodes. The examples below illustrate each paradigm.

Message passing example (MPI)

#!/bin/bash
#
#SBATCH --job-name=test_mpi
#SBATCH --output=res_mpi.txt
#SBATCH --partition=batch
#SBATCH --ntasks=4
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

mpirun ./hello.mpi

This script requests four tasks (hence four cores) on the cluster for 10 minutes, using 100 MB of RAM per core. Assuming hello.mpi was compiled with MPI support (see below), mpirun will create four instances of it on the nodes allocated by Slurm.

You can try the above example by downloading the example hello world program from Wikipedia (name it for instance wiki_mpi_example.c), and compiling it with

module load openmpi/3.0.0/gcc-6.4.0

mpicc wiki_mpi_example.c -o hello.mpi
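
If the Wikipedia listing is unavailable, the following minimal sketch produces equivalent output. It is a stand-in with the same structure, not the exact Wikipedia program: every non-zero rank sends a greeting to rank 0, which prints them all.

/* wiki_mpi_example.c -- minimal MPI hello world (sketch) */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    char buf[256];
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        printf("We have %d processes.\n", size);
        /* Collect one greeting from every other rank, in rank order. */
        for (int i = 1; i < size; i++) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, i, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", buf);
        }
    } else {
        snprintf(buf, sizeof(buf), "Process %d reporting for duty.", rank);
        MPI_Send(buf, strlen(buf) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}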

The res_mpi.txt file should contain something like

We have 4 processes.
Process 1 reporting for duty.
Process 2 reporting for duty.
Process 3 reporting for duty.

Shared memory example (OpenMP)

#!/bin/bash
#
#SBATCH --job-name=test_omp
#SBATCH --output=res_omp.txt
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./hello.omp

The job will be run in an allocation where four cores have been reserved on the same compute node.

You can try it by using the hello world program from LLNL (name it for instance wiki_omp_example.c) and compiling it with the GNU compiler

module load gcc/6.4.0

gcc -fopenmp wiki_omp_example.c -o hello.omp
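
If the LLNL listing is unavailable, here is a minimal sketch producing equivalent output; it is a stand-in, not the exact LLNL program.

/* wiki_omp_example.c -- minimal OpenMP hello world (sketch) */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();

        /* Only thread 0 reports the size of the thread team. */
        if (tid == 0)
            printf("Number of threads = %d\n", omp_get_num_threads());

        printf("Hello World from thread = %d\n", tid);
    }
    return 0;
}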

The res_omp.txt file should contain something like

Number of threads = 4
Hello World from thread = 0
Hello World from thread = 1
Hello World from thread = 2
Hello World from thread = 3

Embarrassingly parallel workload example

#!/bin/bash
#
#SBATCH --job-name=test_emb
#SBATCH --output=res_emb.txt
#SBATCH --partition=batch
#SBATCH --ntasks=4
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

srun printenv SLURM_PROCID

In that configuration, srun runs the printenv command four times (once per task), each with its environment variable SLURM_PROCID set to a distinct value. The contents of res_emb.txt will look like the following (the task order is not guaranteed):

$ cat res_emb.txt
3
0
1
2

This setup is useful if the program is based on random draws (e.g. Monte-Carlo simulations): the application permitting, you can have four instances drawing 1000 samples each and combine their outputs (with another program) to get the equivalent of drawing 4000 samples.

Another typical use of this setting is a parameter sweep, where the same computation is carried out by each instance except that some high-level parameter takes a distinct value in each case, for example when optimising an integer-valued parameter by scanning a range of values. Each instance of the program simply has to look up the $SLURM_PROCID environment variable and decide, accordingly, which values of the parameter to test.

The same setup can be used to process several data files, for instance. Each instance of the program simply has to decide which file to read based on the value of its $SLURM_PROCID environment variable, as in the sketch below.
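
As an illustration of both uses, here is a hypothetical C sketch of the pattern; the seed offset (1000) and the file naming scheme (data_0.txt, data_1.txt, ...) are assumptions for the example, not part of the original setup.

/* procid_example.c -- hypothetical sketch: each task derives its
 * random seed and its input file from SLURM_PROCID */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *procid = getenv("SLURM_PROCID");
    if (procid == NULL) {
        fprintf(stderr, "SLURM_PROCID not set; not launched by srun?\n");
        return 1;
    }
    int id = atoi(procid);

    /* Distinct seed per task, so the random draws do not overlap. */
    srand(1000 + id);

    /* Distinct input file per task (data_0.txt, data_1.txt, ...). */
    char filename[64];
    snprintf(filename, sizeof(filename), "data_%d.txt", id);
    printf("Task %d: seed %d, input file %s\n", id, 1000 + id, filename);

    /* ... read 'filename' and run the actual computation here ... */
    return 0;
}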


Master/slave program example

#!/bin/bash
#
#SBATCH --job-name=test_ms
#SBATCH --output=res_ms.txt
#SBATCH --partition=batch
#SBATCH --ntasks=4
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

srun --multi-prog multi.conf

With file multi.conf being, for example, as follows

0      echo     'I am the Master'
1-3    printenv SLURM_PROCID

The above instructs Slurm to create four tasks (or processes): one running the echo command (the master), and the other three running printenv (the slaves). This is typically used in a producer/consumer setup where one program (the master) creates computing tasks for the other programs (the slaves) to perform.

Upon completion of the above job, the file res_ms.txt should contain something like

I am the Master
1
2
3

More submission script examples

Here are some quick sample submission scripts. For more detailed information, make sure to have a look at the Slurm FAQ and to follow our training sessions.


Running Matlab using Parallel Toolbox (parfor) in SLURM

In order to use Matlab with the Parallel Computing Toolbox (parfor), Slurm and Matlab need to be configured properly. Say you want to run a job with 16 workers in your parpool; then you need exactly one node, with one task and 16 CPUs per task, i.e. in your submission script specify:

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16

Then you need to make your parpool aware of how many workers are available when it is created, and use that number to set the bounds of your parfor loop (the array A below is just a placeholder computation; adapt it to your needs). In your Matlab code you would do something like:

% Size the worker pool from the Slurm allocation
cores = str2num(getenv('SLURM_CPUS_PER_TASK'));
cluster = parcluster('local');
cluster.NumWorkers = cores;
parpool(cluster, cluster.NumWorkers);

% Placeholder computation: fill A in parallel, one entry per iteration
clear A
parfor i = 1:cores
   A(i) = i;
end
A