Setting up job constraints

Ibex makes heavy use of node features and the constraint flags to direct jobs onto the appropriate resources. Combined with GRES, this is a powerful and flexible way to provide a set of defaults which do the Right Thing(tm) for people who just want to run basic tasks and don't care about architecture, extra memory, accelerators, etc. Below are some examples of how to request different resource configurations. Nodes are weighted so that the least valuable/rare node which can satisfy a request is used first; be specific if you want a particular shape of resource.
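
For batch jobs the same flags go in the #SBATCH header. A minimal sketch (the time, memory, and feature values below are illustrative only, not recommendations):

#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=16G
#SBATCH --constraint=intel     # node feature (see the list below)
#SBATCH --gres=gpu:1           # one GPU of any type (GRES)

# Show where the job landed
hostname
grep vendor /proc/cpuinfo | head -1
nvidia-smi -L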

What features are available?

To see a list of all node features, run:

ismailiy@cn509-02-l:~$ sinfo --partition=batch --format="%n %f"
HOSTNAMES AVAIL_FEATURES
cn603-06-r dragon,cpu_intel_gold_6148,skylake,intel,ibex2018,nogpu,nolmem,local_200G,local_400G,local_500G
cn612-38-l mpi_intel,ibex2017,nolmem,nogpu,blade,ivybridge,cpu_intel_e5_2680_v2,ibid304,intel,local_200G,zram
cn509-12-l dragon,cascadelake,cpu_intel_gold_6248,intel,ibex2019,nogpu,nolmem,local_200G,local_400G,local_500G,local_950G
cn603-15-r dragon,cpu_intel_gold_6148,skylake,intel,ibex2018,nogpu,nolmem,local_200G,local_400G,local_500G
cn603-17-l dragon,cpu_intel_gold_6148,skylake,intel,ibex2018,nogpu,nolmem,local_200G,local_400G,local_500G
cn605-26-r dragon,cpu_intel_gold_6148,skylake,intel,ibex2018,nogpu,nolmem,local_200G,local_400G,local_500G
cn612-10-l mpi_intel,ibex2017,nolmem,nogpu,blade,ivybridge,cpu_intel_e5_2680_v2,ibid304,intel,local_200G,zram
cn612-15-l mpi_intel,ibex2017,nolmem,nogpu,blade,ivybridge,cpu_intel_e5_2670_v2,ibid304,intel,local_200G,local_400G,local_500G,zram
cn612-19-r mpi_intel,ibex2017,nolmem,nogpu,blade,ivybridge,cpu_intel_e5_2670_v2,ibid304,intel,local_200G,local_400G,local_500G,zram
cn612-25-r mpi_intel,ibex2017,nolmem,nogpu,blade,ivybridge,cpu_intel_e5_2680_v2,ibid304,intel,local_200G,zram
cn612-31-l mpi_intel,ibex2017,nolmem,nogpu,blade,ivybridge,cpu_intel_e5_2680_v2,ibid304,intel,local_200G,zram
gpu510-07 dragon,ibex2018,nolmem,cpu_intel_gold_6142,intel,ssh,gpu,intel_gpu,local_200G,local_400G,local_500G,gpu_rtx2080ti,rtx2080ti
besest112-02 ibex2017,nogpu,cpu_intel_e7_2860,intel,largemem,local_10T,local_1T,local_200G,local_2T,local_3T,local_400G,local_500G,local_7T,pathogen,zram,ssh
lm602-22 dragon,cpu_intel_xeon_gold_6246,cascadelake,intel,ibex2019,nogpu,largemem,local_200G,local_400G,local_500G,local_1T,local_2T,local_5T,nossh
cn509-11-r dragon,cascadelake,cpu_intel_gold_6248,intel,ibex2019,nogpu,nolmem,local_200G,local_400G,local_500G,local_950G
cn509-29-l dragon,cascadelake,cpu_intel_gold_6248,intel,ibex2019,nogpu,nolmem,local_200G,local_400G,local_500G,local_950G
cn509-03-l dragon,cascadelake,cpu_intel_gold_6248,intel,ibex2019,nogpu,nolmem,local_200G,local_400G,local_500G,local_950G
...

Typically each node is tagged with:

  • Architecture type
  • Type of disk used for local scratch
  • An optional (former) owner
  • Presence of GPUs
  • The chassis to which it belongs
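
These feature tags can be combined in a single constraint expression: & requires all of the listed features and | accepts any of them. A hedged sketch (the particular tags are just examples taken from the listing below):

# A Skylake node that also has 400G of local scratch
srun --pty --time=1:00 --constraint="skylake&local_400G" bash -l

# Either a Skylake or a Cascade Lake node
srun --pty --time=1:00 --constraint="skylake|cascadelake" bash -l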

For example, to see a summary of all available GPU GRES/features, you can use the fact that every GPU node carries a gpu feature tag while non-GPU nodes are tagged nogpu:

ismailiy@cn509-02-l:~$ sinfo -N --partition=batch --format="%G %f"  | grep -v "nogpu" | grep  "gpu" | sort | uniq -c
      1 gpu:gtx1080ti:4(S:0-1) cpu_intel_e5_2699_v3,ibex2017,nolmem,mpi_intel,intel_gpu,ssh,gpu,gpu_gtx1080ti,gtx1080ti
      4 gpu:gtx1080ti:4(S:0-1) ibex2017,nolmem,cpu_intel_e5_2699_v3,ssh,gpu,mpi_intel,intel_gpu,gpu_gtx1080ti,gtx1080ti,local_200G
      1 gpu:gtx1080ti:4(S:0-1) ibex2017,nolmem,cpu_intel_e5_2699_v3,ssh,gpu,mpi_intel,intel_gpu,gpu_gtx1080ti,gtx1080ti,local_200G,local_400G,local_500G
      1 gpu:gtx1080ti:4(S:0-1) ibex2017,nolmem,cpu_intel_e5_2699_v3,ssh,gpu,mpi_intel,intel_gpu,local_200G,gpu_gtx1080ti,gtx1080ti
      1 gpu:gtx1080ti:4(S:0-1) ibex2017,nolmem,cpu_intel_e5_2699_v3,ssh,gpu,mpi_intel,intel_gpu,local_200G,local_400G,local_500G,gpu_gtx1080ti,gtx1080ti
      4 gpu:gtx1080ti:8(S:0-1) dragon,ibex2018,nolmem,cpu_intel_gold_6142,intel,ssh,gpu,intel_gpu,local_200G,local_400G,local_500G,gpu_gtx1080ti,gtx1080ti
      5 gpu:p100:4(S:0-1) ibex2017,nolmem,cpu_intel_e5_2699_v3,ssh,gpu,mpi_intel,intel_gpu,local_200G,gpu_p100,p100
      1 gpu:p6000:2(S:0-1) ibex2017,nolmem,cpu_intel_e5_2699_v3,ssh,gpu,mpi_intel,intel_gpu,local_200G,local_400G,local_500G,p6000
      4 gpu:rtx2080ti:8(S:0-1) dragon,ibex2018,nolmem,cpu_intel_gold_6142,intel,ssh,gpu,intel_gpu,local_200G,local_400G,local_500G,gpu_rtx2080ti,rtx2080ti
      8 gpu:v100:4(S:0-1) dragon,ibex2018,nolmem,cpu_intel_gold_6142,intel,ssh,gpu,intel_gpu,local_200G,local_400G,local_500G,gpu_v100,v100
     15 gpu:v100:8(S:0-1) dragon,ibex2018,nolmem,cpu_intel_platinum_8260,intel,gpu,intel_gpu,local_200G,local_400G,local_500G,gpu_v100,v100,nossh,ref_32T,gpu_ai

 

In this example there are currently 5 nodes with 4 x P100 GPUs for a total of 20 P100s available.

Specific CPU architecture

The Intel nodes perform much better for floating-point operations, while the AMD nodes are more efficient at integer operations. A common approach to optimizing your workload is to send integer or floating-point work to the matching architecture. Each node has a feature, either intel or amd, for its architecture. To select one:

# Intel
[hanksj@dm511-17:~]$ srun --pty --time=1:00 --constraint=intel bash -l
[hanksj@dbn711-08-l:~]$ grep vendor /proc/cpuinfo | head -1
vendor_id	: GenuineIntel
[hanksj@dbn711-08-l:~]$ 

# AMD
[hanksj@dm511-17:~]$ srun --pty --time=1:00 --constraint=amd bash -l
[hanksj@db809-12-5:~]$ grep vendor /proc/cpuinfo | head -1
vendor_id	: AuthenticAMD
[hanksj@db809-12-5:~]$ 
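
The same constraint works in a batch script. A minimal sketch (time and memory values are illustrative):

#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --constraint=amd      # or intel

grep vendor /proc/cpuinfo | head -1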

Large memory

Large memory requests fall into two basic categories:

  1. A request for a large amount of memory per core.
  2. A request for all memory on a large memory node.

In turn:

Large amount of memory per core

This may or may not actually need a large memory node. While it may be more efficient to pack such jobs onto a large memory host, the overall best use of nodes depends on how many pending jobs actually require all of a large memory node. Typically it's sufficient to just request a lot of memory and let the scheduler work it out:

ismailiy@dbn503-33-r:~$ srun --pty --time=1:00 --nodes=1 --ntasks-per-node=1 --mem=400g bash -l

ismailiy@cn605-12-r:~$ free
              total        used        free      shared  buff/cache   available
Mem:      394874484     5049268    72010812     5659692   317814404   381736012
Swap:             0           0           0
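
If the requirement is really memory per core rather than per node, --mem-per-cpu expresses that directly. A hedged sketch (the values and the executable name my_program are illustrative):

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=100G    # 4 tasks x 100G = 400G total, placed wherever it fits

srun ./my_program             # hypothetical executable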

All memory on a largemem node

Sometimes you need it all. This will get that for you.

ismailiy@dbn503-33-r:~$ srun --pty --time=1:00 --nodes=1 --ntasks-per-node=1 --exclusive --mem=400g bash -l
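
If you specifically want one of the large memory nodes, you can add the largemem feature shown in the listing above; combined with --exclusive, asking for --mem=0 is Slurm's way of requesting all of the memory on the allocated node. A hedged sketch:

# All memory on a large memory node
srun --pty --time=1:00 --nodes=1 --exclusive --constraint=largemem --mem=0 bash -l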

GPUs

There are three basic ways to ask for GPUs.

  1. You want a specific count of a specific model of GPU
  2. You want a specific count of any type of GPU
  3. You want a specific count of some subset of available types (e.g. any with > 8 GB of memory)

Specific model of GPU

# Request 2 P100 GPUs
ismailiy@cn509-02-l:~$ srun --pty --time=1:00 --gres=gpu:p100:2 bash -l
srun: job 12476879 queued and waiting for resources
srun: job 12476879 has been allocated resources
ismailiy@dgpu501-26:~$ nvidia-smi
Tue Nov  3 15:05:10 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 450.36.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:02:00.0 Off |                    0 |
| N/A   30C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:03:00.0 Off |                    0 |
| N/A   31C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
...
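
The same request in a batch script looks like this. A sketch (GPU count, CPU, and memory values are illustrative):

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:p100:2     # two P100 GPUs
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G

nvidia-smi -L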

Any GPU

# Request 1 GPU of any kind
ismailiy@cn509-02-l:~$ srun --pty --time=1:00 --gres=gpu:1 bash -l
srun: job 12476880 queued and waiting for resources
srun: job 12476880 has been allocated resources
ismailiy@gpu502-11:~$ nvidia-smi
Tue Nov  3 15:08:29 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 450.36.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:B2:00.0 Off |                  N/A |
| 26%   34C    P8     9W / 250W |      1MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
...

A subset of available GPU types

Every GPU node has both a GRES for the installed GPUs and a matching feature. To select from a subset, specify the GRES as a plain GPU count and use a logical OR (|) in a constraint. The square brackets tell Slurm that only one of the listed features may be used across all nodes allocated to the job:

# Request 1 GPU of type p100 OR v100 OR p6000

ismailiy@cn509-02-l:~$ srun --pty --gres=gpu:1 --constraint="[p100|v100|p6000]" --time=1:00:00 bash -l

ismailiy@gpu502-11:~$ nvidia-smi
Tue Nov  3 15:08:29 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 450.36.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:B2:00.0 Off |                  N/A |
| 26%   34C    P8     9W / 250W |      1MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
...
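
The batch script form of the same request, as a sketch (time value is illustrative):

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1
#SBATCH --constraint="[p100|v100|p6000]"   # any one of these GPU types

nvidia-smi -L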

Please also check out the job generator here.