Setting up job constraints
Ibex makes heavy use of node features and the constraint flags to direct jobs onto the appropriate resources. Combined with GRES, this is a powerful and flexible way to provide a set of defaults which do the Right Thing(tm) for people who just want to run basic tasks and don't care about architecture, extra memory, accelerators, etc. Below are some examples of how to request different resource configurations. The nodes are weighted so that the least valuable/rare node which can satisfy the request will be used; be specific if you want a particular shape of resource.
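If you are curious about that weighting, sinfo can print each node's scheduling weight; a sketch (lower weight means the scheduler prefers that node first):

# List each node's scheduling weight, most-preferred nodes first
sinfo --partition=batch --format="%n %w" --noheader | sort -k2 -n | head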
What features are available?
To see a list of all node features, run:
ismailiy@cn509-02-l:~$ sinfo --partition=batch --format="%n %f"
HOSTNAMES AVAIL_FEATURES
cn603-06-r dragon,cpu_intel_gold_6148,skylake,intel,ibex2018,nogpu,nolmem,local_200G,local_400G,local_500G
cn612-38-l mpi_intel,ibex2017,nolmem,nogpu,blade,ivybridge,cpu_intel_e5_2680_v2,ibid304,intel,local_200G,zram
cn509-12-l dragon,cascadelake,cpu_intel_gold_6248,intel,ibex2019,nogpu,nolmem,local_200G,local_400G,local_500G,local_950G
cn603-15-r dragon,cpu_intel_gold_6148,skylake,intel,ibex2018,nogpu,nolmem,local_200G,local_400G,local_500G
cn603-17-l dragon,cpu_intel_gold_6148,skylake,intel,ibex2018,nogpu,nolmem,local_200G,local_400G,local_500G
cn605-26-r dragon,cpu_intel_gold_6148,skylake,intel,ibex2018,nogpu,nolmem,local_200G,local_400G,local_500G
cn612-10-l mpi_intel,ibex2017,nolmem,nogpu,blade,ivybridge,cpu_intel_e5_2680_v2,ibid304,intel,local_200G,zram
cn612-15-l mpi_intel,ibex2017,nolmem,nogpu,blade,ivybridge,cpu_intel_e5_2670_v2,ibid304,intel,local_200G,local_400G,local_500G,zram
cn612-19-r mpi_intel,ibex2017,nolmem,nogpu,blade,ivybridge,cpu_intel_e5_2670_v2,ibid304,intel,local_200G,local_400G,local_500G,zram
cn612-25-r mpi_intel,ibex2017,nolmem,nogpu,blade,ivybridge,cpu_intel_e5_2680_v2,ibid304,intel,local_200G,zram
cn612-31-l mpi_intel,ibex2017,nolmem,nogpu,blade,ivybridge,cpu_intel_e5_2680_v2,ibid304,intel,local_200G,zram
gpu510-07 dragon,ibex2018,nolmem,cpu_intel_gold_6142,intel,ssh,gpu,intel_gpu,local_200G,local_400G,local_500G,gpu_rtx2080ti,rtx2080ti
besest112-02 ibex2017,nogpu,cpu_intel_e7_2860,intel,largemem,local_10T,local_1T,local_200G,local_2T,local_3T,local_400G,local_500G,local_7T,pathogen,zram,ssh
lm602-22 dragon,cpu_intel_xeon_gold_6246,cascadelake,intel,ibex2019,nogpu,largemem,local_200G,local_400G,local_500G,local_1T,local_2T,local_5T,nossh
cn509-11-r dragon,cascadelake,cpu_intel_gold_6248,intel,ibex2019,nogpu,nolmem,local_200G,local_400G,local_500G,local_950G
cn509-29-l dragon,cascadelake,cpu_intel_gold_6248,intel,ibex2019,nogpu,nolmem,local_200G,local_400G,local_500G,local_950G
cn509-03-l dragon,cascadelake,cpu_intel_gold_6248,intel,ibex2019,nogpu,nolmem,local_200G,local_400G,local_500G,local_950G
...
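If the per-node listing is too noisy, one way to get a deduplicated count of the individual feature tags is to split the comma-separated lists; a sketch:

# Count how many nodes carry each individual feature tag
sinfo --partition=batch --format="%f" --noheader | tr ',' '\n' | sort | uniq -c | sort -rn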
Typically, each node is tagged with the following (see the example after this list for inspecting a single node):
- Architecture type
- Type of disk used for local scratch
- An optional (former) owner
- Presence of GPUs
- The chassis to which it belongs.
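To inspect the tags and GRES on one particular node, scontrol works as well; for example, using one of the hostnames from the listing above:

# Show the feature tags and GRES of a specific node
scontrol show node cn603-06-r | grep -E "ActiveFeatures|Gres"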
For example, to see a summary of all available GPU GRES/features, you can use the fact that all GPU nodes carry a gpu feature tag (and have gpu in the hostname), while non-GPU nodes are tagged nogpu:
ismailiy@cn509-02-l:~$ sinfo -N --partition=batch --format="%G %f" | grep -v "nogpu" | grep "gpu" | sort | uniq -c
  1 gpu:gtx1080ti:4(S:0-1) cpu_intel_e5_2699_v3,ibex2017,nolmem,mpi_intel,intel_gpu,ssh,gpu,gpu_gtx1080ti,gtx1080ti
  4 gpu:gtx1080ti:4(S:0-1) ibex2017,nolmem,cpu_intel_e5_2699_v3,ssh,gpu,mpi_intel,intel_gpu,gpu_gtx1080ti,gtx1080ti,local_200G
  1 gpu:gtx1080ti:4(S:0-1) ibex2017,nolmem,cpu_intel_e5_2699_v3,ssh,gpu,mpi_intel,intel_gpu,gpu_gtx1080ti,gtx1080ti,local_200G,local_400G,local_500G
  1 gpu:gtx1080ti:4(S:0-1) ibex2017,nolmem,cpu_intel_e5_2699_v3,ssh,gpu,mpi_intel,intel_gpu,local_200G,gpu_gtx1080ti,gtx1080ti
  1 gpu:gtx1080ti:4(S:0-1) ibex2017,nolmem,cpu_intel_e5_2699_v3,ssh,gpu,mpi_intel,intel_gpu,local_200G,local_400G,local_500G,gpu_gtx1080ti,gtx1080ti
  4 gpu:gtx1080ti:8(S:0-1) dragon,ibex2018,nolmem,cpu_intel_gold_6142,intel,ssh,gpu,intel_gpu,local_200G,local_400G,local_500G,gpu_gtx1080ti,gtx1080ti
  5 gpu:p100:4(S:0-1) ibex2017,nolmem,cpu_intel_e5_2699_v3,ssh,gpu,mpi_intel,intel_gpu,local_200G,gpu_p100,p100
  1 gpu:p6000:2(S:0-1) ibex2017,nolmem,cpu_intel_e5_2699_v3,ssh,gpu,mpi_intel,intel_gpu,local_200G,local_400G,local_500G,p6000
  4 gpu:rtx2080ti:8(S:0-1) dragon,ibex2018,nolmem,cpu_intel_gold_6142,intel,ssh,gpu,intel_gpu,local_200G,local_400G,local_500G,gpu_rtx2080ti,rtx2080ti
  8 gpu:v100:4(S:0-1) dragon,ibex2018,nolmem,cpu_intel_gold_6142,intel,ssh,gpu,intel_gpu,local_200G,local_400G,local_500G,gpu_v100,v100
 15 gpu:v100:8(S:0-1) dragon,ibex2018,nolmem,cpu_intel_platinum_8260,intel,gpu,intel_gpu,local_200G,local_400G,local_500G,gpu_v100,v100,nossh,ref_32T,gpu_ai
In this example there are currently 5 nodes with 4 x P100 GPUs each, for a total of 20 P100s available.
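If you would rather have the partition-wide total per GPU model than multiply counts by hand, a small pipeline along these lines works (a sketch; it assumes the GRES strings follow the gpu:model:count pattern shown above):

# Sum (number of nodes) x (GPUs per node) for each GPU model
sinfo -N --partition=batch --format="%G" --noheader \
  | grep -oE 'gpu:[a-z0-9]+:[0-9]+' \
  | sort | uniq -c \
  | awk '{split($2, a, ":"); total[a[2]] += $1 * a[3]} END {for (m in total) print m, total[m]}'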
Specific CPU architecture
The Intel nodes perform much better for floating-point operations, while the AMD nodes are more efficient at integer operations. A common approach to optimizing your workload is therefore to send integer or floating-point work to the appropriate architecture. Each node has a feature, either intel or amd, for its architecture. To select one:
# Intel
[hanksj@dm511-17:~]$ srun --pty --time=1:00 --constraint=intel bash -l
[hanksj@dbn711-08-l:~]$ grep vendor /proc/cpuinfo | head -1
vendor_id       : GenuineIntel
[hanksj@dbn711-08-l:~]$

# AMD
[hanksj@dm511-17:~]$ srun --pty --time=1:00 --constraint=amd bash -l
[hanksj@db809-12-5:~]$ grep vendor /proc/cpuinfo | head -1
vendor_id       : AuthenticAMD
[hanksj@db809-12-5:~]$
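Constraints can also be combined with a logical AND (&) if you need something more specific than the vendor, for example a particular Intel generation; a sketch using cascadelake, one of the feature tags from the listing above:

# Intel Cascade Lake nodes only
srun --pty --time=1:00 --constraint="intel&cascadelake" bash -l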
Large memory
Large memory requests fall into two basic categories:
- A request for a large amount of memory per core.
- A request for all memory on a large memory node.
In turn:
Large amount of memory per core
This may or may not actually need a large memory node. While it may be more efficient to pack these jobs onto a large memory host, the overall best use of nodes depends on how many pending jobs actually require all of a large memory node. Typically it's sufficient to just request a lot of memory and let the scheduler work it out:
ismailiy@dbn503-33-r:~$ srun --pty --time=1:00 --nodes=1 --ntasks-per-node=1 --mem=400g bash -l
ismailiy@cn605-12-r:~$ free
              total        used        free      shared  buff/cache   available
Mem:      394874484     5049268    72010812     5659692   317814404   381736012
Swap:             0           0           0
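If what you actually need is a guaranteed amount per core rather than per node, --mem-per-cpu expresses that directly; a sketch (the 32g figure is just an example):

# 4 tasks, with 32 GB guaranteed per allocated CPU
srun --pty --time=1:00 --ntasks=4 --mem-per-cpu=32g bash -l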
All memory on a largemem node
Sometimes you need it all. Requesting the node exclusively will get that for you:
ismailiy@dbn503-33-r:~$ srun --pty --time=1:00 --nodes=1 --ntasks-per-node=1 --exclusive --mem=400g bash -l
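The same request in batch form might look like the sketch below; largemem is a feature tag carried by the large-memory nodes in the listing above, and the job name and executable are placeholders:

#!/bin/bash
#SBATCH --job-name=bigmem          # placeholder name
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --exclusive                # take the whole node
#SBATCH --constraint=largemem      # land on a large-memory node
#SBATCH --mem=400g
srun ./my_program                  # hypothetical executable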
GPUs
There are three basic ways to ask for GPUs:
- You want a specific count of a specific model of GPU
- You want a specific count of any type of GPU
- You want a specific count of some subset of available types (e.g. any with > 8 GB of memory)
Specific model of GPU
# Request 2 P100 GPUs
ismailiy@cn509-02-l:~$ srun --pty --time=1:00 --gres=gpu:p100:2 bash -l
srun: job 12476879 queued and waiting for resources
srun: job 12476879 has been allocated resources
ismailiy@dgpu501-26:~$ nvidia-smi
Tue Nov  3 15:05:10 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 450.36.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:02:00.0 Off |                    0 |
| N/A   30C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:03:00.0 Off |                    0 |
| N/A   31C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
...
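Besides nvidia-smi, Slurm normally exports the granted devices into the job environment, which makes for a quick sanity check (illustrative output; the exact indices depend on which GPUs you were given and on the node's configuration):

ismailiy@dgpu501-26:~$ echo $CUDA_VISIBLE_DEVICES
0,1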
Any GPU
# Request 1 GPU of any kind
ismailiy@cn509-02-l:~$ srun --pty --time=1:00 --gres=gpu:1 bash -l
srun: job 12476880 queued and waiting for resources
srun: job 12476880 has been allocated resources
ismailiy@gpu502-11:~$ nvidia-smi
Tue Nov  3 15:08:29 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 450.36.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:B2:00.0 Off |                  N/A |
| 26%   34C    P8     9W / 250W |      1MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
...
A subset of available GPU types
Every GPU node has both a GRES entry for the installed GPUs and a matching feature tag. To select from a subset, simply specify the GRES as a GPU count and use a logical OR (|) inside a constraint:
# Request 1 GPU of type p100 OR v100 OR p6000
ismailiy@cn509-02-l:~$ srun --pty --gres=gpu:1 --constraint="[p100|v100|p6000]" --time=1:00:00 bash -l
ismailiy@gpu502-11:~$ nvidia-smi
Tue Nov  3 15:08:29 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 450.36.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:B2:00.0 Off |                  N/A |
| 26%   34C    P8     9W / 250W |      1MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
...
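The same OR-style selection works from a batch script; a minimal sketch (job name and executable are placeholders):

#!/bin/bash
#SBATCH --job-name=any-big-gpu           # placeholder name
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1
#SBATCH --constraint="[p100|v100|p6000]"
srun ./my_gpu_program                    # hypothetical executable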
Please also check out the job generator here.