GPU Nodes
The IBEX cluster contains GPUs of several architectures: Pascal, Turing, Volta, and Ampere. These GPUs are described below to guide source code compilation and job submission.
The IBEX cluster has 111 GPU compute nodes (618 GPU cards), summarized in Table 1. GPUs are requested from the SLURM scheduler with the option `--gres=gpu:GPUTYPE:x`, where x is the number of GPUs.
For example, `--gres=gpu:gtx1080ti:4` allocates 4 GTX 1080 Ti GPUs.
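As an illustration, a minimal batch script using this syntax might look like the sketch below; the job name, time limit, CPU count, and memory value are placeholders to adapt to your workload, and `nvidia-smi` stands in for your application.

```bash
#!/bin/bash
#SBATCH --job-name=gpu-example   # placeholder job name
#SBATCH --time=01:00:00          # example wall-clock limit
#SBATCH --gres=gpu:gtx1080ti:4   # request 4 GTX 1080 Ti GPUs
#SBATCH --cpus-per-task=8        # example CPU request
#SBATCH --mem=64G                # example host memory request (see Note 1 below)

# Replace with your application; nvidia-smi just lists the allocated GPUs
nvidia-smi
```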
Table 1. List of GPU architectures in the IBEX cluster
| Sl. No | GPU | GPU Architecture | CUDA Capability | Available GPU Cards Per Node | Available Number of Nodes | GPU Memory (Per Card) | Usable Node Memory* | CPU Type | CPU Core Count (Per Node) | Constraint for SLURM Scheduling |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NVIDIA GeForce RTX 2080 Ti | Turing | 11.8 | 8 | 3 | 12GB | 350GB | Intel Xeon Gold 6142 @ 2.60GHz | 32 | `--gres=gpu:rtx2080ti:1` |
| 2 | NVIDIA GeForce GTX 1080 Ti | Pascal | 11.8 | 4 or 8 | 12 | 12GB | 230GB | Intel Xeon E5-2699 v3 @ 2.30GHz (add `--constraint=cpu_intel_e5_2699_v3`) or Intel Xeon Gold 6142 @ 2.60GHz (add `--constraint=cpu_intel_gold_6142`) | 36 or 32 | `--gres=gpu:gtx1080ti:1` |
| 3 | NVIDIA Tesla P100 | Pascal | 11.8 | 4 | 5 | 16GB | 230GB | Intel Xeon E5-2699 v3 @ 2.30GHz | 36 | `--gres=gpu:p100:1` |
| 4 | NVIDIA Quadro P6000 | Pascal | 11.8 | 2 | 2 | 22GB | 230GB | Intel Xeon E5-2699 v3 @ 2.30GHz | 36 | `--gres=gpu:p6000:1` |
| 5 | NVIDIA Tesla V100 | Volta | 11.8 | 2, 4, or 8 | 37 (1×2, 6×4, 30×8) | 32GB | 340GB or 712GB | Intel Xeon Gold 6248 @ 2.50GHz (add `--constraint=cpu_intel_gold_6248`), Intel Xeon Gold 6142 @ 2.60GHz (add `--constraint=cpu_intel_gold_6142`), or Intel Xeon Platinum 8260 @ 2.40GHz (add `--constraint=cpu_intel_platinum_8260`) | 40, 32, or 48 | `--gres=gpu:v100:1` |
| 6 | NVIDIA A100 | Ampere | 11.8 | 4 or 8 | 52 (44×4, 8×8) | 80GB | 500GB or 1TB | AMD EPYC 7713P 64-Core Processor (add `--constraint=cpu_amd_epyc_7702`) or AMD EPYC 7713 64-Core Processor (add `--constraint=cpu_amd_epyc_7713`) | 64 or 128 | `--gres=gpu:a100:1` with `--reservation=A100` |
Note 1: CPU (host) memory is allocated with the `--mem=###G` option in SLURM job scripts. The amount of memory depends on the characteristics of the job; a good starting point is at least as much host memory as the total GPU memory the job will use. For example, a job using 2 x V100 GPUs should allocate at least `--mem=64G` for the CPUs.
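Following that rule of thumb, a job using two V100 GPUs (32GB each) could request host memory as in this sketch:

```bash
#SBATCH --gres=gpu:v100:2   # 2 x 32GB of GPU memory
#SBATCH --mem=64G           # at least as much host memory as total GPU memory
```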
Note 2: The glogin node has a single NVIDIA Quadro K6000 (CC=3.5) GPU for compiling source code.
* The usable node memory represents the available memory for job execution.
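When compiling CUDA code on glogin for later execution on the compute GPUs, one option is to pass the target architectures explicitly to nvcc. The sketch below assumes a CUDA toolchain is available (for example via a `module load cuda` step, whose exact name and version may differ on your system) and uses the vendor-documented compute capabilities of the GPU models in the table above:

```bash
# Assumed module name; check `module avail cuda` for what is actually installed
module load cuda

# Build a fat binary covering Pascal (sm_60/sm_61), Volta (sm_70),
# Turing (sm_75), and Ampere A100 (sm_80)
nvcc my_kernel.cu -o my_kernel \
  -gencode arch=compute_60,code=sm_60 \
  -gencode arch=compute_61,code=sm_61 \
  -gencode arch=compute_70,code=sm_70 \
  -gencode arch=compute_75,code=sm_75 \
  -gencode arch=compute_80,code=sm_80
```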
More on Slurm Constraints:
"ref_32T" and "gpu_ai" are used to differentiate the newer generation of the V100 GPU nodes from the old ones.
The new nodes have 32TB of NVMes as local storage. And some ML reference DBs have been copied to those NVMes to enhance jobs performance instead of using the shared BeeGFS scratch.
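To land on those newer nodes, the constraint can simply be added alongside the GPU request, for example:

```bash
#SBATCH --gres=gpu:v100:4      # request V100 GPUs
#SBATCH --constraint=gpu_ai    # restrict to the newer V100 nodes (ref_32T selects the same generation)
```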
Slurm Partitions:
Continuous efforts have been made to share resources on Ibex fairly; the following partitions have been made available to users:
gpu_wide: for jobs with 4 or more GPUs per node
gpu_wide24: for wide jobs with a time limit of less than 24 hours
gpu4: for short GPU jobs (less than 4 hours)
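A job is directed to one of these partitions with the `--partition` (or `-p`) option; for example, a short GPU job could be submitted as in the sketch below (values are illustrative):

```bash
#SBATCH --partition=gpu4     # partition for short GPU jobs (less than 4 hours)
#SBATCH --gres=gpu:v100:1
#SBATCH --time=03:59:00      # must stay under the 4-hour limit
```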
For further assistance, send an email to ibex@hpc.kaust.edu.sa.