Distributed Deep Learning on IBEX

The KAUST Supercomputing Core Lab invites you to join the Distributed Deep Learning Workshop on IBEX, a hands-on training designed to help users efficiently scale AI workloads across multiple GPUs and compute nodes using IBEX’s high-performance computing environment.

This workshop provides a practical introduction to the essential distributed training frameworks for accelerating model training on IBEX GPUs using data and model parallelism. We will focus on PyTorch Distributed Data Parallel (DDP), DeepSpeed, Fully Sharded Data Parallel (FSDP), and NVIDIA NeMo, and demonstrate how to scale from one to many GPUs on single and multiple nodes of IBEX.
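For orientation, below is a minimal single-node DDP sketch of the kind of code the hands-on sessions build on. It is illustrative only: the model, hyperparameters, and the torchrun launch line are placeholders, and the IBEX-specific launch recipe will be covered in the workshop.

    # Minimal PyTorch DistributedDataParallel (DDP) sketch: one process per GPU.
    # Illustrative launch (placeholder): torchrun --nproc_per_node=4 ddp_example.py
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun exports RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = torch.nn.Linear(1024, 1024).cuda(local_rank)   # placeholder model
        model = DDP(model, device_ids=[local_rank])
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

        for step in range(10):                                  # placeholder training loop
            x = torch.randn(32, 1024, device=local_rank)
            loss = model(x).sum()
            optimizer.zero_grad()
            loss.backward()        # gradients are all-reduced across ranks here
            optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Each rank trains on its own shard of the data, and DDP synchronizes gradients during the backward pass so that all model replicas stay consistent.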

Register here: Distributed Deep Learning Workshop on IBEX  

Who should attend

  • Researchers working with ML and DL models
  • Data scientists and computational scientists
  • AI engineers working with GPU-intensive workloads
  • Anyone interested in scaling model training on HPC systems

Learning outcomes
After attending, participants will be able to:

  • Work with distributed training frameworks (DDP, DeepSpeed, FSDP, NVIDIA NeMo)
  • Launch and manage multi-GPU and multi-node jobs using SLURM on IBEX (see the sketch after this list)
  • Understand, through hands-on exercises, the scaling limitations of models and frameworks on multiple GPUs: using more compute resources does not always mean faster model training
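As a hedged illustration of the SLURM-to-PyTorch plumbing behind the second outcome above, the sketch below initializes torch.distributed from standard SLURM environment variables, assuming one task is launched per GPU (e.g. via srun). The helper name and port are invented for the example; the exact IBEX job-script recipe will be shown in the hands-on sessions.

    # Sketch: initialize torch.distributed from SLURM environment variables,
    # assuming the job was started with one task per GPU (e.g. via srun).
    import os
    import torch
    import torch.distributed as dist

    def init_from_slurm(master_addr, master_port="29500"):  # hypothetical helper
        rank = int(os.environ["SLURM_PROCID"])        # global rank of this task
        world_size = int(os.environ["SLURM_NTASKS"])  # total number of tasks
        local_rank = int(os.environ["SLURM_LOCALID"]) # rank within this node

        # torch.distributed discovers its peers through these two variables.
        os.environ.setdefault("MASTER_ADDR", master_addr)
        os.environ.setdefault("MASTER_PORT", master_port)

        torch.cuda.set_device(local_rank)
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
        return rank, world_size, local_rank

Here MASTER_ADDR would typically be the hostname of the first node in the allocation, for example the first line printed by scontrol show hostnames "$SLURM_NODELIST".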

Important Note on Workshop Scope

This workshop focuses on scaling and distributing existing deep learning workloads rather than teaching fundamental Python or neural network concepts. Attendees are expected to have prior familiarity with Python-based ML frameworks (e.g., PyTorch) and basic model training. The sessions will emphasize practical use of distributed training frameworks and performance optimization at scale on IBEX, not introductory model development.

Agenda

Day 1:

9:00 – 10:00 — Distributed Deep Learning Overview

10:00 – 10:15 — Coffee Break

10:15 – 12:00 — Hands-On Session: PyTorch Distributed Data Parallel

12:00 – 1:00 — Lunch Break

1:00 – 1:45 — Hands-On Session: DeepSpeed

1:45 – 2:00 — Coffee Break

2:00 – 3:00 — Hands-On Session: DeepSpeed

Day 2:

9:00 – 10:00 — Hands-On Session: Fully Sharded Data Parallel

10:00 – 10:15 — Coffee Break

10:15 – 12:00 — Hands-On Session: Fully Sharded Data Parallel

12:00 – 1:00 — Lunch Break

1:00 – 2:15 — Hands-On Session: NVIDIA NeMo

 

Register here: Distributed Deep Learning Workshop on IBEX  

For any questions, please contact: training@hpc.kaust.edu.sa
This opportunity is brought to you by the KAUST Core Labs – Supercomputing Core Lab.

2026-02-09 09:00 - 15:00
2026-02-10 09:00 - 15:00
Data Science
Mohsin Shaikh
