Distributed Hyperparameter Optimization Workshop on IBEX

KAUST Vis Lab Showcase, Building 1, Level 2

The KAUST Supercomputing Core Lab invites you to join the Distributed Hyperparameter Optimization Workshop on IBEX, a hands-on training designed to help users efficiently scale machine learning experiments using Ray Tune and PyTorch on IBEX’s high-performance computing environment.

This workshop focuses on running large-scale hyperparameter searches across multiple GPUs and compute nodes using Ray Tune integrated with SLURM, enabling systematic exploration of training configurations for deep learning models.

Register here: HPO on KSL platforms

Workshop Overview

Hyperparameter optimization (HPO), also known as hyperparameter tuning, is a compute-intensive and iterative step in building reliable machine learning and deep learning models. It typically requires retraining models many times with different parameter configurations, often leading to large numbers of independent SLURM jobs that must be manually orchestrated and monitored over extended periods. Poorly performing trials should ideally be stopped early, while promising configurations should be explored more deeply—making manual HPO workflows inefficient and error-prone.

In this workshop, we introduce Ray Tune, a scalable Python framework that consolidates and automates HPO experiments with built-in support for early stopping and intelligent exploration of the parameter space. Participants will learn how to run distributed HPO workloads on IBEX CPUs and GPUs, replacing ad-hoc job submission with a unified, automated experimentation pipeline.
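
As a taste of what the hands-on material looks like, below is a minimal Ray Tune sketch: a toy objective stands in for a real PyTorch training loop, and the ASHA scheduler stops weak trials early. It is illustrative only, not the workshop code, and the reporting call shown (train.report) varies across Ray versions (older releases use tune.report).

    from ray import train, tune
    from ray.tune.schedulers import ASHAScheduler

    def trainable(config):
        # Toy objective standing in for a real PyTorch training loop.
        base = (config["lr"] - 0.01) ** 2 + 1.0 / config["batch_size"]
        for epoch in range(10):
            loss = base / (epoch + 1)      # pretend the loss improves each epoch
            train.report({"loss": loss})   # lets the scheduler compare and stop trials

    tuner = tune.Tuner(
        trainable,
        param_space={
            "lr": tune.loguniform(1e-4, 1e-1),
            "batch_size": tune.choice([32, 64, 128]),
        },
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
            num_samples=20,             # 20 sampled configurations
            scheduler=ASHAScheduler(),  # early stopping of poorly performing trials
        ),
    )
    results = tuner.fit()
    print(results.get_best_result().config)

A single call to tuner.fit() replaces what would otherwise be dozens of manually submitted and monitored SLURM jobs.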

Hands-on exercises will demonstrate how to:

  • Launch Ray Tune on IBEX using SLURM
  • Run large-scale HPO experiments for ML and DL workloads
  • Automatically stop bad trials early and steer searches toward promising regions
  • Use different search and scheduling algorithms (ASHA, PBT, Bayesian/Optuna); see the sketch after this list
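
To illustrate the last point above, the following sketch pairs a Bayesian search algorithm (OptunaSearch) with the ASHA early-stopping scheduler. It reuses the trainable function from the previous sketch, assumes the optional optuna package is installed, and is only an indication of how Ray Tune composes search algorithms with schedulers, not workshop material.

    from ray import tune
    from ray.tune.schedulers import ASHAScheduler
    from ray.tune.search.optuna import OptunaSearch

    tuner = tune.Tuner(
        trainable,  # the function from the earlier sketch
        param_space={
            "lr": tune.loguniform(1e-4, 1e-1),
            "batch_size": tune.choice([32, 64, 128]),
        },
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
            num_samples=50,
            search_alg=OptunaSearch(),  # Bayesian suggestions for new configurations
            scheduler=ASHAScheduler(
                grace_period=1,         # every trial gets at least one report
                reduction_factor=3,     # promote roughly a third of trials at each rung
            ),
        ),
    )
    results = tuner.fit()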

Who Should Attend

  • Researchers training ML / DL models on GPUs
  • Data scientists running large experimental sweeps
  • AI engineers interested in scalable hyperparameter optimization
  • Users seeking practical experience running Ray workloads on HPC systems

Learning Outcomes

After attending, participants will be able to:

  • Design and run distributed HPO experiments with Ray Tune on IBEX
  • Replace manual multi-job HPO workflows with automated Ray-based pipelines
  • Apply early-stopping strategies to curtail poorly performing trials
  • Use multiple schedulers and search algorithms to efficiently explore parameter spaces
  • Launch and manage multi-GPU and multi-node HPO jobs using SLURM (see the sketch after this list)
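
For the last outcome above, here is a sketch of how a Tune driver might attach to a Ray cluster that a SLURM batch script has already started inside an allocation, giving each trial one GPU. The resource figures are illustrative assumptions rather than IBEX-specific recommendations; the workshop covers the actual IBEX launch procedure.

    import ray
    from ray import tune

    # Connect to the head node started by the SLURM batch script;
    # "auto" picks up the running cluster inside the allocation.
    ray.init(address="auto")

    # Request 4 CPU cores and 1 GPU per trial (illustrative numbers);
    # Tune runs as many trials in parallel as the cluster resources allow.
    gpu_trainable = tune.with_resources(trainable, {"cpu": 4, "gpu": 1})

    tuner = tune.Tuner(
        gpu_trainable,  # trainable defined as in the earlier sketches
        param_space={"lr": tune.loguniform(1e-4, 1e-1)},
        tune_config=tune.TuneConfig(metric="loss", mode="min", num_samples=32),
    )
    tuner.fit()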

Important Note on Workshop Scope

This workshop focuses on scaling and distributing existing deep learning workloads rather than teaching fundamental Python or neural network concepts. Attendees are expected to have prior familiarity with Python-based ML frameworks (e.g., PyTorch) and basic model training. The sessions will emphasize practical usage of distributed training frameworks and optimizing performance at scale on IBEX—not introductory model development.

Agenda

9:00 – 10:00 — Hyperparameter Optimization Overview

10:00 – 10:15 — Coffee break

10:15 – 12:00 — Hands-On Session: HPO with Ray Tune; Understanding Search Algorithms and Schedulers

 

Date & Time: 2026-02-11, 09:00 – 12:00
Category: Data Science
