Distributed Hyperparameter Optimization Workshop on IBEX

KAUST Vis Lab Showcase, Building 1, Level 2

The KAUST Supercomputing Core Lab invites you to join the Distributed Hyperparameter Optimization Workshop on IBEX, a hands-on training designed to help users efficiently scale machine learning experiments using Ray Tune and PyTorch on IBEX’s high-performance computing environment.

This workshop focuses on running large-scale hyperparameter searches across multiple GPUs and compute nodes using Ray Tune integrated with SLURM, enabling systematic exploration of training configurations for deep learning models.

Register here: HPO on KSL platforms

Workshop Overview

Hyperparameter optimization (HPO), also known as hyperparameter tuning, is a compute-intensive and iterative step in building reliable machine learning and deep learning models. It typically requires retraining models many times with different parameter configurations, often leading to large numbers of independent SLURM jobs that must be manually orchestrated and monitored over extended periods. Poorly performing trials should ideally be stopped early, while promising configurations should be explored more deeply—making manual HPO workflows inefficient and error-prone.

In this workshop, we introduce Ray Tune, a scalable Python framework that consolidates and automates HPO experiments with built-in support for early stopping and intelligent exploration of the parameter space. Participants will learn how to run distributed HPO workloads on IBEX CPUs and GPUs, replacing ad-hoc job submission with a unified, automated experimentation pipeline.
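
As a taste of what the hands-on material looks like, below is a minimal Ray Tune sketch: a toy objective stands in for a real PyTorch training loop, and the ASHA scheduler stops weak trials early. It is illustrative only, not the workshop code, and the reporting call shown (train.report) varies across Ray versions (older releases use tune.report).

    from ray import train, tune
    from ray.tune.schedulers import ASHAScheduler

    def trainable(config):
        # Toy objective standing in for a real PyTorch training loop.
        base = (config["lr"] - 0.01) ** 2 + 1.0 / config["batch_size"]
        for epoch in range(10):
            loss = base / (epoch + 1)      # pretend the loss improves each epoch
            train.report({"loss": loss})   # lets the scheduler compare and stop trials

    tuner = tune.Tuner(
        trainable,
        param_space={
            "lr": tune.loguniform(1e-4, 1e-1),
            "batch_size": tune.choice([32, 64, 128]),
        },
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
            num_samples=20,             # 20 sampled configurations
            scheduler=ASHAScheduler(),  # early stopping of poorly performing trials
        ),
    )
    results = tuner.fit()
    print(results.get_best_result().config)

A single call to tuner.fit() replaces what would otherwise be dozens of manually submitted and monitored SLURM jobs.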

Hands-on exercises will demonstrate how to:

  • Launch Ray Tune on IBEX using SLURM
  • Run large-scale HPO experiments for ML and DL workloads
  • Automatically stop bad trials early and steer searches toward promising regions
  • Use different search and scheduling algorithms (ASHA, PBT, Bayesian/Optuna); see the sketch after this list
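
To illustrate the last point above, the following sketch pairs a Bayesian search algorithm (OptunaSearch) with the ASHA early-stopping scheduler. It reuses the trainable function from the previous sketch, assumes the optional optuna package is installed, and is only an indication of how Ray Tune composes search algorithms with schedulers, not workshop material.

    from ray import tune
    from ray.tune.schedulers import ASHAScheduler
    from ray.tune.search.optuna import OptunaSearch

    tuner = tune.Tuner(
        trainable,  # the function from the earlier sketch
        param_space={
            "lr": tune.loguniform(1e-4, 1e-1),
            "batch_size": tune.choice([32, 64, 128]),
        },
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
            num_samples=50,
            search_alg=OptunaSearch(),  # Bayesian suggestions for new configurations
            scheduler=ASHAScheduler(
                grace_period=1,         # every trial gets at least one report
                reduction_factor=3,     # promote roughly a third of trials at each rung
            ),
        ),
    )
    results = tuner.fit()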

Who Should Attend

  • Researchers training ML / DL models on GPUs
  • Data scientists running large experimental sweeps
  • AI engineers interested in scalable hyperparameter optimization
  • Users seeking practical experience running Ray workloads on HPC systems

Learning Outcomes

After attending, participants will be able to:

  • Design and run distributed HPO experiments with Ray Tune on IBEX
  • Replace manual multi-job HPO workflows with automated Ray-based pipelines
  • Apply early-stopping strategies to curtail poorly performing trials
  • Use multiple schedulers and search algorithms to efficiently explore parameter spaces
  • Launch and manage multi-GPU and multi-node HPO jobs using SLURM (see the sketch after this list)
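
For the last outcome above, here is a sketch of how a Tune driver might attach to a Ray cluster that a SLURM batch script has already started inside an allocation, giving each trial one GPU. The resource figures are illustrative assumptions rather than IBEX-specific recommendations; the workshop covers the actual IBEX launch procedure.

    import ray
    from ray import tune

    # Connect to the head node started by the SLURM batch script;
    # "auto" picks up the running cluster inside the allocation.
    ray.init(address="auto")

    # Request 4 CPU cores and 1 GPU per trial (illustrative numbers);
    # Tune runs as many trials in parallel as the cluster resources allow.
    gpu_trainable = tune.with_resources(trainable, {"cpu": 4, "gpu": 1})

    tuner = tune.Tuner(
        gpu_trainable,  # trainable defined as in the earlier sketches
        param_space={"lr": tune.loguniform(1e-4, 1e-1)},
        tune_config=tune.TuneConfig(metric="loss", mode="min", num_samples=32),
    )
    tuner.fit()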

Important Note on Workshop Scope

This workshop focuses on scaling and distributing existing deep learning workloads rather than teaching fundamental Python or neural network concepts. Attendees are expected to have prior familiarity with Python-based ML frameworks (e.g., PyTorch) and basic model training. The sessions will emphasize practical usage of distributed training frameworks and optimizing performance at scale on IBEX—not introductory model development.

Agenda

9:00 – 10:00 — Hyperparameter Optimization Overview

10:00 – 10:15 — Coffee break

10:15 – 12:00 — Hands-On Session: HPO with Ray Tune; Understanding Search Algorithms and Schedulers

 

Date & Time: 2026-02-11, 09:00 – 12:00
Category: Data Science
