KAUST Supercomputing Laboratory Newsletter 11th June 2020

In this newsletter:

  • Maintenance Session 22nd June
  • KAUST Research Team Wins 2020 GCS Award at ISC20
  • RCAC Meeting
  • KAUST supercomputer Shaheen II joins the fight against COVID-19
  • Tip of the Week: Fine tuning of job arrays
  • Follow us on Twitter
  • Previous Announcements
  • Previous Tips

 

Maintenance Session 22nd June

Shaheen will be unavailable to run jobs for approximately 1 hour from 09:00 on the 22nd June to allow for an urgent update of the Slurm job scheduler.

 

KAUST Research Team Wins 2020 GCS Award at ISC20

Congratulations to our colleagues at ECRC (Noha Alharthi, Rabab AlOmairy, Kadir Akbudak, Rui Chen, Hatem Ltaief, Hakan Bagci and David Keyes) for winning the 2020 GCS Award, the best paper award of the International Supercomputing Conference (ISC) 2020, for their paper entitled "Solving Acoustic Boundary Integral Equations Using High Performance Tile Low-Rank LU Factorization".

Their project focused on optimizing, mainly on the Shaheen XC40 system, a class of data-sparse high-performance computing (HPC) solvers for acoustic boundary integral equations.

More information is available here:

https://www.gauss-centre.eu/news/newsflashes/article/kaust-research-team-wins-2020-gcs-award/

 

RCAC Meeting

The project submission deadline for the next RCAC meeting is 30th June 2020. Please note that the RCAC meetings are held once per month. Projects received on or before the submission deadline will be included in the agenda for the subsequent RCAC meeting. The detailed procedures, updated templates and forms are available here: https://www.hpc.kaust.edu.sa/account-applications

 

KAUST supercomputer Shaheen II joins the fight against COVID-19

King Abdullah University of Science and Technology (KAUST) invites researchers from across the Kingdom to submit proposals for COVID-19-related research. Recognizing the urgency to address global challenges related to the COVID-19 pandemic through scientific discovery and innovation, the University’s Supercomputing Core Laboratory (KSL) is making computing resources—including the flagship Shaheen II supercomputer and its expert scientists—available to support research projects.

Topics may include but are not limited to: understanding the virus on a molecular level; understanding its fluid-dynamical transport; evaluating the repurposing of existing drugs; forecasting how the disease spreads; and finding ways to stop or slow down the pandemic.

Accepted projects will have access to the following resources: (1) Shaheen II, a Cray XC40 supercomputer based on Intel Haswell processors with nearly 200,000 compute cores tightly connected by the Aries high-speed interconnect; (2) the Ibex cluster, a high-throughput computing system with about 500 compute nodes using Intel Skylake and Cascade Lake CPUs and NVIDIA V100 GPUs; and (3) KSL staff scientists, who will provide support, training and consultancy to maximize impact. Through 30 June 2020, up to 15% of these resources will be reserved for fast-tracking competitive COVID-19 proposals through the KAUST Research Computing Allocation Committee. Thereafter, such proposals remain welcome and will be considered in the standard process.

Applicants can apply for computing allocations using the COVID-19 Project Proposal form. Please submit the form to projects@hpc.kaust.edu.sa. Submitted proposals will be fast-tracked for processing.

Please contact help@hpc.kaust.edu.sa with any inquiries.

 

Tip of the Week: Fine tuning of job arrays

You may already know about job arrays: this SLURM feature allows you to submit and manage multiple similar jobs quickly and easily, as hinted at in a previous tip.

Let's imagine you submit a job array of 100 jobs with the following command:

cdl:~> sbatch --array=1-100 my_slurm_script.sh
Submitted batch job 14688946
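
As a reminder, each array task receives its own index through the SLURM_ARRAY_TASK_ID environment variable, which the batch script can use to select its input. A minimal sketch of what my_slurm_script.sh might look like (the application name and input file naming below are purely illustrative):

#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --time=00:30:00
# Each array task processes the input file matching its index
srun ./my_app input_${SLURM_ARRAY_TASK_ID}.dat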

But did you know how flexible the definition of the array can be?

cdl:~> sbatch --array=1-10,88,97-100 my_slurm_script.sh
Submitted batch job 14688947

or

cdl:~> sbatch --array=1,6,87 my_slurm_script.sh
Submitted batch job 14688948

are possible commands, for example to resubmit a few failed jobs.

With a slightly different syntax, the same convention holds when cancelling part of the array:

cdl:~> scancel 14688946_[34-66] 

or

cdl:~> scancel 14688947_[1-8,88,99] 

By default, SLURM will execute as many jobs within the array as there are resources available. You can limit this behavior using the '%' operator. For example:

cdl:~> sbatch --array=1-100%10 my_slurm_script.sh

will create a job array of size 100 but limit the number of simultaneously running tasks from this job to 10. Currently, on Shaheen, the maximum possible job array size is 800.
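
Should you need to change this limit after submission, the throttle of an existing array can also be adjusted with scontrol (shown here, as a sketch, for the first array submitted above; behaviour may depend on the installed SLURM version):

cdl:~> scontrol update JobId=14688946 ArrayTaskThrottle=10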

 

Follow us on Twitter

Follow all the latest news on HPC within the Supercomputing Laboratory and at KAUST on Twitter: @KAUST_HPC.

Previous Announcements

http://www.hpc.kaust.edu.sa/announcements/

Previous Tips

http://www.hpc.kaust.edu.sa/tip/