KAUST Supercomputing Laboratory Newsletter 4th June 2020

In this newsletter:

  • Application License Server Maintenance by IT
  • RCAC Meeting
  • Sixth KAUST-ANSYS Workshop
  • KAUST supercomputer Shaheen II joins the fight against COVID-19
  • Tip of the Week: Why is my job still pending in the queue?
  • Follow us on Twitter
  • Previous Announcements
  • Previous Tips

 

Application License Server Maintenance by IT on Thursday, 4th June 2020, 8:00 PM to 4:00 AM on 5th June

Due to scheduled maintenance of the Application License Server by IT from 8:00 PM on Thursday, 4th June 2020, to 4:00 AM the next day, access to the following applications will be affected on Shaheen and Neser:

Ansys, AtomistixToolKit (ATK), Converge, Eclipse, Intel Compilers, Material Studio, Mathematica, MATLAB, Tecplot and Totalview.

During this maintenance window, you may encounter errors when compiling with the Intel compilers and runtime errors in the applications listed above.

 

RCAC Meeting

The project submission deadline for the next RCAC meeting is 30th June 2020. Please note that the RCAC meetings are held once per month. Projects received on or before the submission deadline will be included in the agenda for the subsequent RCAC meeting. The detailed procedures, updated templates and forms are available here: https://www.hpc.kaust.edu.sa/account-applications

 

Sixth KAUST-ANSYS Workshop:

Don't forget to register for the event!

Event Dates: June 7-9, 2020

Event Address: Online event (links will be sent to you via email)


 

KAUST supercomputer Shaheen II joins the fight against COVID-19

King Abdullah University of Science and Technology (KAUST) invites researchers from across the Kingdom to submit proposals for COVID-19-related research. Recognizing the urgency to address global challenges related to the COVID-19 pandemic through scientific discovery and innovation, the University’s Supercomputing Core Laboratory (KSL) is making computing resources—including the flagship Shaheen II supercomputer and its expert scientists—available to support research projects.

Topics may include but are not limited to: understanding the virus on a molecular level; understanding its fluid-dynamical transport; evaluating the repurposing of existing drugs; forecasting how the disease spreads; and finding ways to stop or slow down the pandemic.

Accepted proposals can access the following resources: (1) Shaheen II, a Cray XC40 supercomputer based on Intel Haswell processors with nearly 200,000 compute cores tightly connected by the Aries high-speed interconnect; (2) the Ibex cluster, a high-throughput computer system with about 500 computing nodes using Intel Skylake and Cascade Lake CPUs and Nvidia V100 GPUs; and (3) KSL staff scientists, who will provide support, training and consultancy to maximize impact. Through 30 June 2020, up to 15% of these resources will be reserved for fast-tracking competitive COVID-19 proposals through the KAUST Research Computing Allocation Committee. Thereafter, such proposals remain welcome and will be considered in the standard process.

Applicants can apply for computing allocations using the COVID-19 Project Proposal form. Please submit the form to projects@hpc.kaust.edu.sa. Submitted proposals will be fast-tracked for processing.

Please contact help@hpc.kaust.edu.sa with any inquiries.

 

Tip of the Week: Why is my job still pending in the queue?

Once you submit a job, check its status before closing your session. To see more details, including the reason a job is not running, type squeue --job <jobid> -l:

squeue --job 12376532 -l
Thu Jun  4 18:24:21 2020
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
          12376532 workq     myjob     user1  PENDING       0:00 1-00:00:00     32 (AssocMaxJobsLimit)
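
When only the reason is needed, for example inside a script, squeue's format option can print that field alone. The following is a minimal sketch, assuming a Slurm installation with squeue on the PATH; the function name job_reason is hypothetical and not part of Slurm:

```shell
# Hypothetical helper: print only the reason field for one job.
# -j selects the job, -h (--noheader) drops the header line,
# and -o "%r" restricts the output to the reason column.
job_reason() {
  squeue -j "$1" -h -o "%r"
}
```

For the job shown above, job_reason 12376532 should print the reason without the surrounding parentheses.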

A job may be waiting for more than one reason, in which case only one of those reasons is displayed. Here are the most common codes that identify the reason that a job is waiting for execution:

  • AssocMaxJobsLimit     The account associated with the job does not have enough core hours.
  • AssociationJobLimit   The job's association has reached its maximum job count.
  • Dependency            This job is waiting for a dependent job to complete.
  • InvalidQOS            The job's QOS is invalid.
  • PartitionNodeLimit    The number of nodes required by this job is outside of its partition's current limits. Can also indicate that required nodes are DOWN or DRAINED.
  • PartitionTimeLimit    The job's time limit exceeds its partition's current time limit.
  • QOSJobLimit           The job's QOS has reached its maximum job count.
  • ReqNodeNotAvail       Some node specifically required by the job is not currently available. The node may currently be in use, reserved for another job, or in an advanced reservation.
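
The reason codes in this list can also be looked up from a script. The sketch below is illustrative only: explain_reason is a hypothetical helper, its messages are shortened from the list above, and any code not listed falls through to a pointer to the man page.

```shell
# Hypothetical helper: translate a squeue reason code into a short explanation.
explain_reason() {
  case "$1" in
    AssocMaxJobsLimit)   echo "The account associated with the job does not have enough core hours." ;;
    AssociationJobLimit) echo "The job's association has reached its maximum job count." ;;
    Dependency)          echo "The job is waiting for a dependent job to complete." ;;
    InvalidQOS)          echo "The job's QOS is invalid." ;;
    PartitionNodeLimit)  echo "The node count is outside the partition's limits, or required nodes are DOWN or DRAINED." ;;
    PartitionTimeLimit)  echo "The job's time limit exceeds the partition's time limit." ;;
    QOSJobLimit)         echo "The job's QOS has reached its maximum job count." ;;
    # The trailing * also matches the code when extra text follows it.
    ReqNodeNotAvail*)    echo "A node specifically required by the job is not currently available." ;;
    *)                   echo "See 'man squeue' for reason code: $1" ;;
  esac
}
```

It could be combined with squeue, for example explain_reason "$(squeue -j 12376532 -h -o "%r")", where -h suppresses the header and %r selects the reason field.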

More information is available in the squeue man page. You can also contact us at help@hpc.kaust.edu.sa.

 

Follow us on Twitter

For the latest HPC news from the Supercomputing Lab and KAUST, follow @KAUST_HPC on Twitter.

Previous Announcements

http://www.hpc.kaust.edu.sa/announcements/

Previous Tips

http://www.hpc.kaust.edu.sa/tip/