KAUST Supercomputing Laboratory Newsletter 17th February 2016

The Third Annual Workshop on "Accelerating Scientific Applications Using GPUs"

The KAUST Supercomputing Laboratory is co-organizing with NVIDIA, a leader in accelerated computing, a one-day workshop on accelerating scientific applications using GPUs on Tuesday, February 23rd, 2016, in the auditorium between buildings 2 and 3. To register for the event, please click here.

The event will be followed by a two-day GPU hack-a-thon in which selected teams of developers will be guided by OpenACC and CUDA mentors from NVIDIA and KAUST to port their domain science applications to GPU accelerators and speed them up. Space is limited to 4-5 teams. Please click here to submit your hack-a-thon proposal.

Please contact us at training@hpc.kaust.edu.sa if you need further information.

We are looking forward to seeing you there.

Saber Feki, Workshop Chair

Bilel Hadri and Hatem Ltaief, Workshop Co-Chairs

Tip of the Week: Why is my job not running?

When the estimated start time of your pending job is not available, you can still find out why the job is not running.

Typing squeue --job <jobid> -l produces output like the following. The reason your job is not running appears in parentheses in the last column, NODELIST(REASON).

           JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
           110000   workq 8-tuned_   user1  PENDING       0:00 3-00:00:00      1 (AssocGrpCPUMinutesLimit)
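
If you want to pick the reason out of this output in a script, a minimal sketch in Python could look like the following. It assumes the column layout shown above, with the reason in parentheses at the end of each job line. The function name parse_pending_reasons is hypothetical and not part of Slurm.

```python
import re

def parse_pending_reasons(squeue_output):
    """Return {jobid: reason} for each job line that ends in '(Reason)'."""
    reasons = {}
    for line in squeue_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header line, whose first field is 'JOBID'.
        if not fields or fields[0] == "JOBID":
            continue
        # The reason is the parenthesized text at the very end of the line.
        match = re.search(r"\(([^()]+)\)\s*$", line)
        if match:
            reasons[fields[0]] = match.group(1)
    return reasons

# The sample output from this tip, pasted in as text.
sample = """\
           JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
           110000   workq 8-tuned_   user1  PENDING       0:00 3-00:00:00      1 (AssocGrpCPUMinutesLimit)
"""

print(parse_pending_reasons(sample))
# prints {'110000': 'AssocGrpCPUMinutesLimit'}
```

Running jobs show a node list rather than a parenthesized reason in the last column, so the sketch leaves them out of the result.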

 

The most common reason codes are listed below. A job may be waiting for more than one reason, in which case only one of those reasons is displayed.

 

AssocGrpCPUMinutesLimit 

The job's association (the account it is charged to) has reached its group limit on CPU minutes.

Cleaning

The job is being requeued and still cleaning up from its previous execution.        

Dependency

This job is waiting for a dependent job to complete.

JobHeldAdmin

The job is held by a system administrator.

JobHeldUser

The job is held by the user.

NodeDown

A node required by the job is down.

Priority

One or more higher-priority jobs are queued ahead of yours for this partition or advanced reservation.

QOSUsageThreshold

The required QOS threshold has been breached.

ReqNodeNotAvail

No nodes can be found that satisfy your limits, for instance because maintenance is scheduled and the job cannot finish before it starts.

Reservation

The job is waiting for its advanced reservation to become available.

Resources

The job is waiting for resources (nodes) to become available and will run when Slurm finds enough free nodes.

SystemFailure

Failure of the Slurm system, a file system, the network, etc.
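
The list above can also be kept as a small lookup table, for example in a script that checks on your own pending jobs. The following Python sketch is only an illustration. The descriptions are short paraphrases of this tip, the helper name explain_reason is hypothetical, and the wording for AssocGrpCPUMinutesLimit is our reading of the code rather than text quoted from the Slurm documentation.

```python
# Short paraphrases of the reason codes described in this tip.
REASONS = {
    # Assumption: our reading of the code name, not an official description.
    "AssocGrpCPUMinutesLimit": "The account has reached its group limit on CPU minutes.",
    "Cleaning": "The job is being requeued and is still cleaning up.",
    "Dependency": "The job is waiting for a dependent job to complete.",
    "JobHeldAdmin": "The job is held by a system administrator.",
    "JobHeldUser": "The job is held by the user.",
    "NodeDown": "A node required by the job is down.",
    "Priority": "Higher-priority jobs are queued ahead of this one.",
    "QOSUsageThreshold": "The required QOS threshold has been breached.",
    "ReqNodeNotAvail": "No nodes satisfying the job's limits are available.",
    "Reservation": "The job is waiting for its advanced reservation.",
    "Resources": "The job is waiting for enough free nodes.",
    "SystemFailure": "Slurm, a file system, or the network has failed.",
}

def explain_reason(code):
    """Return a one-line explanation, or a fallback for codes not listed here."""
    return REASONS.get(code, "Reason code not covered in this tip: " + code)

print(explain_reason("Dependency"))
print(explain_reason("SomeOtherCode"))
```

Codes that are not in the table fall through to a generic message, since Slurm defines more reason codes than the common ones covered here.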

 

Follow us on Twitter

Follow all the latest news on HPC within the Supercomputing Lab and at KAUST, on Twitter @KAUST_HPC.

Previous Announcements

Previous Tips