KAUST Supercomputing Laboratory Newsletter 17th February 2016
The Third Annual Workshop on "Accelerating Scientific Applications Using GPUs"
The KAUST Supercomputing Laboratory and NVIDIA, a leader in accelerated computing, are co-organizing a one-day workshop on accelerating scientific applications using GPUs on Tuesday, February 23rd, 2016, in the auditorium between buildings 2 and 3. To register for the event, please click here.
The event will be followed by a two-day GPU hack-a-thon in which selected teams of developers will be guided by OpenACC and CUDA mentors from NVIDIA and KAUST to port their domain science applications to GPU accelerators and accelerate them. Space will be limited to 4-5 teams. Please click here to submit your hack-a-thon proposal.
Please contact us at training@hpc.kaust.edu.sa if you need further information.
We are looking forward to seeing you there.
Saber Feki, Workshop Chair
Bilel Hadri and Hatem Ltaief, Workshop Co-Chairs
Tip of the Week: Why is my job not running?
When the estimated start time of your pending job is not available, you can find out why the job is not running.
Typing squeue --job <jobid> -l produces output like the following, with the reason shown in the NODELIST(REASON) column:
JOBID   PARTITION  NAME      USER   STATE    TIME  TIME_LIMI   NODES  NODELIST(REASON)
110000  workq      8-tuned_  user1  PENDING  0:00  3-00:00:00  1      (AssocGrpCPUMinutesLimit)
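As a rough illustration of reading that output, the sketch below pulls the reason out of squeue -l style text. The function name pending_reason is hypothetical, and the sketch assumes the reason is the last whitespace-separated field, is wrapped in parentheses, and contains no spaces. On a live system, squeue -h -j <jobid> -o "%r" should print the reason directly.

```shell
#!/bin/sh
# pending_reason (hypothetical helper): read `squeue -l` style output on stdin
# and print the reason of each job whose last field is wrapped in parentheses.
# Assumption: the reason is the last field and contains no spaces.
pending_reason() {
  awk '$NF ~ /^\(.*\)$/ { gsub(/[()]/, "", $NF); print $NF }'
}

# Demo on the sample output from this newsletter (no Slurm needed).
pending_reason <<'EOF'
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
110000 workq 8-tuned_ user1 PENDING 0:00 3-00:00:00 1 (AssocGrpCPUMinutesLimit)
EOF
# prints: AssocGrpCPUMinutesLimit
```

The header line and running jobs are skipped because their last field is not fully enclosed in parentheses.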
Here are the most common reasons. These codes identify the reason that a job is waiting for execution. A job may be waiting for more than one reason, in which case only one of those reasons is displayed.
Reason | Description
AssocGrpCPUMinutesLimit | The job's account (association) has reached its group limit on CPU minutes, for example because the project's allocation of core hours is used up.
Cleaning | The job is being requeued and is still cleaning up from its previous execution.
Dependency | This job is waiting for a dependent job to complete.
JobHeldAdmin | The job is held by a system administrator.
JobHeldUser | The job is held by the user.
NodeDown | A node required by the job is down.
Priority | One or more higher-priority jobs exist for this partition or advanced reservation.
QOSUsageThreshold | The required QOS threshold has been breached.
ReqNodeNotAvail | No nodes can be found satisfying your limits, for instance because maintenance is scheduled and the job cannot finish before it starts.
Reservation | The job is waiting for its advanced reservation to become available.
Resources | The job is waiting for resources (nodes) to become available and will run when Slurm finds enough free nodes.
SystemFailure | Failure of the Slurm system, a file system, the network, etc.
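The reason codes above can also be turned into a small lookup, sketched here under some assumptions. The helper names explain_reason and why_pending are hypothetical, only a few codes are covered, and why_pending assumes squeue is on your PATH (-h drops the header and -o "%r" prints only the reason).

```shell
#!/bin/sh
# explain_reason (hypothetical helper): map a Slurm pending-reason code to a
# short explanation based on the list above. Only some codes are covered.
explain_reason() {
  case "$1" in
    Dependency)   echo "Waiting for a dependent job to complete." ;;
    JobHeldAdmin) echo "Held by a system administrator." ;;
    JobHeldUser)  echo "Held by the user." ;;
    NodeDown)     echo "A node required by the job is down." ;;
    Priority)     echo "Higher-priority jobs are ahead in the queue." ;;
    Resources)    echo "Waiting for enough free nodes." ;;
    *)            echo "See the list of reason codes for: $1" ;;
  esac
}

# why_pending (hypothetical helper): look up the reason for one job ID.
# Assumes squeue is on PATH; it is only defined here, not called.
why_pending() {
  explain_reason "$(squeue -h -j "$1" -o "%r")"
}

# Demo that needs no Slurm installation.
explain_reason Priority
```

On a login node, a call such as why_pending 110000 would then print the explanation for that job, assuming the job is still pending.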
Follow us on Twitter
Follow the latest HPC news from the Supercomputing Lab and KAUST on Twitter at @KAUST_HPC.