Over the last couple of weeks you have experienced considerable disruption in using Shaheen II due to a combination of scheduled maintenance sessions and unforeseen failures in hardware and software. We sincerely apologize for any inconvenience this may have caused. Our team works hard to minimize downtime, and our most important goal is to ensure that you are highly productive on our systems.
We are pleased to confirm that the issues we encountered with the system following last week’s maintenance session have now been resolved.
The system is now fully available to run jobs.
We would also like to remind you that the next maintenance session on Shaheen II will be from 08:00 on the 8th November for 3 days (until 08:00 on the 11th November).
Maintenance work on Shaheen II is taking longer than originally envisioned. Service will not be restored by tomorrow, but we are hopeful that Shaheen II will be operational by the end of the week. There will be occasional disruption to the CDLs, but at least one CDL will be available for users to log in to during this period.
KSL Workshop Towards High Efficiency Computing with Allinea
KAUST Supercomputing Laboratory presents the Allinea Software workshop on HPC profiling and debugging: "Towards High Efficiency Computing with Allinea" on October 4th, starting at 9am.
The next maintenance session will be on Tuesday 18th August from 12:00 until 17:00. There will be no access to the system during this period. This will affect Shaheen I, Neser and Shaheen II. Important security updates, custom patches and several bug fixes will be applied to the XC40, which will require the whole system to be rebooted.
We only have a limited number of spare parts for Shaheen, and yesterday we exhausted our stock of node cards.
We have had another node card failure this morning, which means that we are now ‘cannibalising’ the system to supply parts.
With immediate effect we have taken two node cards offline in rack 00.
This means that we can no longer run 16-rack jobs, and the maximum job size that can be run on Shaheen is now 12288 nodes (12 racks).