Dear Users,
We have encountered a major issue affecting the availability of the Lustre filesystem.
Cray have recommended that we perform an immediate shutdown of the system to prevent data loss.
We are working on identifying the reason for the failure and will update you when we have more information.
|
Dear Shaheen II Users,
Over the last couple of weeks you have experienced considerable disruption in using Shaheen II due to a combination of scheduled maintenance sessions and unforeseen hardware and software failures. We sincerely apologize for the inconvenience this may have caused. Our team works hard to minimize downtime, and our most important goal remains to ensure that you are highly productive on our systems.
|
Good Morning,
We are pleased to confirm that the issues we encountered with the system following last week’s maintenance session have now been resolved.
The system is now fully available to run jobs.
We would also like to remind you that the next maintenance session on Shaheen II will be from 08:00 on the 8th November for 3 days (until 08:00 on the 11th November).
|
Extended maintenance sessions in October
Maintenance work on Shaheen II is taking longer than originally envisioned. Service will not be restored by tomorrow, but we are hopeful that Shaheen II will be operational by the end of the week. There will be occasional disruption to the CDLs, but at least one CDL will be available for users to log in to during this period.
|
Extended maintenance sessions in October
We would like to remind our users of the extended outage on Shaheen II in October for necessary maintenance:
25th October for 3 days
There will be no access to the system during these periods.
Tip of the Week: Queues on Shaheen II Cray XC40
Two different queues are available on Shaheen II:
|
Eid Al-Adha Holiday, 20-27th September
|
KSL Workshop Towards High Efficiency Computing with Allinea
KAUST Supercomputing Laboratory presents the Allinea Software workshop on HPC profiling and debugging: "Towards High Efficiency Computing with Allinea" on October 4th, starting at 9am.
Workshop topics include:
|
Maintenance Session Tuesday 18th August
The next maintenance session will be on Tuesday 18th August from 12:00 until 17:00. There will be no access to the system during this period. This will affect Shaheen I, Neser and Shaheen II. Important security updates, custom patches and several bug fixes will be applied to the XC40, which will require the whole system to be rebooted.
Tip of the Week: What command did I type before?
History command
|
Shaheen II Cray XC40 Workshop Announcement
Date: 7th June to 11th June 2015
Where: KAUST: Auditorium Al-Haytham (down the steps between Bldg2 and Bldg3)
KAUST Supercomputing Lab and Cray are offering a series of three courses:
* Sunday 7th to Tuesday 9th June 2015: Introduction to the new Shaheen II Cray XC40
* Wednesday 10th June 2015: Efficient Parallel I/O
* Thursday 11th June 2015: Port and optimize your own code on the Cray XC40
|
Shaheen-I job size limitation
We only have a limited number of spare parts for Shaheen, and yesterday we exhausted our stock of node cards.
We have had another node card failure this morning, which means that we are now in the situation where we are ‘cannibalising’ the system to supply parts.
With immediate effect we have taken two node cards offline in rack 00.
This means that we can no longer run 16-rack jobs, and the maximum job size that can be run on Shaheen is now 12288 nodes (12 racks).
|
Power Outage Thursday 9th April to Monday 13th April
In preparation for the introduction of the new Cray supercomputer, there will be a site-wide power outage to the Data Centre currently housing Shaheen1 and Neser. All services, including Shaheen and Neser, will be shut down from 16:00 on Thursday 9th April until approximately 11:00 on Monday 13th April.
We apologise for the late notice and for any inconvenience that this may cause.
|
Shaheen and Neser unavailable 18th-20th March 2015
In preparation for the introduction of the new Cray supercomputer, there will be a site-wide power outage to the Data Centre currently housing Shaheen1 and Neser. All services, including Shaheen1 and Neser, will be shut down from 17:00 on Wednesday 18th March until approximately 11:00 on 20th March.
|
Annual Power Maintenance
Due to annual power maintenance in Building 1, all systems will be shut down from 16:00 on Thursday 26th February until approximately 10:00 on Sunday 1st March.
Tip of the Week: Using TotalView Debugger for Parallel jobs on Neser
In this example, we use the Intel compiler with OpenMPI 1.6.4 (the intel-compilers/11.1 and openmpi/1.6.4/intel modules should be loaded). You must also connect to Neser using the ssh -X option in order to display the TotalView GUI (Graphical User Interface).
|
Disruption to Shaheen earlier today
Earlier today, a fault occurred on both Shaheen front-end nodes, rendering them completely unresponsive. To remedy the situation, both front-end nodes were rebooted. We apologise for any disruption this may have caused. The root cause of the problem is still under investigation.
Maintenance Session Tuesday 17th February
The next maintenance session will be on Tuesday 17th February from 12:00 until 17:00. There will be no access to the system during this period.
|
KSL Presents the XSEDE HPC Monthly Workshop on OpenACC This Friday
The registration deadline is tomorrow. For registration and further information, click here.
Invitation to attend the second KAUST-NVIDIA workshop on "Accelerating Scientific Applications using GPUs" on February 17th, 2015
The Supercomputing Laboratory is pleased to announce the second KAUST-NVIDIA one-day workshop about accelerating scientific applications using GPUs on February 17th.
Registration is free but required; please register using this link.
|