Shaheen-I job size limitation
We only have a limited number of spare parts for Shaheen, and yesterday we exhausted our stock of node cards.
We have had another node card failure this morning, which means that we are now in the situation where we are ‘cannibalising’ the system to supply parts.
With immediate effect we have taken two node cards offline in rack 00.
This means that we can no longer run 16-rack jobs; the largest job that can now be run on Shaheen is 12288 nodes (12 racks).
Data Centre Power Outage Notification – Shaheen II Benchmark Test
As part of the Shaheen II contractual acceptance, a critical benchmark test, High-Performance Linpack (HPL), must be performed. HPL is a very power-intensive test that requires all of the data centre’s cooling to be made available to Shaheen II for the duration of this stress test.
In addition to being a contractual acceptance requirement, the results of this test will be used to determine KAUST’s placement in the Top 500 list of the fastest supercomputers in the world.
The Scientific Computer Centre (SCC) data centre will therefore experience a full outage from Monday 4th May 2015 17:00 to Sunday 17th May 10:00. We realise and acknowledge that this outage period will have a negative impact on your research and we will endeavour to make it as short an outage as possible.
This is a one-time test that is required to be run over consecutive days.
/scratch disk usage
The /scratch filesystem is designed for interim high-speed storage during job runs on the system. Disk usage on /scratch is now over 85% and is continuing to grow. We ask all users to remove any files they no longer need. We may have to consider implementing automatic removal of old files if the available free space continues to decline.
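As a starting point for finding candidates to remove, the following is a minimal sketch that lists files not modified for more than a given number of days, largest first. The function name list_old_files, the 30-day threshold, and the /scratch/$USER path are illustrative assumptions, not KSL policy; nothing is deleted, so review the list before removing anything.

```shell
# list_old_files DIR DAYS
# Print "size-in-KB  path" for every regular file under DIR whose last
# modification is more than DAYS days old, sorted largest first.
# This only lists files; it never deletes anything.
list_old_files() {
    dir=$1
    days=$2
    find "$dir" -type f -mtime +"$days" -exec du -k {} + 2>/dev/null | sort -rn
}
```

For example, list_old_files "/scratch/$USER" 30 would show files in a personal scratch area (assuming that directory layout) that have been untouched for over 30 days, and df -h /scratch reports the overall usage of the filesystem.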
Tip of the Week: Record a log of your terminal session
How can we record everything we type in the terminal, and all the output produced, into a file? The script command does exactly this. Just type script <filename> and everything will be recorded: not only the commands, but also their output. Once finished, type exit to end the recording session.
bash-3.2$ script my_history_April28_2015
Script started, output file is my_history_April28_2015
Script done, output file is my_history_April28_2015
Follow us on Twitter
Follow all the latest news on HPC within the Supercomputing Lab and at KAUST, on Twitter @KAUST_HPC.
KSL Announcements archives can be viewed at http://www2.hpc.kaust.edu.sa/mailman/private/announce.
Questions, concerns, or problems related to Shaheen should be directed to email@example.com. From there your issue will be assigned to an appropriate primary contact who will track your issue to resolution. We may sometimes have to request items from KAUST IT or other service providers.
Getting an account
FAQs on the process and requirements for obtaining access to the Shaheen Systems are available here.
You should consult our Terms and Conditions of Usage.
Getting an account on Shaheen is a three-step process, detailed in this flowchart:
- Your organisation or department must submit the Organisational Access Application, establishing a relationship between your home organisation and the KAUST Supercomputing Laboratory (KSL).
- You (or your Principal Investigator) must submit a Project Proposal describing the work to be done and the resources your project will require.
- You must submit an Individual Account Application, supplying identification information from which we can generate login credentials.
Use the secure submission form to submit credentials or other private information.
Accessing KSL Systems
After you have been given login credentials (i.e., a username and password) you will be able to access any Shaheen systems to which you have been granted access. Primary computational access to Shaheen and Neser is available via the ssh (secure shell) protocol.
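A minimal sketch of the ssh login command is shown below. The username and host name are hypothetical placeholders; use the values supplied with your KSL credentials. The helper ksl_ssh_command is illustrative only and prints the command rather than connecting.

```shell
# Hypothetical placeholders: replace with your own username and the
# login host name given to you by KSL.
KSL_USER="your_username"
KSL_HOST="login-host.example.org"

# Print the ssh command line for a given user and host, so it can be
# checked before it is run.
ksl_ssh_command() {
    printf 'ssh %s@%s\n' "$1" "$2"
}

ksl_ssh_command "$KSL_USER" "$KSL_HOST"
```

Running the printed command in a terminal opens the secure shell session and prompts for your password.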
PIs are required to submit periodic project reports during the course of their work on the KSL systems. A template is available for this.
Data Centre Visits
To request a visit to the Data Centre, in the first instance, please complete this form.
Because the Shaheen Systems are a widely shared resource, user reservations are not encouraged. However, if a user feels that making a reservation for dedicated access to all or part of the Shaheen systems is essential for their work, then KSL will consider such requests on a case-by-case basis within the bounds of the following guidelines:
- A minimum of 14 days’ notice, and preferably 30 days’, is required for all reservations.
- The maximum duration allowed for a reservation will be 48 hours.
- A reservation may request dedicated access to between 1 and 16 racks of the Shaheen BG/P system. Reservations of less than 1 rack of BG/P will not be permitted.
- Users’ project accounts will be charged for all racks reserved for the entire reservation period, regardless of whether they are all used or not.
- There are no guarantees that a reservation request will be accepted.
Exceptions to this policy may be allowed based on the merits of a request, but very rarely.
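The numeric limits in the guidelines above can be checked before submitting a request. The following is a sketch only: the function check_reservation and the "rack-hours" unit are assumptions made for illustration, and passing the check does not mean a request will be accepted.

```shell
# check_reservation NOTICE_DAYS DURATION_HOURS RACKS
# Returns 0 if the request fits the published limits, 1 otherwise.
check_reservation() {
    notice_days=$1
    duration_hours=$2
    racks=$3
    if [ "$notice_days" -lt 14 ]; then
        echo "rejected: at least 14 days' notice is required"
        return 1
    fi
    if [ "$duration_hours" -gt 48 ]; then
        echo "rejected: maximum duration is 48 hours"
        return 1
    fi
    if [ "$racks" -lt 1 ] || [ "$racks" -gt 16 ]; then
        echo "rejected: between 1 and 16 racks may be reserved"
        return 1
    fi
    # Every reserved rack is charged for the whole period, used or not.
    # "rack-hours" is an illustrative unit, not KSL's accounting unit.
    echo "within guidelines: $((racks * duration_hours)) rack-hours would be charged"
}
```

For example, check_reservation 30 24 4 describes a 4-rack, 24-hour reservation requested 30 days ahead, which is within the limits and would be charged for all 4 racks over the full 24 hours.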
- If a user has a complaint about a KSL decision relating to resource allocation, and remains unhappy after discussing their concerns with KSL Management, they can request arbitration from the Chairman of the KSL Resource Allocation Committee (KSL RAC) (currently Dr David Keyes). If the user remains unhappy with the decision of the Chairman of the KSL RAC, they can request arbitration from the Chairman of the KSL Management Committee (KSLMC).
- If a user has a complaint about KSL policies, or other matters not relating to resource allocation, and remains unhappy after discussing their concerns with KSL Management, they can request arbitration from the Chairman of the KSL Management Committee (KSLMC).