KAUST Supercomputing Laboratory Newsletter 13th October

Data Centre Firewall Upgrade

On 27th October between 17:00 and 21:00, KAUST IT will upgrade the SCC firewall. As Shaheen is behind this firewall, access to Shaheen will be intermittent during the upgrade.

System Maintenance *Updated*

The next scheduled maintenance session on Shaheen requires a 24-hour outage, from 17:00 on Monday 24th October until 17:00 on Tuesday 25th October. There will be no access to the system during this period.

There will be an extended downtime of all systems from 15:00 on 30th November until 17:00 on 6th December for an upgrade to the power and cooling infrastructure, which will allow Shaheen to run at full capacity without power capping. During this period we will not be able to read or respond to emails sent to help@hpc.kaust.edu.sa.

Neser Last Day of Operation

Please note this system will be decommissioned on 30th November 2016.

After this date all data in /project and /home will be deleted. Please ensure that you have transferred any data you wish to retain.

Tip of the Week: How to estimate your memory usage per node.

The Unix time command reports how long a command takes to run: the elapsed (real) time, plus the CPU time spent in user mode (user) and in the kernel (sys). For example:

@cdl1:~> time ls
   real    0m0.017s
   user    0m0.004s
   sys    0m0.004s

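For more detail, call the standalone GNU version, /usr/bin/time, with the --verbose option (the shell's built-in time keyword does not accept it). The report then also covers memory consumption (the "Maximum resident set size" line is the peak memory used by the process), the number of file system inputs and outputs, and the number of socket messages sent and received. For example:
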
@cdl1:~> /usr/bin/time --verbose ls -r

    <output of 'ls' command>
    Command being timed: "ls -r"
    User time (seconds): 0.01
    System time (seconds): 0.01
    Percent of CPU this job got: 61%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 4912
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 375
    Voluntary context switches: 1
    Involuntary context switches: 1150
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
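When only the memory figure is of interest, the report can be filtered down to the peak resident set size. The following is a minimal sketch, assuming GNU time's English-language report format shown above; peak_rss_kb is a hypothetical helper name, not an installed tool.

```shell
#!/usr/bin/env bash
# peak_rss_kb (hypothetical helper): read a GNU time --verbose report on
# stdin and print only the peak memory, in kilobytes.
peak_rss_kb() {
  # Split each line on ": " and print the value of the max-RSS line
  awk -F': ' '/Maximum resident set size/ { print $2 }'
}

# Usage: "2>&1 >/dev/null" sends the time report (stderr) into the pipe
# and discards the program's normal output (stdout).
#   /usr/bin/time --verbose ls -r 2>&1 >/dev/null | peak_rss_kb
```

The order of the redirections matters: stderr is first pointed at the pipe, and only then is stdout sent to /dev/null.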

You may want to apply this command to a job running in parallel on Shaheen. Because time writes its report to the standard error stream, you only need to tell srun to create one error file per task (the %t pattern in the file name expands to the task ID), so that the report for each task can be told apart. For example:

    srun --error=job.%t.err -n 4 /usr/bin/time --verbose my_program

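will produce 4 error files (job.0.err, job.1.err, job.2.err and job.3.err), each containing the time report for one of the 4 parallel tasks. To estimate the memory used per node, add up the "Maximum resident set size" values of the tasks that ran on that node; because the tasks do not necessarily reach their peaks at the same moment, this sum is an upper bound.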
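For jobs spanning several nodes, the reports can be grouped by node before summing. This is a minimal sketch, assuming the step is launched with --error=job.%n.%t.err instead, so that each file name also carries the node index (%n is srun's node identifier relative to the job); this file naming and the summarize_rss helper are illustrative assumptions.

```shell
#!/usr/bin/env bash
# summarize_rss (hypothetical helper): add up the peak memory reported by
# GNU time in each per-task error file, grouped by node.
# Assumes files named job.<node>.<task>.err, e.g. produced by
#   srun --error=job.%n.%t.err -n 4 /usr/bin/time --verbose my_program
summarize_rss() {
  awk '
    FNR == 1 {
      # Drop any directory prefix, then take the node index from
      # job.<node>.<task>.err (second dot-separated field)
      name = FILENAME
      sub(/.*\//, "", name)
      split(name, part, ".")
      node = part[2]
    }
    /Maximum resident set size \(kbytes\):/ {
      rss = $NF                 # the value in kbytes is the last field
      printf "%s: %d kB\n", name, rss
      total[node] += rss
    }
    END {
      # Sum of per-task peaks on each node: an upper bound, because the
      # tasks need not reach their peaks at the same moment
      for (n in total)
        printf "node %s total: %d kB\n", n, total[n]
    }
  ' "$@"
}

# Usage, in the directory holding the files once the step has finished:
#   summarize_rss job.*.*.err
```

The per-node totals are printed in no particular order, since awk does not guarantee the iteration order of an array.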

Follow us on Twitter

Follow all the latest HPC news from the Supercomputing Lab and KAUST on Twitter: @KAUST_HPC.

Previous Announcements

http://www.hpc.kaust.edu.sa/announcements/

Previous Tips

http://www.hpc.kaust.edu.sa/tip/