KAUST Supercomputing Laboratory Newsletter 20th January 2016
Annual Power Maintenance
Due to the annual power maintenance in the data centre, all of the systems will be unavailable from 08:00 on Thursday 11th February until approximately 10:00 on Monday 15th February.
KSL Workshop Series: Optimizing I/O on Shaheen II
Thursday, February 4, 2016
The KAUST Supercomputing Core Laboratory (KSL) invites you to the second workshop on our seminar series about Shaheen II. This workshop will focus on maximizing efficient use of the parallel file system. It will provide an overview of Parallel I/O, explore using various profiling tools for validating I/O performance, and cover best practices for efficient I/O operations. The scientific focus will be on codes for climate, seismic, and biology applications.
This seminar is of particular interest to Shaheen II users dealing with large files, or a large number of files.
Seats are limited. Please register your interest at: https://www.surveymonkey.com/r/H6FNM7C
Venue/Time: Computer Lab Room, 3rd floor of the Library, from 9:30am to 11:00am
Agenda:
09:30am - Optimizing I/O on Shaheen II
10:00am - Interactive Exercises on Shaheen II
10:30am - Q&A with KSL team (bring all your HPC questions)
Shaheen I/Neser Data
The Shaheen I/Neser ‘home’ and ‘project’ filesystems will remain available until at least 31st July 2016. However, please note that the ‘scratch’ filesystem will be taken offline and deleted on 1st February 2016.
If any of this data is needed for projects on Shaheen II, please contact us rather than copying it yourself: we have dedicated systems with direct access to both storage subsystems and can assist in moving the data.
Tip of the Week: Job Arrays
SLURM job arrays allow you to submit and manage multiple similar jobs quickly and easily. A job array can be specified using two techniques:
- in a batch directive:
#SBATCH --array=1-10
- on the command line:
cdl:~> sbatch --array=1-10 my_slurm_script.sh
This will generate a job array containing 10 jobs. If the sbatch command responds with "Submitted batch job 100", then the environment variables will be set as follows for each task:
SLURM_JOBID=100
SLURM_ARRAY_JOB_ID=100
SLURM_ARRAY_TASK_ID=1

SLURM_JOBID=101
SLURM_ARRAY_JOB_ID=100
SLURM_ARRAY_TASK_ID=2

...
It is advised to update the job's stdout and stderr filenames as follows:
#SBATCH --output=slurm-%A_%a.out
where %A will be replaced by the value of SLURM_ARRAY_JOB_ID and %a by the value of SLURM_ARRAY_TASK_ID.
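Putting these pieces together, here is a minimal sketch of a complete job array script; the job name, walltime, node count, program, and input file names are hypothetical placeholders for illustration:

#!/bin/bash
# Minimal job array sketch (names and limits are placeholders).
#SBATCH --job-name=jarray
#SBATCH --array=1-10
#SBATCH --output=slurm-%A_%a.out
#SBATCH --time=00:10:00
#SBATCH --nodes=8

# Each task reads SLURM_ARRAY_TASK_ID to select its own input file,
# e.g. input_1.dat for task 1, input_2.dat for task 2, and so on.
srun ./my_program input_${SLURM_ARRAY_TASK_ID}.dat

Every task in the array runs the same script; only the value of SLURM_ARRAY_TASK_ID differs.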
To check the status of the array, you can use the squeue -u username command; however, the pending jobs will be collapsed into a single line. For better formatting, and to see the status of both running and pending jobs individually, add the -r option:
cdl:~> squeue -u username -r
100_1  user1 kx jarray R  None      2016-01-11T00:16:36 16:19:53 7:40:07 8
100_2  user1 kx jarray R  None      2016-01-11T00:16:36 16:19:53 7:40:07 8
100_3  user1 kx jarray R  None      2016-01-11T00:16:36 16:19:53 7:40:07 8
100_4  user1 kx jarray R  None      2016-01-11T00:16:36 16:19:53 7:40:07 8
100_5  user1 kx jarray R  None      2016-01-11T00:16:36 16:19:53 7:40:07 8
100_6  user1 kx jarray PD JobArrayT N/A                 0:00     10:00   8
100_7  user1 kx jarray PD JobArrayT N/A                 0:00     10:00   8
100_8  user1 kx jarray PD JobArrayT N/A                 0:00     10:00   8
100_9  user1 kx jarray PD JobArrayT N/A                 0:00     10:00   8
100_10 user1 kx jarray PD JobArrayT N/A                 0:00     10:00   8
To limit the number of simultaneously running tasks, to 2 for example, append %2 to the range as follows: --array=1-10%2
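For example, the following batch directive (a sketch using the same range as above) queues all ten tasks but allows at most two of them to run at any one time:

#SBATCH --array=1-10%2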