KAUST Supercomputing Laboratory Newsletter 16th April 2020
In this newsletter:
- New release of Converge
- System Maintenance
- RCAC Meeting
- Tip of the Week: Performance Variability
- Follow us on Twitter
- Previous Announcements
- Previous Tips
New release of Converge
Good news for CONVERGE users on Shaheen: the latest version, converge/3.0.12, has been installed and tested. Please use it for your future simulations, as it fixes bugs present in older versions of CONVERGE. Simply load the module (module load converge/3.0.12) and use "converge" as the executable in your job scripts. For post-processing, we highly recommend the parallel version of post_convert, i.e. post_convert_30-mpich. For any help, please contact help@hpc.kaust.edu.sa.
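A minimal batch script for running CONVERGE 3.0.12 might look like the sketch below. It assumes a Slurm job on nodes with 32 physical cores each and that the CONVERGE case input files are already in the submission directory. The job name, node count, task count and wall time are illustrative placeholders, not recommended values.

```bash
#!/bin/bash
#SBATCH --job-name=converge_run   # placeholder job name
#SBATCH --nodes=4                 # illustrative node count
#SBATCH --ntasks-per-node=32      # one MPI rank per physical core (assumes 32-core nodes)
#SBATCH --time=04:00:00           # illustrative wall-clock limit

# Load the newly installed release that contains the bug fixes
module load converge/3.0.12

# Launch the solver from the case directory; any case-specific
# command-line arguments your setup needs are not shown here.
srun converge

# Post-processing with the parallel post_convert_30-mpich would likewise
# be launched with srun; its arguments are not covered in this sketch.
```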
System Maintenance
The next maintenance session will take place from 08:00 on Monday 11th May until 17:00 on Wednesday 13th May. There will be no access to the system during this period.
RCAC Meeting
The project submission deadline for the next RCAC meeting is 30th April 2020. Please note that the RCAC meetings are held once per month. Projects received on or before the submission deadline will be included in the agenda for the subsequent RCAC meeting. The detailed procedures, updated templates and forms are available here: https://www.hpc.kaust.edu.sa/account-applications
Tip of the Week: Performance Variability
Performance variability is a known issue on current HPC systems, and Shaheen users may encounter it for several reasons. Nevertheless, a few best practices can help mitigate variability and improve application performance:
- Hugepages: for communication-bound applications, especially those performing many MPI_Alltoall operations, using hugepages can reduce the cost of accessing memory:
  - Load the hugepages module: module load craype-hugepages2M
  - Recompile your code
  - Add module load craype-hugepages2M to your batch scripts
  - For more information, type man intro_hugepages
- Affinity: hyperthreading is enabled on Shaheen, so running with the correct affinity and binding options can greatly reduce variability.
  - Use as many ranks per node as possible, up to the point where performance starts to drop. A single rank per node cannot use the full network bandwidth.
  - Use the job generator to get the correct binding: https://www.hpc.kaust.edu.sa/job
  - Measure performance both with and without the srun option --hint=nomultithread
- I/O: for I/O-bound applications, make sure that file striping is adapted to the size of your files, particularly large ones. More details are available at https://www.hpc.kaust.edu.sa/training
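The practices above can be combined in a single batch script. The sketch below is an illustration under stated assumptions, not an official template: my_app is a hypothetical executable built in the submission directory with the Cray cc compiler wrapper after loading craype-hugepages2M, /scratch/$USER/my_run is a hypothetical Lustre output directory, and the stripe count of 8 is purely illustrative. Use the job generator for the binding options recommended for your own code.

```bash
#!/bin/bash
#SBATCH --job-name=variability_test   # placeholder job name
#SBATCH --nodes=2                      # illustrative node count
#SBATCH --ntasks-per-node=32           # assumes 32 physical cores per node
#SBATCH --time=01:00:00                # illustrative wall-clock limit

# Hugepages: the same module must be loaded when compiling
# (e.g. "cc -o my_app my_app.c") and again at run time.
module load craype-hugepages2M

APP="$SLURM_SUBMIT_DIR/my_app"         # hypothetical executable

# I/O: stripe the (hypothetical) output directory across several Lustre
# OSTs; files created in it afterwards inherit this layout.
OUTDIR=/scratch/$USER/my_run
mkdir -p "$OUTDIR"
lfs setstripe -c 8 "$OUTDIR"
lfs getstripe "$OUTDIR"                # print the layout to confirm it
cd "$OUTDIR" || exit 1

# Affinity: time the same run with and without --hint=nomultithread.
start=$(date +%s)
srun "$APP"
echo "default binding:      $(( $(date +%s) - start )) s"

start=$(date +%s)
srun --hint=nomultithread "$APP"
echo "--hint=nomultithread: $(( $(date +%s) - start )) s"
```

Repeating the comparison over several submissions gives a rough picture of both the speed and the run-to-run variability of each configuration.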
Follow us on Twitter
Follow the latest HPC news from the Supercomputing Lab and across KAUST on Twitter: @KAUST_HPC.
Previous Announcements
http://www.hpc.kaust.edu.sa/announcements/