KAUST Supercomputing Laboratory Newsletter 16th April 2020

In this newsletter:

  • New release of Converge
  • System Maintenance
  • RCAC Meeting
  • Tip of the Week: Performance Variability
  • Follow us on Twitter
  • Previous Announcements
  • Previous Tips

 

New release of Converge

Good news for CONVERGE users on Shaheen: the latest version, converge/3.0.12, is now installed and tested. Please use it for your future simulations, as it fixes bugs present in older versions of CONVERGE. Just load the module (module load converge/3.0.12) and use "converge" as the binary in your job scripts. For post-processing, we highly recommend the parallel version of post_convert, i.e. post_convert_30-mpich. For any help, please contact help@hpc.kaust.edu.sa
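
As a minimal sketch of such a job script, the snippet below writes a Slurm batch file that loads the module and launches the solver. Only the module name and the "converge" binary come from this announcement; the file name converge_job.sh, the job name, and the node, task and time values are placeholders to adapt to your own case, and any license or environment settings your case needs are not shown.

```shell
# Write a hypothetical batch script to converge_job.sh.
# The resource values below are placeholders, not recommendations.
cat > converge_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=converge_run
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --time=01:00:00

# Load the release named in this announcement.
module load converge/3.0.12

# Launch the solver; srun takes the task count from the directives above.
srun converge
EOF
```

The script would then be submitted from the case directory with sbatch converge_job.sh.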

System Maintenance

The next maintenance session will take place from 08:00 on Monday 11th May until 17:00 on Wednesday 13th May. There will be no access to the system during this period.

RCAC Meeting

The project submission deadline for the next RCAC meeting is 30th April 2020. Please note that the RCAC meetings are held once per month. Projects received on or before the submission deadline will be included in the agenda for the subsequent RCAC meeting. The detailed procedures, updated templates and forms are available here: https://www.hpc.kaust.edu.sa/account-applications

Tip of the Week: Performance Variability

Performance variability is a known issue on current HPC systems. Users may encounter several potential causes of variability on Shaheen; nevertheless, a few best practices can mitigate it and improve application performance.

  • Hugepages: for communication-bound applications, especially those with many MPI_Alltoall operations, using hugepages can reduce the cost of accessing memory:
    • load the hugepages module (module load craype-hugepages2M)
    • recompile your code
    • add module load craype-hugepages2M to your batch scripts
    • for more information, type man intro_hugepages

 

  • Affinity: hyperthreading is enabled on Shaheen, and the affinity and binding options you run with can greatly affect variability.
    • Increase the number of ranks per node up to the maximum, stopping when performance drops. A single rank per node cannot utilize the full network bandwidth.
    • Use the job generator to get the correct binding: https://www.hpc.kaust.edu.sa/job
    • Measure the performance with and without the option srun --hint=nomultithread
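
The practices above can be combined in one batch script. The sketch below writes a hypothetical Slurm script that loads the hugepages module and times the same executable with and without --hint=nomultithread. The file name variability_job.sh, the executable ./my_app (assumed to have been recompiled with craype-hugepages2M loaded) and the resource values are assumptions for illustration only; the job generator linked above remains the place to get the correct binding options.

```shell
# Write a hypothetical batch script to variability_job.sh.
# ./my_app and the resource values are placeholders.
cat > variability_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=variability_test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --time=00:30:00

# Load the same hugepages module that was loaded when the code was compiled.
module load craype-hugepages2M

# Run 1: default srun placement.
echo "Run without --hint=nomultithread"
time srun ./my_app

# Run 2: same executable, asking srun not to use hyperthreads.
echo "Run with --hint=nomultithread"
time srun --hint=nomultithread ./my_app
EOF
```

After submitting with sbatch variability_job.sh, compare the two wall-clock times in the job output. Submitting the job several times gives a rough picture of the run-to-run spread for each setting.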

 

Follow us on Twitter

Follow all the latest HPC news from the Supercomputing Lab and KAUST on Twitter: @KAUST_HPC.

Previous Announcements

http://www.hpc.kaust.edu.sa/announcements/

Previous Tips

http://www.hpc.kaust.edu.sa/tip/