KAUST Supercomputing Laboratory Newsletter 1st October 2020
In this newsletter:
- RCAC meeting
- Application License Server Maintenance by IT on Thursday, 1st October 2020, 8.00 PM to 4:00 AM 2nd October
- KAUST supercomputer Shaheen II joins the fight against COVID-19
- Tip of the week: Running DDT in batch mode
- Follow us on Twitter
- Previous Announcements
- Previous Tips
RCAC meeting
The project submission deadline for the next RCAC meeting is 31st October 2020. Please note that the RCAC meetings are held once per month. Projects received on or before the submission deadline will be included in the agenda for the subsequent RCAC meeting. The detailed procedures, updated templates and forms are available here: https://www.hpc.kaust.edu.sa/account-applications
Application License Server Maintenance by IT on Thursday, 1st October 2020, 8.00 PM to 4:00 AM 2nd October
Due to scheduled maintenance of the Application License Server by IT on Thursday, 1st October 2020, from 8:00 PM to 4:00 AM on 2nd October, access to the following applications will be affected on Shaheen and Neser:
Ansys, AtomistixToolKit (ATK), Converge, Eclipse, Intel Compilers, Material Studio, Mathematica, MATLAB, Tecplot and Totalview.
During this maintenance window, compilation with the Intel compilers may fail, and the applications listed above may report errors at runtime.
KAUST supercomputer Shaheen II joins the fight against COVID-19
King Abdullah University of Science and Technology (KAUST) invites researchers from across the Kingdom to submit proposals for COVID-19-related research. Recognizing the urgency to address global challenges related to the COVID-19 pandemic through scientific discovery and innovation, the University’s Supercomputing Core Laboratory (KSL) is making computing resources—including the flagship Shaheen II supercomputer and its expert scientists—available to support research projects.
Topics may include but are not limited to: understanding the virus on a molecular level; understanding its fluid-dynamical transport; evaluating the repurposing of existing drugs; forecasting how the disease spreads; and finding ways to stop or slow down the pandemic.
Accepted proposals can access the following resources: (1) Shaheen II, a Cray XC-40 supercomputer based on Intel Haswell processors with nearly 200,000 compute cores tightly connected with Aries high-speed interconnect; (2) Ibex cluster, a high throughput computer system with about 500 computing nodes using Intel Skylake and Cascade Lake CPUs and Nvidia V100 GPUs; and (3) KSL staff scientists, who will provide support, training and consultancy to maximize impact. Through 30 June 2020, up to 15% of these resources will be reserved for fast-tracking competitive COVID-19 proposals through the KAUST Research Computing Allocation Committee. Thereafter, such proposals remain welcome and will be considered in the standard process.
Applicants can apply for computing allocations using the COVID-19 Project Proposal form. Please submit the form to projects@hpc.kaust.edu.sa. Submitted proposals will be fast-tracked for processing.
Please contact help@hpc.kaust.edu.sa with any inquiries.
Tip of the week: Running DDT in batch mode
If you debug long-running applications at scale, it is useful to run the application as a non-interactive batch process so that you do not have to be at the terminal.
Arm Forge provides such a utility: ddt, the Arm Forge parallel debugger, can debug such jobs offline. You generally need to recompile your application with the -g flag. It also helps to lower the compiler optimization level to -O0, so that you home in on the actual issue rather than on a side effect of what the compiler has optimized in your code.
The following will launch ddt in offline mode in your jobscript:
module load arm-forge
ddt --offline -o report-${SLURM_JOBID}.txt srun -n NUM_TASKS ./application args
In the launch line above, the optional -o argument names the report file and, through it, the format of the debugging report. The default is an HTML file, which you can download and open in any browser. The example above requests the report as a text file for convenient examination on Shaheen.
To enable memory debugging, use the --mem-debug=[fast,balanced,thorough] option of ddt. Note that you must preload libdmalloc, which is required to track heap memory allocations and deallocations. Memory debugging is a handy way to reveal the memory consumption pattern of a crashing application.
module load arm-forge/20.0.3
ddt --offline -o report-${SLURM_JOBID}.txt --mem-debug=balanced srun -n NUM_TASKS ./application args
The report includes traceback information with line numbers in the source code, together with a trace of the local stack variables of the MPI process. This is a good starting point for understanding when and where issues occur.
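The launch lines above can be combined into a single jobscript. The following is only a sketch under stated assumptions: the #SBATCH values, the helper function build_ddt_launch, the array DDT_LAUNCH and the "nojob" placeholder are hypothetical names chosen for illustration, and ./application args stands for your own executable and its arguments. Outside a Slurm job the script only prints the command it would run, which makes it easy to check on a login node before submitting.

```shell
#!/bin/bash
#SBATCH --job-name=ddt-offline   # assumed job name
#SBATCH --nodes=2                # assumed node count
#SBATCH --ntasks=64              # assumed number of MPI tasks
#SBATCH --time=01:00:00          # assumed wall-clock limit

# Fill the global array DDT_LAUNCH with the offline launch line from the tip.
#   $1     number of MPI tasks
#   $2     memory debugging level (fast, balanced or thorough);
#          pass "" to leave memory debugging off
#   $3...  the application and its arguments
build_ddt_launch() {
    local ntasks="$1" level="$2"
    shift 2
    # The .txt extension asks for a plain-text report, as in the tip.
    DDT_LAUNCH=(ddt --offline -o "report-${SLURM_JOBID:-nojob}.txt")
    if [ -n "$level" ]; then
        DDT_LAUNCH+=("--mem-debug=$level")
    fi
    DDT_LAUNCH+=(srun -n "$ntasks" "$@")
}

build_ddt_launch "${SLURM_NTASKS:-64}" balanced ./application args

if [ -n "${SLURM_JOBID:-}" ]; then
    # Inside a Slurm job: load the debugger and run the application under it.
    module load arm-forge
    "${DDT_LAUNCH[@]}"
else
    # Outside a job (for example on a login node): only show the command.
    echo "${DDT_LAUNCH[*]}"
fi
```

Keeping the launch line in an array preserves arguments that contain spaces, and passing an empty second argument to build_ddt_launch gives the plain offline run without memory debugging.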
Follow us on Twitter
Follow all the latest news on HPC within the Supercomputing Lab and at KAUST, on Twitter @KAUST_HPC.
Previous Announcements
http://www.hpc.kaust.edu.sa/announcements/