KAUST Supercomputing Laboratory Newsletter 18th June 2020
In this newsletter:
- Maintenance Session 22nd June
- RCAC Meeting
- KAUST supercomputer Shaheen II joins the fight against COVID-19
- Tip of the Week: Memory report
- Follow us on Twitter
- Previous Announcements
- Previous Tips
Maintenance Session 22nd June
Shaheen will be unavailable to run jobs for approximately 1 hour from 09:00 on the 22nd June to allow for an urgent update of the Slurm job scheduler.
RCAC Meeting
The project submission deadline for the next RCAC meeting is 30th June 2020. Please note that the RCAC meetings are held once per month. Projects received on or before the submission deadline will be included in the agenda for the subsequent RCAC meeting. The detailed procedures, updated templates and forms are available here: https://www.hpc.kaust.edu.sa/account-applications
KAUST supercomputer Shaheen II joins the fight against COVID-19
King Abdullah University of Science and Technology (KAUST) invites researchers from across the Kingdom to submit proposals for COVID-19-related research. Recognizing the urgency to address global challenges related to the COVID-19 pandemic through scientific discovery and innovation, the University’s Supercomputing Core Laboratory (KSL) is making computing resources—including the flagship Shaheen II supercomputer and its expert scientists—available to support research projects.
Topics may include but are not limited to: understanding the virus on a molecular level; understanding its fluid-dynamical transport; evaluating the repurposing of existing drugs; forecasting how the disease spreads; and finding ways to stop or slow down the pandemic.
Accepted proposals can access the following resources: (1) Shaheen II, a Cray XC-40 supercomputer based on Intel Haswell processors with nearly 200,000 compute cores tightly connected with Aries high-speed interconnect; (2) Ibex cluster, a high throughput computer system with about 500 computing nodes using Intel Skylake and Cascade Lake CPUs and Nvidia V100 GPUs; and (3) KSL staff scientists, who will provide support, training and consultancy to maximize impact. Through 30 June 2020, up to 15% of these resources will be reserved for fast-tracking competitive COVID-19 proposals through the KAUST Research Computing Allocation Committee. Thereafter, such proposals remain welcome and will be considered in the standard process.
Applicants can apply for computing allocations using the COVID-19 Project Proposal form. Please submit the form to projects@hpc.kaust.edu.sa. Submitted proposals will be fast-tracked for processing.
Please contact help@hpc.kaust.edu.sa with any inquiries.
Tip of the Week: Memory report with MPICH_MEMORY_REPORT
MPI has a built-in memory profiling tool that reports memory usage. To use it, set the MPICH_MEMORY_REPORT environment variable in your job script:
- If set to 1, print a summary of the min/max high water mark and associated rank to stderr.
- If set to 2, output each rank's high water mark to a file as specified using MPICH_MEMORY_REPORT_FILE.
- If set to 3, do both 1 and 2.
export MPICH_MEMORY_REPORT=3
export MPICH_MEMORY_REPORT_FILE=/project/kxxx/username/test_memory_report/mem
srun --hint=nomultithread --ntasks=4 --ntasks-per-node=2 --ntasks-per-socket=1 ./my_exe
# [2] Max memory allocated by malloc: 3068096 bytes
# [2] Max memory allocated by mmap: 8388736 bytes
# [2] Max memory allocated by shmget: 8522608 bytes
# [0] Max memory allocated by malloc: 3068032 bytes
# [0] Max memory allocated by mmap: 8388736 bytes
# [0] Max memory allocated by shmget: 8522608 bytes
# [3] Max memory allocated by malloc: 3066864 bytes
# [3] Max memory allocated by mmap: 8388736 bytes
# [3] Max memory allocated by shmget: 0 bytes
# [1] Max memory allocated by malloc: 3066800 bytes
# [1] Max memory allocated by mmap: 8388736 bytes
# [1] Max memory allocated by shmget: 0 bytes
This is useful for memory profiling and debugging, but the detailed reports may introduce some overhead.
The summary printed to stderr reports the maximum and minimum values, together with the lowest rank that reported each value (max_loc/min_loc reductions). The "by malloc" lines cover malloc/free calls, the "by mmap" lines cover mmap/munmap calls, and the "by shmget" lines cover shmget or SYSCALL(shmget) and shmctl(..RM_ID..) calls.
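If you want to reduce the per-rank lines to one maximum per allocation type yourself, a small shell helper along the following lines may be enough. This is only a sketch: summarize_mem_report is a hypothetical name, and it assumes the report lines look exactly like the sample output shown above (an optional leading "# " is tolerated). Ties are resolved in favour of the lowest rank, mirroring the max_loc behaviour described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce MPICH_MEMORY_REPORT per-rank lines to one
# maximum per allocation type (malloc, mmap, shmget).
# Assumed input line format (taken from the sample output in this tip):
#   [2] Max memory allocated by malloc: 3068096 bytes
summarize_mem_report() {
  awk '
    /Max memory allocated by/ {
      line = $0
      sub(/^#[ \t]*/, "", line)          # drop an optional leading "# "
      split(line, f, " ")
      rank = f[1]
      sub(/^\[/, "", rank)               # "[2]" -> "2"
      sub(/\]$/, "", rank)
      rank = rank + 0
      kind = f[6]
      sub(/:$/, "", kind)                # "malloc:" -> "malloc"
      bytes = f[7] + 0
      # Keep the largest value; on a tie keep the lowest rank.
      if (!(kind in max) || bytes > max[kind] ||
          (bytes == max[kind] && rank < who[kind])) {
        max[kind] = bytes
        who[kind] = rank
      }
    }
    END {
      for (k in max)
        printf "%s max=%d bytes rank=%d\n", k, max[k], who[k]
    }
  ' "$@" | sort
}
```

The function reads either standard input or the files passed as arguments, so it can be fed the job's captured output or the files written under the MPICH_MEMORY_REPORT_FILE prefix, provided they use the same line format.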
More information is available in the intro_mpi man page. Run "man intro_mpi" on any of the cdls, search for "report" by typing "/report", and press "n" to jump to the next match until you reach the section on how to invoke MPICH's memory reporting function.
Follow us on Twitter
Follow the latest HPC news from the Supercomputing Lab and KAUST on Twitter @KAUST_HPC.
Previous Announcements
http://www.hpc.kaust.edu.sa/announcements/