MPICH has a built-in memory reporting tool for analyzing memory usage. To use it, set the MPICH_MEMORY_REPORT environment variable in your job script:
- If set to 1, print a summary of the min/max high water mark and associated rank to stderr.
- If set to 2, output each rank's high water mark to a file as specified using MPICH_MEMORY_REPORT_FILE.
- If set to 3, do both 1 and 2.
    export MPICH_MEMORY_REPORT=3
    export MPICH_MEMORY_REPORT_FILE=/project/kxxx/username/test_memory_report/mem
    srun --hint=nomultithread --ntasks=4 --ntasks-per-node=2 --ntasks-per-socket=1 ./my_exe
    # [2] Max memory allocated by malloc: 3068096 bytes
    # [2] Max memory allocated by mmap: 8388736 bytes
    # [2] Max memory allocated by shmget: 8522608 bytes
    # [0] Max memory allocated by malloc: 3068032 bytes
    # [0] Max memory allocated by mmap: 8388736 bytes
    # [0] Max memory allocated by shmget: 8522608 bytes
    # [3] Max memory allocated by malloc: 3066864 bytes
    # [3] Max memory allocated by mmap: 8388736 bytes
    # [3] Max memory allocated by shmget: 0 bytes
    # [1] Max memory allocated by malloc: 3066800 bytes
    # [1] Max memory allocated by mmap: 8388736 bytes
    # [1] Max memory allocated by shmget: 0 bytes
This is useful for memory profiling and debugging, but the detailed reports may introduce some overhead.
The summary reports the maximum and minimum values, along with the lowest rank that reported each value (max_loc/min_loc reductions). The "by malloc" lines cover malloc/free calls, the "by mmap" lines cover mmap/munmap calls, and the "by shmget" lines cover shmget or SYSCALL(shmget) and shmctl(..RM_ID..).
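With many ranks, the per-rank lines can be tedious to scan by eye. The following is a minimal sketch of a helper that condenses them into one line per allocation type, keeping the largest high water mark and the lowest rank that reported it. The function name summarize_mem_report is hypothetical, and the sketch assumes every input line has the same shape as the example output above ("[rank] Max memory allocated by TYPE: N bytes", with or without a leading "#"). It reads standard input because how the per-rank files are named is not covered here.

```shell
# Hypothetical helper: condense per-rank high water marks read from stdin.
# Assumed line shape: "# [2] Max memory allocated by malloc: 3068096 bytes"
summarize_mem_report() {
    awk '
        /Max memory allocated by/ {
            # Find the word "by" so that a leading "#" is optional.
            for (i = 1; i <= NF; i++) if ($i == "by") break
            rank = $(i - 4); gsub(/\[|\]/, "", rank)   # "[2]"     -> "2"
            kind = $(i + 1); sub(/:$/, "", kind)       # "malloc:" -> "malloc"
            val  = $(i + 2) + 0                        # numeric value for comparison
            # Keep the largest value; on a tie, keep the lowest rank.
            if (!(kind in max) || val > max[kind] ||
                (val == max[kind] && rank + 0 < who[kind] + 0)) {
                max[kind] = val
                txt[kind] = $(i + 2)   # original digits, printed verbatim
                who[kind] = rank
            }
        }
        END {
            for (kind in max)
                printf "%s max=%s bytes (rank %s)\n", kind, txt[kind], who[kind]
        }
    ' | sort
}
```

To use it, pipe the report text into the function, for example by concatenating the per-rank report files with cat. For the four-rank example above it prints one line each for malloc, mmap and shmget.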
More information is available in the intro_mpi man page. Run "man intro_mpi" on any of the cdls, search for "report" by typing "/report", then press "n" to jump to the next match. This leads to the section on how to invoke MPICH's memory reporting function.