Valgrind4hpc is a debugging tool that helps detect memory leaks and memory errors in parallel applications. It is similar to valgrind, which is designed for serial applications.
Compile and link your program with the -g option, then request an allocation and follow the steps shown below. This example uses one node with 2 tasks.
    salloc -N 1
    module unload darshan xalt
    module load valgrind4hpc
    export CTI_WLM_IMPL=slurm
    export CTI_LAUNCHER_NAME=srun
    valgrind4hpc -n2 --launcher-args="--hint=nomultithread --ntasks=2" --valgrind-args="--track-origins=yes --leak-check=full" ./my_exe
If the run reports errors or leaks, follow the instructions in the report to track them down. Otherwise, a clean run produces output like this:
    RANKS: <0,1>
    HEAP SUMMARY:
      in use at exit: 0 bytes in 0 blocks
    All heap blocks were freed -- no leaks are possible
    ERROR SUMMARY: 0 errors from 0 contexts (suppressed 19)
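When the report is long, it can help to check it automatically. The following is a minimal sketch, assuming the report was saved to a file (for example by appending `2>&1 | tee valgrind4hpc.log` to the valgrind4hpc command) and that it uses the same summary lines as the sample above. The helper name `is_clean_report` and the log file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if every summary in the saved report
# shows zero errors and zero bytes still in use at exit.
is_clean_report() {
    # At least one error summary must be present.
    grep -q "ERROR SUMMARY:" "$1" || return 1
    # No error summary may report a nonzero error count.
    if grep "ERROR SUMMARY:" "$1" | grep -qv "ERROR SUMMARY: 0 errors"; then
        return 1
    fi
    # No heap summary may report memory still in use at exit.
    if grep "in use at exit:" "$1" | grep -qv "in use at exit: 0 bytes"; then
        return 1
    fi
    return 0
}

# Assumed log file name; it only exists if you saved the report yourself.
LOG=valgrind4hpc.log
if [ -f "$LOG" ]; then
    if is_clean_report "$LOG"; then
        echo "clean"
    else
        echo "errors or leaks reported, inspect $LOG"
    fi
fi
```

The check relies only on the text of the summary lines, so it should be adapted if your report is formatted differently.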
To debug your program across multiple nodes, allocate the desired number of nodes and update the parameters in --launcher-args accordingly, as you would the options of an srun command or sbatch script.
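As an illustration of the multi-node case, the sketch below derives the task count from a hypothetical job shape (2 nodes with 4 tasks each) and prints the matching allocation and valgrind4hpc commands for review. The variable names and the node and task counts are assumptions. `--nodes`, `--ntasks` and `--ntasks-per-node` are standard srun options, but check which ones your site expects.

```shell
#!/bin/sh
# Hypothetical job shape: adjust to the run you want to debug.
NODES=2
TASKS_PER_NODE=4
TOTAL_TASKS=$((NODES * TASKS_PER_NODE))

# Launcher arguments mirror what you would pass to srun or sbatch.
LAUNCHER_ARGS="--hint=nomultithread --nodes=${NODES} --ntasks=${TOTAL_TASKS} --ntasks-per-node=${TASKS_PER_NODE}"

# Print the commands for review; paste them into your session to run them.
echo "salloc -N ${NODES}"
echo "valgrind4hpc -n${TOTAL_TASKS} --launcher-args=\"${LAUNCHER_ARGS}\" --valgrind-args=\"--track-origins=yes --leak-check=full\" ./my_exe"
```

Keeping the rank count passed to -n equal to the launcher's total task count avoids a mismatch between what valgrind4hpc expects and what the launcher starts.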
Note that valgrind4hpc arguments and target program arguments must be separated by two dashes (--).
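A minimal sketch of where the two dashes go, reusing the single-node command from above. The program arguments `input.dat --steps 10` are hypothetical placeholders for whatever your executable accepts. The script only prints the command line so that it can be checked before use.

```shell
#!/bin/sh
# Hypothetical arguments for the target program; replace with your own.
APP_ARGS="input.dat --steps 10"

# Options before "--" belong to valgrind4hpc; everything after "--"
# is handed to ./my_exe unchanged.
CMD_LINE="valgrind4hpc -n2 --launcher-args=\"--hint=nomultithread --ntasks=2\" --valgrind-args=\"--track-origins=yes --leak-check=full\" ./my_exe -- ${APP_ARGS}"

echo "$CMD_LINE"
```

Without the separator, an option such as `--steps` could be read as a valgrind4hpc option instead of reaching the program.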
More information is available in the man pages of valgrind and valgrind4hpc.