Detecting memory leaks and errors with Valgrind4hpc tool
The Valgrind4hpc debugging tool helps detect memory leaks and errors in parallel applications. It is similar to Valgrind, which is designed for serial applications.
Compile and link with the -g option, then allocate a node and follow the steps shown below. This example uses one node with 2 tasks.
    salloc -N 1
    module unload darshan xalt
    module load valgrind4hpc
    export CTI_WLM_IMPL=slurm
    export CTI_LAUNCHER_NAME=srun
    valgrind4hpc -n2 --launcher-args="--hint=nomultithread --ntasks=2" --valgrind-args="--track-origins=yes --leak-check=full" ./my_exe
Below is an example of clean output. If the output reports leaks or errors instead, follow the instructions it gives to locate them:
    RANKS: <0,1>
    HEAP SUMMARY:
      in use at exit: 0 bytes in 0 blocks
    All heap blocks were freed -- no leaks are possible
    ERROR SUMMARY: 0 errors from 0 contexts (suppressed 19)
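When the output has been saved to a file, for example with ordinary shell redirection, the ERROR SUMMARY line can be checked automatically. The following is a minimal sketch, not part of valgrind4hpc: the function name is_clean_log and the log file name are hypothetical, and it assumes the log contains summary lines in the format shown above.

```shell
# Hypothetical helper (not part of valgrind4hpc): decide whether a saved log
# reports a clean run, based on its "ERROR SUMMARY:" lines.
# Returns 0 if every summary reports 0 errors, 1 if any reports errors,
# and 2 if no summary line is found at all.
is_clean_log() {
    log_file="$1"
    # A log without any summary line cannot be judged clean.
    grep -q 'ERROR SUMMARY:' "$log_file" || return 2
    # Look for a summary line that does NOT report "0 errors".
    if grep 'ERROR SUMMARY:' "$log_file" | grep -qv 'ERROR SUMMARY: 0 errors'; then
        return 1
    fi
    return 0
}
```

A possible use, assuming the run was captured with `valgrind4hpc ... ./my_exe > valgrind4hpc.log 2>&1`, is `is_clean_log valgrind4hpc.log && echo "no errors reported"`. This checks only the error count; the HEAP SUMMARY should still be read for leak details.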
To debug your program across multiple nodes, allocate the desired number of nodes and update the parameters in --launcher-args accordingly, using the same options you would pass to srun in an sbatch script.
Note that valgrind4hpc arguments and target program arguments should be separated by two dashes (--).
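As an illustration of the multi-node case, the sketch below prints a valgrind4hpc command line for a given number of nodes and tasks per node, so it can be inspected before being run inside an allocation obtained with, for example, salloc -N 2. The function name build_v4hpc_cmd is hypothetical, and passing --nodes and --ntasks (standard srun options) through --launcher-args is an assumption about how you want to lay out the tasks; adjust it to match your own srun settings.

```shell
# Hypothetical helper: print (not run) a valgrind4hpc command for a
# multi-node job. Arguments: number of nodes, tasks per node, executable.
build_v4hpc_cmd() {
    nodes="$1"
    tasks_per_node="$2"
    exe="$3"
    # Total ranks = nodes x tasks per node; used for both -n and --ntasks.
    total=$((nodes * tasks_per_node))
    printf 'valgrind4hpc -n%s --launcher-args="--hint=nomultithread --nodes=%s --ntasks=%s" --valgrind-args="--track-origins=yes --leak-check=full" %s\n' \
        "$total" "$nodes" "$total" "$exe"
}

# Example: 2 nodes with 2 tasks each, i.e. 4 ranks in total.
build_v4hpc_cmd 2 2 ./my_exe
```

Printing the command rather than executing it keeps the sketch safe to try outside an allocation; the printed line can then be pasted into the session after the module and export steps.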
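The sketch below shows one reading of this rule: the valgrind4hpc options and the executable come first, followed by the two dashes and then the program's own arguments. The arguments input.dat and 100 are hypothetical placeholders, and the exact placement should be confirmed against the valgrind4hpc man page on your system. The command is only assembled and printed here, not executed.

```shell
# Hypothetical arguments for ./my_exe; replace them with your program's own.
PROG_ARGS="input.dat 100"

# Assumed layout: valgrind4hpc options, the executable, then "--",
# then the target program's arguments.
V4HPC_CMD="valgrind4hpc -n2 --valgrind-args=\"--leak-check=full\" ./my_exe -- ${PROG_ARGS}"

# Print the assembled command for inspection.
echo "$V4HPC_CMD"
```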
More information is available in the man pages of valgrind and valgrind4hpc.