Running DDT in batch mode
If you debug long-running applications at scale, it is useful to run the application as a non-interactive batch process so that you do not have to stay at the terminal.
Arm Forge provides such a utility: ddt, the Arm Forge parallel debugger, supports offline debugging of such jobs. You generally need to recompile your application with the -g flag so that it carries debugging symbols. It also helps to lower the compiler optimization level to -O0, so that you home in on the actual issue rather than on code the compiler has rearranged or optimized away.
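As a rough illustration of the rebuild step, the sketch below compiles with debugging symbols and no optimization. The compiler wrapper name (cc), the COMPILER variable, and the source file name application.c are assumptions made for the example; substitute whatever your build actually uses.

```shell
# Debug build flags: -g adds debugging symbols, -O0 disables optimization.
DEBUG_FLAGS="-g -O0"

# Assumption: "cc" is the compiler wrapper on your system and
# application.c is a placeholder source file; adjust both as needed.
COMPILER="${COMPILER:-cc}"
SRC="application.c"

if command -v "${COMPILER}" >/dev/null 2>&1 && [ -f "${SRC}" ]; then
    # DEBUG_FLAGS is left unquoted on purpose so it splits into two flags.
    "${COMPILER}" ${DEBUG_FLAGS} -o application "${SRC}"
else
    # Compiler or source not available here: show the command instead.
    echo "Would run: ${COMPILER} ${DEBUG_FLAGS} -o application ${SRC}"
fi
```

If your project uses a build system, the same two flags would normally go into its compiler-flags setting rather than on a hand-written command line.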
The following will launch ddt in offline mode in your jobscript:
module load arm-forge
ddt --offline -o report-${SLURM_JOBID}.txt srun -n NUM_TASKS ./application args
In the launch line above, the optional -o argument names the file the debugging report is written to, and the file extension selects the report format. The default is an HTML file, which you can download and open in any browser. The example above requests the report as a text file, which is convenient to examine directly on Shaheen.
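For context, a complete jobscript built around that launch line might look like the sketch below. The job name, task count, and wall time are placeholders rather than recommended values, and the fallbacks for running outside Slurm are additions made so the script can be checked without a scheduler.

```shell
#!/bin/bash
#SBATCH --job-name=ddt-offline   # hypothetical job name
#SBATCH --ntasks=4               # placeholder task count
#SBATCH --time=00:30:00          # placeholder wall time

# Load the debugger if the module command is available in this shell.
if type module >/dev/null 2>&1; then
    module load arm-forge
fi

# Name the report after the job ID; fall back to "local" outside Slurm.
REPORT="report-${SLURM_JOBID:-local}.txt"
# Use the task count Slurm allocated; fall back to 4 outside Slurm.
NUM_TASKS="${SLURM_NTASKS:-4}"

if command -v ddt >/dev/null 2>&1; then
    ddt --offline -o "${REPORT}" srun -n "${NUM_TASKS}" ./application args
else
    # ddt is not on PATH: print the launch line instead of running it.
    echo "Would run: ddt --offline -o ${REPORT} srun -n ${NUM_TASKS} ./application args"
fi
```

Submitting this with sbatch leaves the report in the submission directory once the job ends, named after the job ID.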
To enable memory debugging, pass ddt the --mem-debug option with one of the levels fast, balanced, or thorough. Note that libdmalloc must be preloaded, because it is the library that tracks heap memory allocations and deallocations. Memory debugging is a handy way to see the memory consumption pattern of a crashing application.
module load arm-forge/20.0.3
ddt --offline -o report-${SLURM_JOBID}.txt --mem-debug=balanced srun -n NUM_TASKS ./application args
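If you switch between memory debugging levels from one run to the next, one possible approach is to pick the level through a variable and validate it before building the launch line. MEM_DEBUG_LEVEL and MEM_DEBUG_OPT are hypothetical names introduced for this sketch; only the three level names come from the option described above.

```shell
# Assumption: MEM_DEBUG_LEVEL is a variable you set yourself at submission
# time, for example: MEM_DEBUG_LEVEL=thorough sbatch jobscript.sh
MEM_DEBUG_LEVEL="${MEM_DEBUG_LEVEL:-balanced}"

case "${MEM_DEBUG_LEVEL}" in
    fast|balanced|thorough)
        MEM_DEBUG_OPT="--mem-debug=${MEM_DEBUG_LEVEL}"
        ;;
    *)
        # Unknown level: warn on stderr and fall back to balanced.
        echo "Unknown memory debugging level '${MEM_DEBUG_LEVEL}', using balanced" >&2
        MEM_DEBUG_OPT="--mem-debug=balanced"
        ;;
esac

# Print the resulting launch line so it can be checked before use.
echo "ddt --offline -o report-${SLURM_JOBID:-local}.txt ${MEM_DEBUG_OPT} srun -n NUM_TASKS ./application args"
```

Starting with fast or balanced and moving to thorough only when the cheaper levels do not expose the problem keeps the runtime overhead down.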
The report includes traceback information with source line numbers, together with a trace of the local stack variables for each MPI process. This is a good starting point for understanding when and where issues occur.