Seminar on Performance and Debugging Analysis Tools. How can I check the performance of my code on CPUs and/or GPUs?


Date: Wednesday, 3 March 2021, 4-6pm KSA Time

Location: Zoom

Registration: (closed)


The KAUST Supercomputing Core Lab is organizing a seminar on Performance and Debugging Analysis Tools with contributions from HPE and ARM. This seminar will help the community developing and using C/C++/Fortran and Python codes to debug their code, check its performance, and optimize their workflows on KSL HPC systems using CPUs and/or GPUs.

Performance analysis is a key step for efficient development and usage of scientific codes on supercomputers and clusters featuring heterogeneous architectures such as CPUs and GPUs.

When dealing with large and complex research codes on high performance computing systems, profiling becomes necessary to achieve a high degree of performance optimization. This requires software tools, such as debuggers and profilers, that scale efficiently and provide accurate information without creating performance bottlenecks of their own.

This seminar will introduce performance analysis tools like Cray Perftools and Arm Forge and how to use them to assess program behavior, along with live demonstrations on simple codes.



4-4.15pm: Bilel Hadri: Introduction to and motivation for performance tools on HPC systems (slides, video)

4.15-5pm: Suyash Sharma (ARM) (slides, video, exercices_demo)

  • ARM Forge Overview: Intro to Debugger, Profiler & Performance Report
  • Profiling techniques with ARM MAP
  • Demo using Python & CUDA code samples

5-5.45pm: Aniello Esposito (HPE) (slides, video, exercices_demo)

  • Overview of Cray Perftools profiling approaches: sampling and event tracing
  • Introduction to gdb4hpc, a GDB-based parallel debugger
  • Demo using Perftools

5.45-6pm: Q&A session.

  • Q1/ When we use srun, how do we call perf-report?
  • A1/ perf-report srun ./exe (use mpirun on Ibex and srun on Shaheen).
  • Q2/ Is it possible to get profile data when the application combines Python with a precompiled binary (for example)?
  • A2/ If you profile a Python script that calls a precompiled binary, yes, there will be profile data. If you profile a C code that has a built-in Python interpreter, then no.
  • Q3/ Are there any examples of using ARM Forge with CUDA?
  • A3/ Yes, please see the demos and exercises.
  • Q4/ Can we see whether AVX512 or AVX2 is used?
  • A4/ With the register view in DDT you can look at the register values, and with the disassembler view you can look at values and instructions. You can then identify whether the instructions are 256-bit AVX or 512-bit AVX.
  • Q5/ Does ARM support the roofline model?
  • A5/ MAP will not show you the roofline output directly, as it does not evaluate the arithmetic intensity of the code. MPMD profiling is also possible, so sequencing pipelines can be profiled even if their executables are not related.
  • Q6/ What was the ${p2} after gdb attach for?
  • A6/ It was actually attach $p without a number. That sets the context p, which can then be focused. The number refers to the number of processes.
  • Q7/ My workload mainly uses bioinformatics tools in pipelines, with multiple executables running at the same time. How can I use the tools?
  • A7/ On workflow pipelines: it is possible to profile codes with dependent binaries, as mentioned in the answer to the Python-related question. An NGS/MPS pipeline is likely to be a C/C++ code such as the open source ScalaBLAST, and the mpirun/srun launch with its dependent executables will be sampled by the profiler.
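
As a rough illustration of the idea in A4, the sketch below classifies lines of disassembly text by the vector registers they reference: zmm registers are 512 bits wide and ymm registers are 256 bits wide. This is a heuristic written for this page, not a feature of DDT; the function names and the sample text are hypothetical, and the input is assumed to be plain disassembly text copied from a disassembler view or produced by a tool such as objdump -d. A 256-bit result alone does not distinguish AVX from AVX2, and some AVX-512 instructions can also operate on ymm registers.

```python
import re
from collections import Counter

# zmm registers are 512 bits wide; ymm registers are 256 bits wide.
ZMM = re.compile(r"\bzmm\d+\b", re.IGNORECASE)
YMM = re.compile(r"\bymm\d+\b", re.IGNORECASE)

# Hypothetical sample, not output captured from a real program.
SAMPLE = """\
vaddpd %zmm1, %zmm2, %zmm0
vmulps %ymm3, %ymm4, %ymm5
vfmadd231pd %zmm6, %zmm7, %zmm8
mov %rax, %rbx
"""


def classify_line(line):
    """Label one disassembly line by the widest vector register it mentions."""
    if ZMM.search(line):
        return "512-bit"
    if YMM.search(line):
        return "256-bit"
    return "other"


def summarize(disassembly):
    """Count how many non-empty lines fall into each width category."""
    lines = [ln for ln in disassembly.splitlines() if ln.strip()]
    return dict(Counter(classify_line(ln) for ln in lines))


if __name__ == "__main__":
    for label, count in sorted(summarize(SAMPLE).items()):
        print(label, count)
```

A large share of "512-bit" lines in a hot loop suggests the compiler generated 512-bit vector code there; confirming which instruction set extension is actually in use still requires looking at the individual instructions.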


Organizer: Bilel Hadri, Computational Scientist at KAUST Supercomputing Core Lab.



Aniello Esposito works as a senior research engineer in the EMEA research lab at HPE, where he is responsible for the center of excellence collaboration with KAUST (Saudi Arabia) among other activities in the field of supercomputing and is also involved in pre-sales as an expert application analyst for European procurements. He is part of the algorithms track-technical committee of the Supercomputing conference and involved in various reviewing activities. Aniello studied physics at ETH Zurich, followed by a PhD on simulation of semiconductor devices and postdoctoral research in computational microscopy. He joined Cray as an application analyst at the HLRS in Stuttgart (Germany), where he supported users of the Cray systems and organized workshops in various European sites and eventually joined the Cray EMEA research lab. His expertise and interests reside in the development and optimization of scientific applications for supercomputers ranging from classical numerical approaches to emerging areas such as machine learning.  

Suyash Sharma is Sr. Applications Engineer, High Performance Computing Software Support & Benchmarking, at ARM. He has expertise in parallel preconditioning techniques for solving complex non-symmetric large sparse linear systems on heterogeneous computing architectures. He also has experience as an Applications Engineer with expertise in advanced 3D metrology, reverse engineering, CAD and 3D scanning, gained while working with a leading global organization.

Bilel Hadri has been a computational scientist at the Supercomputing Lab at KAUST since July 2013. He leads efforts in benchmarking, regression testing and performance optimization, helps coordinate strategic efforts for systems procurements and upgrades, and provides regular training to users. He received his Master in Applied Mathematics and his PhD in Computer Science from the University of Houston in 2008. He joined the National Institute for Computational Science at Oak Ridge National Lab as a computational scientist in December 2009, following a postdoctoral position starting in June 2008 at the University of Tennessee Innovative Computing Laboratory led by Dr. Jack Dongarra. His areas of expertise include performance analysis, tuning and optimization, system utilization analysis, monitoring and library usage tracking, porting and optimizing scientific applications on accelerator architectures, linear algebra, numerical analysis, and multicore algorithms.


Please contact us if you need further information.


Follow us on Twitter @KAUST_HPC