Phase-Based Profiling in GPGPU Kernels


Dietrich, R.; Schmitt, F.; Widera, R.; Bussmann, M.

A growing number of computationally intensive scientific applications use hardware accelerators such as general-purpose graphics processing units (GPGPUs). Compared to software development for typical multi-core processors, programming them is fairly complex and requires hardware-specific optimizations to utilize the full computing power. To achieve high performance, the critical parts of a program have to be identified and optimized. This paper proposes an approach for performance analysis of CUDA kernel source code regions that, for the first time, allows execution times to be measured within GPGPU kernels. We developed a tool that implements the presented method and helps application developers easily identify hot spots within a kernel. The tool uses compile-time code analysis to automatically instrument suitable points in the code while keeping program perturbation minimal, and it also supports manual instrumentation. To the best of our knowledge, this is the first approach that allows scalable runtime analysis within GPGPU kernels. Combined with existing performance analysis techniques, it helps developers exploit the full potential of modern parallel systems.

Keywords: performance analysis; tracing; profiling; GPGPU; CUDA; accelerators; many-core

  • Lecture (Conference)
    41st International Conference on Parallel Processing Workshops, 10.-13.09.2012, Pittsburgh, USA
  • Contribution to proceedings
    41st International Conference on Parallel Processing Workshops, 10.-13.09.2012, Pittsburgh, USA
    Proceedings of the 41st International Conference on Parallel Processing Workshops, 978-1-4673-2509-7, 414-423
    DOI: 10.1109/ICPPW.2012.59
    Cited 4 times in Scopus

Permalink: https://www.hzdr.de/publications/Publ-17423
Publ.-Id: 17423