Tuning and optimization for a variety of many-core architectures without changing a single line of implementation code using the Alpaka library

Matthes, A.; Widera, R.; Zenker, E.; Worpitz, B.; Huebl, A.; Bussmann, M.

We present an analysis of optimizing the performance of a single C++11 source code using the Alpaka hardware abstraction library.
While previous work showed that Alpaka has close-to-zero overhead compared to native implementations and achieves similar relative numerical performance on a variety of many-core platforms, in this work we focus on optimizing the performance of the general matrix multiplication (GEMM) algorithm. We use a simple tiling strategy and tune both the tile size and the number of tiles computed in parallel. In addition, we analyze the optimization potential of vendor-specific compilers when confronted with Alpaka's heavily templated abstractions.
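To illustrate what "tuning the tile size" means for GEMM, here is a minimal plain C++ sketch of a tiled matrix multiplication with a compile-time tile size. It does not use Alpaka's API and is not the authors' kernel. The names `Matrix`, `naiveGemm` and `tiledGemm` are hypothetical, and the tiles are processed sequentially here. On a many-core device, each independent output tile (or group of tiles) would instead be mapped to a parallel worker such as a thread block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: row-major n x n matrix stored in a flat vector.
using Matrix = std::vector<float>;

// Reference triple loop computing C = A * B.
Matrix naiveGemm(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Tiled GEMM: C is split into TileSize x TileSize tiles. TileSize is a
// template parameter, so each candidate value is a separate instantiation
// that can be benchmarked (the tuning knob in this sketch).
template <std::size_t TileSize>
Matrix tiledGemm(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    // (ti, tj) selects one output tile; each output tile is independent work.
    for (std::size_t ti = 0; ti < n; ti += TileSize)
        for (std::size_t tj = 0; tj < n; tj += TileSize)
            // Walk the shared dimension one tile at a time so the current
            // A and B tiles can stay in fast memory (cache or shared memory).
            for (std::size_t tk = 0; tk < n; tk += TileSize) {
                // Clamp tile bounds when n is not a multiple of TileSize.
                const std::size_t iEnd = std::min(ti + TileSize, n);
                const std::size_t jEnd = std::min(tj + TileSize, n);
                const std::size_t kEnd = std::min(tk + TileSize, n);
                for (std::size_t i = ti; i < iEnd; ++i)
                    for (std::size_t k = tk; k < kEnd; ++k) {
                        const float aik = a[i * n + k];
                        for (std::size_t j = tj; j < jEnd; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
    return c;
}
```

Instantiating `tiledGemm<8>`, `tiledGemm<16>`, `tiledGemm<32>` and so on, then timing each on the target machine, is one simple way to mimic a tile-size sweep. The best value depends on cache or shared-memory capacity, which is why it differs between architectures.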
We specifically tested the code on bleeding-edge architectures such as Nvidia's Tesla P100, Intel's Knights Landing (KNL) and Haswell architectures, and IBM's Power8 system. On some of these, we reached almost 50% of the peak floating-point performance using the aforementioned means. By adding compiler-specific #pragmas, we reached 5 TFLOP/s on a P100 and over 1 TFLOP/s on a KNL system.

Keywords: Heterogeneous computing; HPC; C++; CUDA; OpenMP; Platform portability; Performance portability; Parameter tuning

  • Contribution to proceedings
    2nd International Workshop on Performance Portable Programming Models for Accelerators (P^3MA), 22.06.2017, Frankfurt am Main, Germany
    ISC High Performance 2017: High Performance Computing, Vol. 10524, pp. 496-514
    DOI: 10.1007/978-3-319-67630-2_36
    Cited 11 times in Scopus
  • Lecture (Conference)
    2nd International Workshop on Performance Portable Programming Models for Accelerators (P^3MA), 22.06.2017, Frankfurt am Main, Germany

Permalink: https://www.hzdr.de/publications/Publ-25482