A Generic Approach for Developing Highly Scalable Particle-Mesh Codes for GPUs


Hönig, W.; Schmitt, F.; Widera, R.; Burau, H.; Juckeland, G.; Mueller, M. S.; Bussmann, M.

We present a general framework for GPU-based low-latency data transfer schemes that can be used for a variety of particle-mesh algorithms [9]. The framework hides the latency of data transfer between GPU-accelerated computing nodes by interleaving it with kernel execution on the GPU. As an example, we discuss the fully relativistic particle-in-cell (PiC) code PIConGPU [6], currently used to simulate particle acceleration by extremely short high-energy laser pulses. The PiC algorithm is versatile and frequently used in plasma physics (especially for large-scale simulations of fusion plasmas [14]), in astrophysics [10], and for the simulation of particle accelerators [12]. A special Cell processor version is used as a benchmark code for the Roadrunner system at Los Alamos National Lab [5]. Furthermore, the presented hybrid GPU-CPU data transfer and access framework can be used for general particle-mesh schemes. GPU memory accesses to particle data and to mesh data are efficiently separated, while data that must be exchanged between domains located on different GPUs is transferred during the computing steps using GPU-CPU memory copies and MPI. A simulation of laser-wakefield acceleration of electrons in an underdense plasma serves as a real-world benchmark for the performance of the framework.
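
The idea of overlapping data exchange with computation can be illustrated with a minimal sketch. It is not taken from PIConGPU: all names (`smooth`, `exchange_halos`, `step_overlapped`, `step_reference`) are hypothetical, a background thread stands in for the GPU-CPU memory copy and the MPI transfer, and a simple three-point smoothing stencil on a periodic 1D mesh stands in for the actual field update. Interior cells, which need no remote data, are updated while the exchange is in flight; border cells are finished once the neighbor values have arrived.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth(left, center, right):
    """Three-point stencil, used here as a placeholder for a mesh update."""
    return 0.25 * left + 0.5 * center + 0.25 * right


def exchange_halos(domains):
    """Stand-in for GPU-CPU copy plus MPI (assumption, no real transfer).

    Returns, for each domain, the pair (left ghost value, right ghost value)
    taken from its periodic neighbors.
    """
    n = len(domains)
    return [(domains[(i - 1) % n][-1], domains[(i + 1) % n][0])
            for i in range(n)]


def step_overlapped(domains, pool):
    """One update step with the halo exchange overlapped with interior work.

    Each domain must contain at least two cells.
    """
    # Start the exchange in the background; it only reads the old state.
    pending = pool.submit(exchange_halos, domains)

    # Meanwhile, update interior cells, which depend on local data only.
    updated = []
    for d in domains:
        out = list(d)
        for j in range(1, len(d) - 1):
            out[j] = smooth(d[j - 1], d[j], d[j + 1])
        updated.append(out)

    # Wait for the neighbor data, then finish the border cells.
    halos = pending.result()
    for d, out, (left_ghost, right_ghost) in zip(domains, updated, halos):
        out[0] = smooth(left_ghost, d[0], d[1])
        out[-1] = smooth(d[-2], d[-1], right_ghost)
    return updated


def step_reference(mesh):
    """Same update on the undivided periodic mesh, for comparison."""
    n = len(mesh)
    return [smooth(mesh[(j - 1) % n], mesh[j], mesh[(j + 1) % n])
            for j in range(n)]


if __name__ == "__main__":
    mesh = [float(i % 5) for i in range(12)]
    domains = [mesh[0:4], mesh[4:8], mesh[8:12]]
    with ThreadPoolExecutor(max_workers=1) as pool:
        result = step_overlapped(domains, pool)
    print(result)
```

In a real GPU code the background thread would typically be replaced by asynchronous device-host copies and non-blocking MPI calls; the sketch only shows the ordering of the work, not the performance behavior.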

Keywords: gpu; gpgpu; performance; particle-mesh; particle-in-cell; pic; simulation; algorithm; communication; framework; laser; plasma

Permalink: https://www.hzdr.de/publications/Publ-14163