We model, simulate and visualise the dynamics of particles and radiation phenomena that are of interest when investigating the physics of laser particle acceleration and develop massively parallel computing schemes.

HZDR Developer Team PIConGPU (from left to right): René Widera, Heiko Burau, Michael Bussmann, Richard Pausch, Axel Hübl

# PIConGPU - A Many-GPGPU Particle-in-Cell Code

PIConGPU [1,2] is a relativistic Particle-in-Cell (PIC) code running on graphics processing units (GPUs). It is open source and freely available for download [1]. PIConGPU is developed and maintained by the Junior Group Computational Radiation Physics at the Institute for Radiation Physics at HZDR in close collaboration with the Center for Information Services and High Performance Computing (ZIH) of the Technical University Dresden.

PIConGPU features...

The Particle-in-Cell algorithm is a central tool in plasma physics. It describes the dynamics of a plasma by computing the motion of electrons and ions in the plasma based on Maxwell's equations.

## How does the Particle-in-Cell Algorithm work?

The PIC algorithm solves the so-called Maxwell-Vlasov equation. To solve this equation, the electric and magnetic fields are discretized on a grid that divides the simulated volume into cells.
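Because the fields are stored only at grid points, their value at an arbitrary particle position has to be interpolated from neighbouring cells. The following is a minimal sketch of such a lookup, assuming a periodic one-dimensional grid and linear interpolation; the function name `gather_linear` is hypothetical, and the interpolation order used by PIConGPU may differ.

```python
import math

def gather_linear(field, x, dx):
    """Linearly interpolate a periodic 1D field, stored at grid nodes, to position x.

    Illustrative helper only: 1D, periodic boundaries, first-order weights.
    """
    n = len(field)
    s = x / dx               # position measured in units of the cell size
    i = math.floor(s)        # index of the grid node to the left of the particle
    w = s - i                # fractional distance to that node, in [0, 1)
    return (1.0 - w) * field[i % n] + w * field[(i + 1) % n]

# A particle halfway between node 1 and node 2 sees the mean of both values.
print(gather_linear([0.0, 1.0, 2.0, 3.0], 1.5, 1.0))
```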

Charged particles like electrons and ions are modeled by macro-particles. Each macro-particle describes the motion of up to several hundred particles by the motion of a single spread-out particle distribution. The macro-particles' motion is influenced by the electric and magnetic fields on the grid.
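How a macro-particle responds to the fields can be sketched with the Boris scheme, a widely used relativistic particle pusher. This is an illustration under stated assumptions rather than the pusher configured in PIConGPU: the fields `E` and `B` are taken as already interpolated to the particle position, units are SI-like with the speed of light `c` passed as a parameter, and the names `boris_push` and `cross` are hypothetical. A macro-particle that represents `w` real particles carries charge `w*q` and mass `w*m`, so its charge-to-mass ratio, and therefore its trajectory, is that of a single real particle.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def boris_push(x, u, E, B, q, m, dt, c=1.0):
    """Advance position x and reduced momentum u = gamma * v by one time step dt."""
    half = q * dt / (2.0 * m)
    # First half of the electric acceleration.
    u_minus = tuple(ui + half * Ei for ui, Ei in zip(u, E))
    gamma = math.sqrt(1.0 + sum(ui * ui for ui in u_minus) / c ** 2)
    # Rotation around the magnetic field; the length of u is preserved.
    t = tuple(half * Bi / gamma for Bi in B)
    t2 = sum(ti * ti for ti in t)
    s = tuple(2.0 * ti / (1.0 + t2) for ti in t)
    u_prime = tuple(a + b for a, b in zip(u_minus, cross(u_minus, t)))
    u_plus = tuple(a + b for a, b in zip(u_minus, cross(u_prime, s)))
    # Second half of the electric acceleration.
    u_new = tuple(ui + half * Ei for ui, Ei in zip(u_plus, E))
    gamma_new = math.sqrt(1.0 + sum(ui * ui for ui in u_new) / c ** 2)
    # Move the particle with its new velocity v = u / gamma.
    x_new = tuple(xi + ui / gamma_new * dt for xi, ui in zip(x, u_new))
    return x_new, u_new
```

In a pure magnetic field the scheme only rotates the momentum, so its magnitude stays constant up to rounding, which is one reason the method is popular for long simulations.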

The particle motion in turn creates currents. Following Ampère's law, these currents create magnetic fields. Changing magnetic fields in turn induce electric fields, as described by Faraday's law.

These new fields then act back on the particles.
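The field-update part of this cycle can be sketched in one dimension. The example assumes normalized units (c = ε0 = μ0 = 1), periodic boundaries, and a staggered grid with `E` (the y-component of the electric field) stored at grid nodes and `B` (the z-component of the magnetic field) stored halfway between nodes. The function name `update_fields` is hypothetical, and a production code works in three dimensions with all field components.

```python
def update_fields(E, B, J, dt, dx):
    """One leapfrog step for E_y (at nodes) and B_z (between nodes), periodic in x.

    E, B and J are lists of equal length; J is the current density J_y at the nodes.
    """
    n = len(E)
    # Faraday's law: dB_z/dt = -dE_y/dx
    B_new = [B[i] - dt / dx * (E[(i + 1) % n] - E[i]) for i in range(n)]
    # Ampere's law with the particle current: dE_y/dt = -dB_z/dx - J_y
    # (index i - 1 wraps to the last element for i = 0, giving periodicity)
    E_new = [E[i] - dt / dx * (B_new[i] - B_new[i - 1]) - dt * J[i]
             for i in range(n)]
    return E_new, B_new
```

A full PIC step would chain the pieces: interpolate the fields to the particles, push the particles, deposit their currents on the grid, and then call a field update like this one.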

## What is so new about PIConGPU?

GPUs deliver very high computational performance because many processors work in parallel. To make the most of this performance, the processors should work independently of each other. For the PIC algorithm this is hard to achieve: in the current deposition step, currents that are fixed to the cells have to be computed from the velocities of particles moving freely between grid cells. This motion leads to memory access patterns in which current data and particle data are located at different places in memory, and parallel processes can disturb each other's execution when they access the same part of memory.
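The conflict can be made concrete with a small sketch. It is not the data model of [3,4]; it only contrasts two generic ways of organising the deposition, assuming a periodic one-dimensional grid and the simplest possible scheme in which each particle deposits all of its current in the cell it currently occupies. The function names are hypothetical.

```python
from collections import defaultdict

def deposit_scatter(positions, velocities, charge, n_cells, dx):
    """Particle-centric: every particle adds its contribution to its own cell.

    Run in parallel over particles, two particles in the same cell would
    update the same memory location at the same time.
    """
    current = [0.0] * n_cells
    for x, v in zip(positions, velocities):
        cell = int(x // dx) % n_cells
        current[cell] += charge * v / dx
    return current

def deposit_by_cell(positions, velocities, charge, n_cells, dx):
    """Cell-centric: sort particles into cells first, then sum per cell.

    Each cell is written by exactly one loop iteration, so the iterations
    are independent and could be handed to separate workers.
    """
    bins = defaultdict(list)
    for idx, x in enumerate(positions):
        bins[int(x // dx) % n_cells].append(idx)
    current = [0.0] * n_cells
    for cell, members in bins.items():
        current[cell] = sum(charge * velocities[i] for i in members) / dx
    return current
```

Both functions give the same current up to rounding. The second one trades extra bookkeeping, keeping particles grouped by cell as they move, for conflict-free writes.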

Recently, this problem was solved in our group using a new data model for particle and grid-based data and asynchronous data transfer [3,4].

## All that with a single GPU?

No, because a single GPU does not have enough memory to simulate large physical systems. This makes it necessary to use more than one GPU and to distribute the simulated volume between the GPUs.
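Distributing the volume amounts to assigning each GPU a contiguous block of cells. The sketch below shows this along one axis only, assuming equally capable devices; the function name `split_domain` is hypothetical, and a real decomposition would typically split along several axes.

```python
def split_domain(n_cells, n_devices):
    """Split n_cells into n_devices contiguous half-open ranges [start, end).

    Leftover cells are handed out one each to the first devices, so block
    sizes differ by at most one cell.
    """
    base, extra = divmod(n_cells, n_devices)
    ranges = []
    start = 0
    for rank in range(n_devices):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

print(split_domain(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Cells at the edge of each block need field and particle data from the neighbouring block, which is what makes data transfer between GPUs unavoidable.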

The problem with this approach is the data transfer between GPUs. For this, data has to be transferred from the GPU to the main memory of the computer housing the GPU. The data then has to be sent via the network to the other computers housing the other GPUs.

This process normally takes a long time and GPUs have to wait for the end of the data transfer before continuing their computations.

We were able to solve this problem by interleaving the data transfer between GPUs and the computation on a single GPU, so that the GPUs can execute the algorithmic steps continuously without interruption [3,4]. This was only possible because we got help from ZIH, TU Dresden, which provided an efficient library for data transfer between GPUs and tools to measure the performance of our code.
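The idea of interleaving transfer and computation can be sketched independently of any GPU library. In the example below a Python thread stands in for the asynchronous transfer, a three-point average stands in for the real computation, and `fetch_halo` simply sleeps to mimic the delay. All names are hypothetical; this is not the library provided by ZIH, and a GPU code would rely on asynchronous copies and non-blocking network calls instead.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_halo(left_value, right_value, delay=0.01):
    """Stand-in for a GPU -> host -> network -> host -> GPU transfer."""
    time.sleep(delay)
    return left_value, right_value

def stencil(left, centre, right):
    """Placeholder computation that needs both neighbours of a cell."""
    return (left + centre + right) / 3.0

def step_overlapped(local, neighbour_left, neighbour_right):
    """Update a local block (at least two cells) while the halo is in flight."""
    n = len(local)
    new = [0.0] * n
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the transfer, but do not wait for it yet.
        halo = pool.submit(fetch_halo, neighbour_left, neighbour_right)
        # Interior cells need no remote data and are computed meanwhile.
        for i in range(1, n - 1):
            new[i] = stencil(local[i - 1], local[i], local[i + 1])
        # Block only now, when the border cells actually need the halo.
        left, right = halo.result()
    new[0] = stencil(left, local[0], local[1])
    new[-1] = stencil(local[-2], local[-1], right)
    return new
```

As long as the interior work takes at least as long as the transfer, the waiting time is hidden completely.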

## What does this mean for simulations?

We want to shorten the time between the start of a simulation and the arrival of the final result. With GPUs, this speed-up can mean that a simulation that normally takes a week finishes within a few hours.

## Presentations on PIConGPU

Advanced Accelerator Concepts Workshop AAC 2012, Austin, TX, USA

## References

[2] H. Burau et al., PIConGPU: A Fully Relativistic Particle-in-Cell Code for a GPU Cluster, IEEE Transactions on Plasma Science 38(10), 2831-2839, 2010

[3] W. Hönig et al., A Generic Approach for Developing Highly Scalable Particle-Mesh Codes for GPUs, SAAHPC 2010 extended abstract

[4] M. Bussmann et al., Radiative Signatures of the Relativistic Kelvin-Helmholtz Instability, Proceedings SC13: International Conference for High Performance Computing, Networking, Storage and Analysis 5-1, 2013