PIConGPU - A Many-GPGPU Particle-in-Cell Code
PIConGPU [1,2] is a relativistic Particle-in-Cell (PIC) code running on graphics processing units (GPUs). It is Open Source and freely available for download. PIConGPU is developed and maintained by the Junior Group Computational Radiation Physics at the Institute for Radiation Physics at HZDR in close collaboration with the Center for Information Services and High Performance Computing (ZIH) of the Technical University Dresden.
Its features include:
- A variable, Yee-like grid, on which electric and magnetic fields are approximated.
- Particle pushers following the works of Boris and Vay
- Maxwell solvers as proposed by Yee and Lehe
- Current deposition schemes published by Esirkepov and Villasenor-Buneman
- Macro-particle form factors: NGP, CIC, TSC, PSQ
- Computation of far-field radiation due to electron motion
- Parallel HDF5 output
- Online visualization (2D/3D)
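To make the first pusher in the list concrete, here is a textbook relativistic Boris step for a single macro-particle. This is a minimal NumPy sketch under stated assumptions (SI units, fields already interpolated to the particle position, momentum stored as u = gamma * v). It is not PIConGPU's C++/CUDA implementation, and the function name `boris_push` is hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def boris_push(x, u, E, B, q, m, dt):
    """Advance one macro-particle by one step of the relativistic Boris scheme.

    x: position in m (3-vector)
    u: gamma * velocity in m/s (3-vector)
    E, B: electric (V/m) and magnetic (T) field at the particle position
    q, m: charge (C) and mass (kg); a macro-particle has the same q/m
          as the real particles it represents
    """
    h = q * dt / (2.0 * m)
    # 1) first half of the electric acceleration
    u_minus = u + h * E
    # 2) rotation around B using the Lorentz factor after the half kick;
    #    this changes the direction of u but not its magnitude
    gamma_minus = np.sqrt(1.0 + np.dot(u_minus, u_minus) / C**2)
    t = h * B / gamma_minus
    s = 2.0 * t / (1.0 + np.dot(t, t))
    u_prime = u_minus + np.cross(u_minus, t)
    u_plus = u_minus + np.cross(u_prime, s)
    # 3) second half of the electric acceleration
    u_new = u_plus + h * E
    # 4) move the particle with the new velocity u_new / gamma_new
    gamma_new = np.sqrt(1.0 + np.dot(u_new, u_new) / C**2)
    return x + u_new / gamma_new * dt, u_new


if __name__ == "__main__":
    q_e, m_e = -1.602176634e-19, 9.1093837015e-31  # electron charge and mass
    x, u = np.zeros(3), np.array([1e7, 0.0, 0.0])
    E, B = np.zeros(3), np.array([0.0, 0.0, 1.0])  # pure 1 T magnetic field
    for _ in range(1000):
        x, u = boris_push(x, u, E, B, q_e, m_e, dt=1e-13)
    print(np.linalg.norm(u))  # approximately 1e7: the rotation preserves |u|
```

The split into two electric half kicks around a pure rotation is what makes the scheme robust: in a magnetic field alone, the magnitude of u is conserved exactly up to rounding, no matter how large the time step.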
How does the Particle-in-Cell Algorithm work?
The PIC algorithm solves the so-called Maxwell-Vlasov equations. To do so, the electric and magnetic fields are represented on a grid that divides the simulated volume into cells.
Charged particles such as electrons and ions are modeled as macro-particles. Each macro-particle represents up to several hundred real particles, whose motion is described by that of a single, spatially extended particle distribution. The motion of the macro-particles is driven by the electric and magnetic fields on the grid.
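The moving macro-particles in turn generate currents, which are deposited onto the grid and used to compute new electric and magnetic fields. These new fields then act back on the particles, and the cycle repeats.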
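The cycle described above (particles to grid, field solve, grid to particles, push) can be illustrated with a toy 1D periodic PIC loop. This is a simplified sketch, not PIConGPU's algorithm: for brevity it swaps in an electrostatic model that deposits charge and solves Poisson's equation instead of depositing current and advancing the full Maxwell equations, uses normalized units (eps0 = 1), cloud-in-cell (CIC) weighting and a non-relativistic push. All function names are hypothetical.

```python
import numpy as np


def deposit_cic(x, w, n_cells, dx):
    """Scatter macro-particle charges w onto a periodic grid with CIC weights."""
    xi = x / dx
    i = np.floor(xi).astype(int)
    frac = xi - i  # fractional position inside cell i
    rho = np.zeros(n_cells)
    # np.add.at accumulates correctly when several particles hit the same cell
    np.add.at(rho, i % n_cells, w * (1.0 - frac))
    np.add.at(rho, (i + 1) % n_cells, w * frac)
    return rho / dx  # charge per cell -> charge density


def solve_field(rho, dx):
    """Solve dE/dx = rho (eps0 = 1) spectrally on a periodic grid.

    The k = 0 mode is dropped, i.e. a uniform neutralizing background is assumed.
    """
    n = rho.size
    k = 2.0 * np.pi * np.fft.rfftfreq(n, d=dx)
    rho_k = np.fft.rfft(rho)
    E_k = np.zeros_like(rho_k)
    E_k[1:] = -1j * rho_k[1:] / k[1:]
    return np.fft.irfft(E_k, n)


def gather_cic(E, x, dx):
    """Interpolate the grid field to the particle positions with the same CIC weights."""
    n = E.size
    xi = x / dx
    i = np.floor(xi).astype(int)
    frac = xi - i
    return E[i % n] * (1.0 - frac) + E[(i + 1) % n] * frac


def pic_step(x, v, w, q_over_m, n_cells, dx, dt):
    """One cycle: particles -> grid source -> field solve -> field at particles -> push."""
    rho = deposit_cic(x, w, n_cells, dx)
    E = solve_field(rho, dx)
    v = v + q_over_m * gather_cic(E, x, dx) * dt  # kick
    x = (x + v * dt) % (n_cells * dx)  # drift with periodic wrap-around
    return x, v


if __name__ == "__main__":
    n_cells, dx, n_part = 64, 1.0 / 64, 10_000
    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, n_part)
    v = 0.01 * np.sin(2.0 * np.pi * x)  # small velocity perturbation
    w = np.full(n_part, -1.0 / n_part)  # electrons: total charge -1 in the box
    for _ in range(100):
        x, v = pic_step(x, v, w, -1.0, n_cells, dx, dt=0.05)
```

Using the same CIC weights for deposition and for gathering keeps the particle-to-grid and grid-to-particle interpolation consistent, which avoids spurious self-forces on a single particle.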
What is so new about PIConGPU?
GPUs deliver very high computational performance because many processors work in parallel. To make the most of this performance, the processors should work independently of each other. For the PIC algorithm this is hard to achieve: in the current deposition step, currents that are fixed to the grid cells have to be computed from the velocities of particles moving freely between cells. This motion leads to memory access patterns in which current data and particle data are located at different places in memory, and parallel processes can disturb each other's execution when they access the same part of memory.
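The write conflict in the current deposition step can be mimicked on a CPU with NumPy. In this sketch, several particles add their contribution to the same grid cell. A naive scatter loses updates, much like unsynchronized GPU threads overwriting each other, while an accumulating scatter plays the role of atomic additions. This only illustrates the problem; it does not show how PIConGPU resolves it, and the array names are hypothetical.

```python
import numpy as np

# Four particles want to add their current contribution to a grid of 8 cells.
# Three of them sit in cell 3, so their writes target the same memory location.
cell_of_particle = np.array([3, 3, 3, 5])
contribution = np.array([1.0, 1.0, 1.0, 1.0])

# Naive scatter: with repeated indices NumPy applies "+=" only once per cell,
# so two of the three contributions to cell 3 are lost. This mimics
# unsynchronized parallel writers overwriting each other's results.
grid_naive = np.zeros(8)
grid_naive[cell_of_particle] += contribution

# Accumulating scatter: np.add.at adds every contribution, which is what
# atomic additions guarantee on a GPU (conflicting updates are serialized).
grid_safe = np.zeros(8)
np.add.at(grid_safe, cell_of_particle, contribution)

print(grid_naive[3], grid_safe[3])  # 1.0 3.0
```

The more particles share a cell, the more updates collide, which is why the data layout of particles relative to grid cells matters so much for performance on GPUs.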
All that with a single GPU?
No. A single GPU does not have enough memory to simulate large physical systems, so several GPUs have to be used and the simulated volume has to be distributed among them.
The problem with this approach is the data transfer between GPUs. Data first has to be copied from the GPU to the main memory of the computer housing it, and then sent over the network to the computers housing the other GPUs.
This process normally takes a long time, and the GPUs have to wait for the transfer to finish before they can continue their computations.
We solved this problem by overlapping the data transfer between GPUs with the computation on each GPU, so that the GPUs can execute the algorithmic steps continuously without interruption [3,4]. This was only possible with the help of ZIH, TU Dresden, which provided an efficient library for data transfer between GPUs as well as tools for measuring the performance of our code.
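The idea of distributing the volume and hiding the transfer behind computation can be sketched on a CPU. Here a periodic 1D field is split among several hypothetical "devices", each holding one guard cell per side that must be filled from its neighbours. A background thread stands in for the asynchronous transfer, and a simple diffusion stencil stands in for the field solver. While the guard cells are in flight, every cell whose stencil does not touch a guard cell is already updated; only the two border cells wait. All names are hypothetical, and in CPython the thread only demonstrates the ordering, not a real speed-up. PIConGPU itself does this with GPUs, network communication and the ZIH library mentioned above.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def split_domain(field, n_parts):
    """Distribute a periodic 1D field over n_parts devices, each padded with one guard cell per side."""
    return [np.pad(chunk, 1) for chunk in np.array_split(field, n_parts)]


def exchange_guards(subdomains):
    """Copy neighbours' edge cells into the guard cells (stand-in for GPU -> host -> network -> host -> GPU)."""
    n = len(subdomains)
    for r, p in enumerate(subdomains):
        p[0] = subdomains[(r - 1) % n][-2]  # last real cell of the left neighbour
        p[-1] = subdomains[(r + 1) % n][1]  # first real cell of the right neighbour


def stencil(p, lo, hi, alpha):
    """3-point diffusion update of padded cells lo..hi-1; reads cells lo-1..hi."""
    return p[lo:hi] + alpha * (p[lo - 1:hi - 1] - 2.0 * p[lo:hi] + p[lo + 1:hi + 1])


def step_overlapped(subdomains, alpha, pool):
    """One step in which the guard-cell exchange overlaps with the interior update.

    Assumes every subdomain holds at least two real cells.
    """
    pending = pool.submit(exchange_guards, subdomains)  # start the "transfer"
    # Cells 2..size-3 never read a guard cell, so they can be updated right away.
    interiors = [stencil(p, 2, p.size - 2, alpha) for p in subdomains]
    pending.result()  # wait until the guard cells have arrived
    new = []
    for p, inner in zip(subdomains, interiors):
        left = stencil(p, 1, 2, alpha)  # needs the left guard cell
        right = stencil(p, p.size - 2, p.size - 1, alpha)  # needs the right guard cell
        new.append(np.pad(np.concatenate([left, inner, right]), 1))
    return new


def step_global(field, alpha):
    """Reference: the same update on the undivided periodic field."""
    return field + alpha * (np.roll(field, 1) - 2.0 * field + np.roll(field, -1))


if __name__ == "__main__":
    field = np.random.default_rng(0).random(40)
    subs, ref = split_domain(field, 4), field.copy()
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(10):
            subs = step_overlapped(subs, 0.25, pool)
            ref = step_global(ref, 0.25)
    merged = np.concatenate([p[1:-1] for p in subs])
    print(np.allclose(merged, ref))  # True: the decomposition does not change the result
```

The exchange writes only guard cells and the interior update reads only real cells, so the two can run concurrently without conflicting. The larger each subdomain, the more interior work is available to hide the transfer.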
What does this mean for simulations?
Our goal is to shorten the time between the start of a simulation and the arrival of the final result. With GPUs, a simulation that would normally take a week can finish within a few hours.
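To put this into numbers, take "a few hours" to be 4 h, an assumed value chosen purely for illustration. The speed-up factor S is then:

```latex
S = \frac{T_\text{week}}{T_\text{GPU}}
  = \frac{7 \times 24\,\text{h}}{4\,\text{h}}
  = \frac{168\,\text{h}}{4\,\text{h}}
  = 42
```

A different GPU runtime changes S in inverse proportion; 2 h, for example, would give S = 84.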
Presentations on PIConGPU
- H. Burau et al., PIConGPU: A Fully Relativistic Particle-in-Cell Code for a GPU Cluster, IEEE Transactions on Plasma Science 38(10), 2831-2839, 2010
- W. Hönig et al., A Generic Approach for Developing Highly Scalable Particle-Mesh Codes for GPUs, SAAHPC 2010 (extended abstract)
- M. Bussmann et al., Radiative Signatures of the Relativistic Kelvin-Helmholtz Instability, Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis, 5-1, 2013