Interaction of an ultra-short high-intensity laser pulse with a hydrogen gas
PIConGPU is a relativistic Particle-in-Cell (PIC) code running on graphics processing units (GPUs). For experts: PIConGPU is a fully 3D3V PIC code using a Yee lattice, a Boris pusher, and the Villasenor-Buneman current deposition scheme. Macro-particle form factors include NGP (nearest grid point) and CIC (cloud in cell).
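As an illustration of one of these building blocks, the following sketch shows a Boris-type velocity update for a single macro-particle. For brevity it uses the non-relativistic variant in normalized units, and all names are illustrative rather than PIConGPU's actual data structures; PIConGPU itself employs the relativistic form.

    #include <cuda_runtime.h>

    // Sketch of a Boris-type velocity update for a single macro-particle,
    // in normalized units. For brevity this is the non-relativistic variant;
    // PIConGPU itself uses the relativistic form. All names are illustrative.
    __host__ __device__ inline float3 cross3(float3 a, float3 b) {
        return make_float3(a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x);
    }
    __host__ __device__ inline float3 axpy(float s, float3 a, float3 b) {
        return make_float3(s*a.x + b.x, s*a.y + b.y, s*a.z + b.z);   // s*a + b
    }

    // qmHalfDt = (charge / mass) * dt / 2
    __host__ __device__ float3 borisPush(float3 v, float3 E, float3 B, float qmHalfDt) {
        float3 vMinus = axpy(qmHalfDt, E, v);                    // first half kick by E
        float3 t      = make_float3(qmHalfDt*B.x, qmHalfDt*B.y, qmHalfDt*B.z);
        float  tSq    = t.x*t.x + t.y*t.y + t.z*t.z;
        float3 s      = make_float3(2.f*t.x/(1.f+tSq), 2.f*t.y/(1.f+tSq), 2.f*t.z/(1.f+tSq));
        float3 vPrime = axpy(1.f, cross3(vMinus, t), vMinus);    // rotation around B, step 1
        float3 vPlus  = axpy(1.f, cross3(vPrime, s), vMinus);    // rotation around B, step 2
        return axpy(qmHalfDt, E, vPlus);                         // second half kick by E
    }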
PIConGPU is developed and maintained by the Junior Group Computational Radiation Physics at the Institute for Radiation Physics at HZDR in close collaboration with the Center for Information Services and High Performance Computing (ZIH) of the Technical University Dresden.
How does the Particle-in-Cell Algorithm work?
|The Particle-in-Cell Algorithm|
|Using the electric (E) and magnetic (B) fields, the algorithm computes the Lorentz force (F) acting on the macro-particles (distribution function f). From this, it computes the new positions and velocities (u) of the macro-particles. The current (J) is computed from these velocities and is then used to calculate the change in the electric and magnetic fields.|
The PIC algorithm solves the so-called Maxwell-Vlasov equation. To solve this equation, the electric and magnetic fields are represented on a grid that divides the simulated volume into cells.
Charged particles such as electrons and ions are modeled by macro-particles. Each macro-particle can describe the motion of up to several hundred particles by the motion of a single spread-out particle distribution. The macro-particles' motion is driven by the electric and magnetic fields on the grid, and this motion in turn produces currents on the grid that are used to update the fields.
These new fields then act back on the particles, closing the cycle.
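Schematically, every simulation time step cycles through these stages. The following minimal CUDA sketch shows only the structure of such a step; the kernels are empty placeholders with assumed names, not PIConGPU's actual API.

    #include <cuda_runtime.h>

    // Empty placeholder kernels standing in for the four PIC stages; only the
    // structure of one time step is illustrated here, not PIConGPU's actual API.
    __global__ void gatherFields()   {}   // interpolate E and B to particle positions
    __global__ void pushParticles()  {}   // apply the Lorentz force, move macro-particles
    __global__ void depositCurrent() {}   // compute the current J on the grid from particle motion
    __global__ void updateFields()   {}   // advance E and B via Maxwell's equations (Yee/FDTD)

    int main() {
        const int numSteps = 1000;
        const dim3 grid(128), block(256);
        for (int step = 0; step < numSteps; ++step) {
            gatherFields<<<grid, block>>>();
            pushParticles<<<grid, block>>>();
            depositCurrent<<<grid, block>>>();
            updateFields<<<grid, block>>>();
        }
        cudaDeviceSynchronize();
        return 0;
    }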
What is so new about PIConGPU?
|Comparison of a GPU simulation with a CPU simulation|
|Comparison of a GPU simulation (left) of laser-wakefield acceleration of electrons with a CPU simulation (right). The GPU simulation completes in under one hour the same task that takes the CPU simulation a week to compute.|
GPUs achieve very high computational performance because many processing units work in parallel. To make the most of this performance, these processors should work as independently of each other as possible. For the PIC algorithm this is hard to achieve: in the current deposition step, currents that are fixed to the grid cells have to be computed from the velocities of particles moving freely between cells. This motion leads to memory access patterns in which current data and particle data are located at different places in memory, and parallel processes can disturb each other's execution when accessing the same part of memory.
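To see why this scatter step is hard to parallelize, consider a deliberately naive CUDA sketch (one-dimensional, nearest-grid-point deposition, illustrative names only): every thread handles one particle and adds its contribution directly to global memory, so threads whose particles share a cell serialize on the same location while neighbouring threads write to widely scattered addresses.

    #include <cuda_runtime.h>

    // Naive nearest-grid-point current deposition (1D, illustrative names only).
    // Each thread handles one macro-particle and scatters its contribution
    // straight into global memory: particles sharing a cell serialize on the
    // same atomic, and neighbouring threads touch widely scattered addresses.
    __global__ void depositCurrentNaive(const float* x, const float* vx,
                                        const float* weight, int numParticles,
                                        float* Jx, int numCells, float dx)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= numParticles) return;

        int cell = static_cast<int>(x[i] / dx);   // cell the particle currently occupies
        if (cell < 0 || cell >= numCells) return;

        atomicAdd(&Jx[cell], weight[i] * vx[i]);  // contended global-memory atomic
    }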
Recently, this problem was solved in our group using a new data model for particle and grid-based data, so that currents can now be computed in a highly parallel way on the GPU.
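The general idea behind such a data model can be sketched as follows, assuming particles have already been sorted into small tiles of cells; this is a hedged illustration of the technique, not PIConGPU's actual implementation. One thread block works on one tile, accumulates the tile's current in fast shared memory, and writes the finished tile back to global memory only once.

    #include <cuda_runtime.h>

    // Tile-based current deposition (1D, illustrative). Particles are assumed to
    // be pre-sorted into tiles of TILE_CELLS consecutive cells; one thread block
    // owns one tile, so atomics only ever hit fast on-chip shared memory and the
    // finished tile is written back to global memory exactly once.
    constexpr int TILE_CELLS = 256;

    __global__ void depositCurrentTiled(const float* x, const float* vx,
                                        const float* weight,
                                        const int* tileOffsets,   // particle range per tile (prefix sums)
                                        float* Jx, float dx)
    {
        __shared__ float JxTile[TILE_CELLS];
        for (int c = threadIdx.x; c < TILE_CELLS; c += blockDim.x)
            JxTile[c] = 0.0f;
        __syncthreads();

        const int tile     = blockIdx.x;
        const int cellBase = tile * TILE_CELLS;

        // Each thread deposits a strided subset of this tile's particles.
        for (int i = tileOffsets[tile] + threadIdx.x; i < tileOffsets[tile + 1]; i += blockDim.x) {
            int localCell = static_cast<int>(x[i] / dx) - cellBase;
            if (localCell >= 0 && localCell < TILE_CELLS)
                atomicAdd(&JxTile[localCell], weight[i] * vx[i]);   // shared-memory atomic
        }
        __syncthreads();

        // Single coalesced write-back of the finished tile.
        for (int c = threadIdx.x; c < TILE_CELLS; c += blockDim.x)
            Jx[cellBase + c] += JxTile[c];
    }

Because each block owns a disjoint tile, the final write-back needs no atomics at all, and the contended updates are confined to shared memory, which is far faster than global memory.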
All that with a single GPU?
|Weak scaling PIConGPU|
|The scaling was tested with up to 786 GPUs. For weak scaling, the size of the simulated system and, accordingly, the number of GPUs are doubled; the duration of a time step should therefore stay the same.|
No, because a single GPU does not have enough memory to simulate large physical systems. This makes it necessary to use more than one GPU and to distribute the simulated volume among the GPUs.
The problem with this approach is the data transfer between GPUs. For this, data has to be transferred from each GPU to the main memory of the computer housing it, and then sent via the network to the computers housing the other GPUs.
This process normally takes a long time, and the GPUs have to wait for the data transfer to finish before continuing their computations.
We were able to solve this problem by interleaving the data transfer between GPUs with the computation on each GPU, so that the GPUs can execute the algorithmic steps continuously without interruption [1,2,3]. This was only possible with the help of the ZIH, which provided an efficient library for data transfer between GPUs and tools to measure the performance of our code.
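As a hedged illustration of this interleaving idea, the toy program below uses plain CUDA streams and MPI (not the actual library provided by the ZIH): the thin border region of a GPU's sub-volume is updated and transferred on one stream while the much larger interior is updated concurrently on another.

    #include <mpi.h>
    #include <cuda_runtime.h>

    // Toy illustration of overlapping halo exchange with computation on a 1D
    // ring of processes, each driving one GPU. Kernel bodies are omitted; the
    // point is the ordering of streams, async copies and the MPI exchange.
    __global__ void updateBorder(float* field, int n)   {}
    __global__ void updateInterior(float* field, int n) {}

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, nRanks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nRanks);

        const int nBorder = 1 << 16, nInterior = 1 << 22;
        float *dField, *hHalo;
        cudaMalloc(&dField, (nBorder + nInterior) * sizeof(float));
        cudaMallocHost(&hHalo, nBorder * sizeof(float));    // pinned buffer for async copies

        cudaStream_t borderStream, interiorStream;
        cudaStreamCreate(&borderStream);
        cudaStreamCreate(&interiorStream);

        const int neighbor = (rank + 1) % nRanks;            // toy ring topology

        // 1) Update the thin border first, on its own stream ...
        updateBorder<<<256, 256, 0, borderStream>>>(dField, nBorder);
        // 2) ... and start copying it to the host asynchronously ...
        cudaMemcpyAsync(hHalo, dField, nBorder * sizeof(float),
                        cudaMemcpyDeviceToHost, borderStream);
        // 3) ... while the much larger interior is updated concurrently.
        updateInterior<<<1024, 256, 0, interiorStream>>>(dField + nBorder, nInterior);

        // 4) Exchange the halo over the network once the copy has finished.
        cudaStreamSynchronize(borderStream);
        MPI_Sendrecv_replace(hHalo, nBorder, MPI_FLOAT, neighbor, 0,
                             neighbor, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // 5) Copy the received halo back; the interior kernel may still be running.
        cudaMemcpyAsync(dField, hHalo, nBorder * sizeof(float),
                        cudaMemcpyHostToDevice, borderStream);

        cudaDeviceSynchronize();
        cudaFree(dField);  cudaFreeHost(hHalo);
        cudaStreamDestroy(borderStream);  cudaStreamDestroy(interiorStream);
        MPI_Finalize();
        return 0;
    }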
What does this mean for the simulation?
We want to speed up the simulation in order to shorten the time between starting it and obtaining the final result. With GPUs, this speed-up can mean that a simulation that normally takes a week to finish can be completed within a few hours. In addition, the costs for investment and energy can be reduced drastically.
What is the future of PIConGPU?
We want to improve PIConGPU and make it available to a larger community of users. PIConGPU is modular in the sense that various physical models can be easily incorporated, so that a large number of physical questions can be addressed.
Presentations on PIConGPU
 H. Burau et al., PIConGPU: A Fully Relativistic Particle-in-Cell Code for a GPU Cluster, IEEE Transactions on Plasma Science 38(10), 2831-2839 (October 2010)
 W. Hönig et al., A Generic Approach for Developing Highly Scalable Particle-Mesh Codes for GPUs, SAAHPC 2010, extended abstract
 G. Juckeland, M. Bussmann, Developing Highly Scalable Particle-Mesh Codes for GPUs: A Generic Approach, NVIDIA GTC 2010
 PIConGPU repository (not yet publicly available)