PIConGPU


Interaction of an ultra-short high-intensity laser pulse with a hydrogen gas

PIConGPU [1] is a relativistic Particle-in-Cell (PIC) code running on graphics processing units (GPUs). For experts: PIConGPU is a fully 3D3V PIC code using a Yee lattice, a Boris pusher and the Villasenor-Buneman current deposition scheme. Macro-particle form factors include nearest grid point (NGP) and cloud-in-cell (CIC).

PIConGPU is developed and maintained by the Junior Group Computational Radiation Physics at the Institute for Radiation Physics at HZDR in close collaboration with the Center for Information Services and High Performance Computing (ZIH) of the Technical University Dresden.

GPUs today reach performance on the order of TFLOP/s at considerably lower investment and maintenance cost than CPU-based compute architectures of similar performance.

The Particle-in-Cell algorithm is a central tool in plasma physics. It describes the dynamics of a plasma by computing the motion of electrons and ions in the plasma based on Maxwell's equations.

How does the Particle-in-Cell Algorithm work?

The Particle-in-Cell Algorithm
Using the electric (E) and magnetic (B) fields, the algorithm computes the Lorentz force (F) acting on the macro-particles (distribution function f). From this force it computes the new positions and velocities (u) of the macro-particles. The current (J) is computed from these velocities and is then used to calculate the change in the electric and magnetic fields.

The PIC algorithm solves the so-called Maxwell-Vlasov equation. To solve this equation, electric and magnetic fields are interpolated on a physical grid dividing the simulated volume into cells.
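
To make the coupling between grid and particles concrete, here is a minimal one-dimensional sketch of the two form factors named above, NGP and CIC, used to interpolate a grid field to a particle position. The function names, the periodic grid and the sample values are assumptions chosen for illustration; this is not PIConGPU code.

```python
import math

def ngp_weights(x, dx):
    """Nearest grid point: all weight goes to the closest node."""
    i = int(math.floor(x / dx + 0.5))
    return {i: 1.0}

def cic_weights(x, dx):
    """Cloud-in-cell: linear weights on the two neighbouring nodes."""
    i = int(math.floor(x / dx))
    frac = x / dx - i
    return {i: 1.0 - frac, i + 1: frac}

def gather(field, weights):
    """Interpolate a node-centred field to the particle (periodic grid)."""
    n = len(field)
    return sum(w * field[i % n] for i, w in weights.items())

# Field samples on nodes at x = 0, 1, 2, 3 with dx = 1 (illustrative values).
E = [0.0, 1.0, 2.0, 3.0]
print(gather(E, ngp_weights(1.25, 1.0)))  # 1.0
print(gather(E, cic_weights(1.25, 1.0)))  # 1.25
```

NGP simply picks the nearest node, while CIC blends the two neighbouring nodes linearly, which gives a smoother force as a particle crosses a cell boundary.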

Charged particles like electrons and ions are modeled by macro-particles. Each macro-particle describes the motion of up to several hundred real particles by the motion of a single spread-out particle distribution. The macro-particles' motion is influenced by the electric and magnetic fields on the grid.
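
The effect of the fields on a macro-particle can be sketched with the Boris scheme, the pusher PIConGPU is stated to use. The version below is a textbook relativistic formulation in normalized units (c = 1 by default) with u = gamma * v; the function signature and the parameter values in the usage lines are assumptions for illustration, not the PIConGPU implementation.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def boris_push(x, u, E, B, q, m, dt, c=1.0):
    """Advance position x and u = gamma*v of one particle by a time step dt."""
    k = q * dt / (2.0 * m)
    # First half of the electric acceleration.
    u_minus = tuple(ui + k * Ei for ui, Ei in zip(u, E))
    gamma = math.sqrt(1.0 + sum(ui * ui for ui in u_minus) / c**2)
    # Rotation around the magnetic field.
    t = tuple(k * Bi / gamma for Bi in B)
    t2 = sum(ti * ti for ti in t)
    s = tuple(2.0 * ti / (1.0 + t2) for ti in t)
    u_prime = tuple(um + cp for um, cp in zip(u_minus, cross(u_minus, t)))
    u_plus = tuple(um + cp for um, cp in zip(u_minus, cross(u_prime, s)))
    # Second half of the electric acceleration.
    u_new = tuple(up + k * Ei for up, Ei in zip(u_plus, E))
    gamma_new = math.sqrt(1.0 + sum(ui * ui for ui in u_new) / c**2)
    # Move the particle with the updated velocity v = u / gamma.
    x_new = tuple(xi + dt * ui / gamma_new for xi, ui in zip(x, u_new))
    return x_new, u_new

# Usage: gyration in a pure magnetic field along z (illustrative values).
x, u = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
for _ in range(1000):
    x, u = boris_push(x, u, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.0, 1.0, 0.1)
print(math.sqrt(sum(ui * ui for ui in u)))  # stays very close to 1
```

Because the magnetic part of the update is a pure rotation, the magnitude of u is preserved in a magnetic field alone, which is one reason this scheme is popular in PIC codes.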

The particle motion in turn creates currents. Following Ampère's law, these currents create magnetic fields. Changes in these magnetic fields in turn create electric fields, as described by Faraday's law.
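
As a rough illustration of how currents feed back into the fields, the sketch below advances a one-dimensional staggered (Yee-type) grid by one leapfrog step. It assumes normalized units (c = 1, eps0 = 1), periodic boundaries, and the common time-domain form in which Faraday's law advances B and the Ampère-Maxwell law, including the current J, advances E. The function name and grid layout are assumptions; PIConGPU itself is fully three-dimensional.

```python
def update_fields(E, B, J, dt, dx):
    """One leapfrog step on a periodic 1D staggered grid.

    E[i] is E_y at node i, B[i] is B_z at position i + 1/2,
    J[i] is J_y at node i (normalized units, c = eps0 = 1).
    """
    n = len(E)
    # Faraday's law: dB_z/dt = -dE_y/dx
    B = [B[i] - dt / dx * (E[(i + 1) % n] - E[i]) for i in range(n)]
    # Ampere-Maxwell law: dE_y/dt = -dB_z/dx - J_y
    # (B[i - 1] with i = 0 wraps around to the last entry: periodic boundary.)
    E = [E[i] - dt / dx * (B[i] - B[i - 1]) - dt * J[i] for i in range(n)]
    return E, B

# Usage: a uniform current drives a uniform electric field and no magnetic field.
E, B = update_fields([0.0] * 8, [0.0] * 8, [1.0] * 8, dt=0.5, dx=1.0)
print(E[0], B[0])  # -0.5 0.0
```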

These new fields then act back on the particles.

What is so new about PIConGPU?

Comparison of a GPU simulation with a CPU simulation
Comparison of a GPU simulation (left) of laser-wakefield acceleration of electrons with a CPU simulation (right). The GPU simulation completes in under one hour the same task that takes the CPU simulation a week.

GPUs show very high computational performance because many processors work in parallel. To make the most of this performance, the processors should work independently of each other. In the case of the PIC algorithm this is hard to achieve: in the current deposition step, currents that are fixed to the cells have to be computed from the velocities of particles that move freely between grid cells. This motion leads to memory access patterns in which current data and particle data are located at different places in memory, and parallel processes can disturb each other's execution when they access the same part of memory.
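
The conflict described above can be emulated deterministically in a few lines. The first function mimics parallel workers that all read a cell's current before any of them writes back, so contributions from particles in the same cell are lost. The second groups particles by cell so that each cell has exactly one writer. Grouping by cell is shown only as one generic way to avoid the conflict; it is an assumption for illustration and is not claimed to be the data model of [2].

```python
from collections import defaultdict

def deposit_unsynchronized(cells, contributions, n_cells):
    """Emulate workers that all read the grid before any of them writes."""
    J = [0.0] * n_cells
    snapshot = list(J)                  # every worker sees the old values
    for c, j in zip(cells, contributions):
        J[c] = snapshot[c] + j          # later writes overwrite earlier ones
    return J

def deposit_by_cell(cells, contributions, n_cells):
    """Group particles by cell so each cell is written by a single worker."""
    per_cell = defaultdict(list)
    for c, j in zip(cells, contributions):
        per_cell[c].append(j)
    J = [0.0] * n_cells
    for c, js in per_cell.items():      # iterations are independent of each other
        J[c] = sum(js)
    return J

# Five particles; three of them sit in cell 2 (illustrative values).
cells = [0, 2, 2, 2, 3]
contributions = [1.0, 1.0, 1.0, 1.0, 1.0]
print(deposit_unsynchronized(cells, contributions, 4))  # [1.0, 0.0, 1.0, 1.0]
print(deposit_by_cell(cells, contributions, 4))         # [1.0, 0.0, 3.0, 1.0]
```

The unsynchronized version loses two of the three contributions to cell 2, which is exactly the kind of error that conflicting memory accesses would cause on real parallel hardware.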

Recently, this problem was solved in our group using a new data model for particle and grid-based data [2], so that now currents can be computed in a highly parallel way on the GPU.

All that with a single GPU?

Weak scaling PIConGPU
The scaling could be tested up to 786 GPUs. For weak scaling, the size of the simulated system and the number of GPUs are doubled together, so the work per GPU stays constant. Ideally, the duration of a time step should therefore stay the same.
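
The weak-scaling criterion can be written as a simple ratio: the time per step on the smallest GPU count divided by the time per step on N GPUs, with the work per GPU held fixed. The sketch below computes this efficiency. The timings are hypothetical placeholders chosen for illustration and are not PIConGPU measurements.

```python
def weak_scaling_efficiency(step_times):
    """Map GPU count -> efficiency relative to the smallest GPU count.

    step_times maps GPU count -> seconds per time step at fixed work per GPU.
    An efficiency of 1.0 means the step time did not grow at all.
    """
    base = step_times[min(step_times)]
    return {n: base / t for n, t in sorted(step_times.items())}

# Hypothetical timings in seconds per step, NOT measured PIConGPU data.
timings = {1: 0.50, 2: 0.50, 4: 0.52, 8: 0.55}
for n, eff in weak_scaling_efficiency(timings).items():
    print(f"{n:4d} GPUs: efficiency {eff:.2f}")
```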

No, because a single GPU does not have enough memory to simulate large physical systems. This makes it necessary to use more than one GPU and to distribute the simulated volume among the GPUs.

The problem with this approach is the data transfer between GPUs. Data first has to be transferred from the GPU to the main memory of the computer housing the GPU. It then has to be sent over the network to the other computers housing the other GPUs.

This process normally takes a long time and GPUs have to wait for the end of the data transfer before continuing their computations.

We were able to solve this problem by interleaving the data transfer between GPUs with the computation on each GPU, so that the GPUs can execute the algorithmic steps continuously without interruption [1,2,3]. This was only possible with help from the ZIH, which provided an efficient library for data transfer between GPUs and tools to measure the performance of our code.
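
The idea of hiding transfer time behind computation can be sketched on the CPU with a background thread standing in for the GPU-to-GPU transfer. The border exchange is started first, the interior cells (which need no remote data) are updated while the transfer is in flight, and the border cells are finished once the data has arrived. Everything here is an illustrative assumption: `exchange_halo`, the smoothing stencil and the sleep-based delay are hypothetical stand-ins, and neither the ZIH library nor the actual PIConGPU scheduling is shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def exchange_halo(left_edge, right_edge, delay=0.05):
    """Stand-in for a GPU -> host -> network -> host -> GPU transfer."""
    time.sleep(delay)
    # Periodic single-domain stand-in: our neighbours' edges are our own
    # opposite edges. Returns (value left of cell 0, value right of last cell).
    return right_edge, left_edge

def smooth(left, centre, right):
    """Three-point average used as a placeholder for the real computation."""
    return (left + centre + right) / 3.0

def step_overlapped(u, pool):
    # 1. Start the transfer of the border cells without waiting for it.
    pending = pool.submit(exchange_halo, u[0], u[-1])
    # 2. Meanwhile update the interior cells, which need no remote data.
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = smooth(u[i - 1], u[i], u[i + 1])
    # 3. Wait for the halo data, then finish the two border cells.
    halo_left, halo_right = pending.result()
    new[0] = smooth(halo_left, u[0], u[1])
    new[-1] = smooth(u[-2], u[-1], halo_right)
    return new

with ThreadPoolExecutor(max_workers=1) as pool:
    print(step_overlapped([0.0, 0.0, 3.0, 0.0, 0.0], pool))
    # [0.0, 1.0, 1.0, 1.0, 0.0]
```

The result is identical to a step that waits for the transfer first; only the waiting time is hidden behind the interior update.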

What does this mean for the simulation?

We want to shorten the time between the start of a simulation and the arrival of the final result. With GPUs, a simulation that normally takes a week can finish within a few hours. In addition, the costs for investment and energy can be reduced drastically.

What is the future of PIConGPU?

We want to improve PIConGPU and make it available to a larger community of users [4]. PIConGPU is modular in the sense that various physical models can be easily incorporated and thus a large number of physical questions can be answered.

PIConGPU Partners

ZIH Dresden CUDA Center of Excellence

Presentations on PIConGPU

Supercomputing Conference SC'12, Salt Lake City, UT, USA

Advanced Accelerator Concepts Workshop AAC 2012, Austin, TX, USA

GPU Technology Conference GTC 2012, San Jose, CA, USA

References

[1] H. Burau et al., PIConGPU: A Fully Relativistic Particle-in-Cell Code for a GPU Cluster, IEEE Transactions on Plasma Science 38(10), 2831-2839 (October 2010)

[2] W. Hönig et al., A Generic Approach for Developing Highly Scalable Particle-Mesh Codes for GPUs, SAAHPC 2010 extended abstract

[3] G. Juckeland, M. Bussmann, Developing Highly Scalable Particle-Mesh Codes for GPUs: A Generic Approach, NVIDIA GTC 2010

[4] PIConGPU repository (not yet publicly available)