Hands-on Performance Tuning of 3D Finite Difference Earthquake Simulation on GPU Fermi Chipset
Yifeng Cui
Published 2012, SCEC Contribution #1757
3D simulation of earthquake ground motion is one of the most challenging computational problems in science. Graphics processing units (GPUs) have emerged as an effective alternative to traditional general-purpose processors and are increasingly capable of accelerating scientific computing research. In this paper, we describe our experience in porting AWP-ODC, a 3D finite difference seismic wave propagation code, to the latest GPU Fermi chipset. We completely rewrote this Fortran-based 13-point asymmetric stencil computation code in C and MPI-CUDA to take advantage of the GPU's computing capabilities. Our new CUDA code implements the asymmetric 3D stencil on Fermi so as to make the best use of GPU on-chip memory and achieve high parallel efficiency. Benchmarks on an NVIDIA Tesla M2090 demonstrated a 10x speedup over the original, fully optimized AWP-ODC Fortran MPI code running on a single Intel Nehalem 2.4 GHz CPU socket (4 cores/CPU), and a 15x speedup over the same MPI code running on a single AMD Istanbul 2.6 GHz CPU socket (6 cores/CPU). We measured sustained single-GPU performance of 143.8 GFLOPS in single precision for a test case with a 128x128x960 mesh.
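To make the phrase "13-point asymmetric stencil" concrete, here is a minimal Python sketch. It is not the AWP-ODC code. It assumes a fourth-order staggered-grid first derivative with the standard coefficients 9/8 and -1/24, and it reads the stencil as one center value plus three 4-point derivatives, one per axis. The function names, the field names (`vel`, `sxx`, `sxy`, `sxz`), and the exact offsets are hypothetical and chosen only for illustration.

```python
# Assumed standard fourth-order staggered-grid coefficients.
C1 = 9.0 / 8.0
C2 = -1.0 / 24.0


def staggered_derivative(f, i, h):
    """Approximate df/dx at the half point i + 1/2 from samples i-1, i, i+1, i+2.

    The four samples are not centered on any integer index i, which is
    the sense in which the stencil is asymmetric.
    """
    return (C1 * (f[i + 1] - f[i]) + C2 * (f[i + 2] - f[i - 1])) / h


def velocity_update_reads():
    """List the (field, offset) reads of one hypothetical velocity update.

    One read of the velocity itself, plus four reads along each axis of a
    stress component: 1 + 3 * 4 = 13 reads in total.
    """
    reads = [("vel", (0, 0, 0))]
    # Derivative along x uses offsets -1..+2; along y and z, -2..+1 (assumed).
    for field, axis, offsets in (
        ("sxx", 0, (-1, 0, 1, 2)),
        ("sxy", 1, (-2, -1, 0, 1)),
        ("sxz", 2, (-2, -1, 0, 1)),
    ):
        for k in offsets:
            offset = [0, 0, 0]
            offset[axis] = k
            reads.append((field, tuple(offset)))
    return reads
```

Under these assumptions the derivative is exact for cubic polynomials, which gives a quick sanity check: sample f(x) = x^3 on a uniform grid and compare `staggered_derivative` with 3x^2 at the half point.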
Citation
Cui, Y. (2012). Hands-on Performance Tuning of 3D Finite Difference Earthquake Simulation on GPU Fermi Chipset. Oral Presentation at International Conference on Computational Science 2012.