Workflow-based high performance data transfer and ingestion to petascale simulations on TeraGrid

Jun Zhou, Yifeng Cui, Steve Davis, Clark C. Guest, & Philip J. Maechling

Published 2010, SCEC Contribution #1508

In this paper we report on the design of high-performance data transfer and ingestion in a scientific workflow project that supports Southern California Earthquake Center (SCEC) petascale simulations on TeraGrid (TG). The design uses grid resources to pipeline data pre- and post-processing within the workflow. We develop an enhanced prototype framework that combines the Globus Toolkit with advanced MPI batch jobs for reliable and efficient data transfer between heterogeneous supercomputer clusters on TG. The framework automates the entire data transfer process without human intervention and recovers automatically from any failures during the transfers. We also examine optimization approaches for ingesting simulation data into the iRODS (Integrated Rule-Oriented Data System) digital library. The average transfer rate from TACC Ranger to iRODS reaches 133 MB/sec, 5 times faster than conventional methods. Experiments performed on TG clusters demonstrate that these concurrent data transfer and ingestion mechanisms shorten the processing time of the scientific workflow and significantly reduce the load as well.

Citation
Zhou, J., Cui, Y., Davis, S., Guest, C. C., & Maechling, P. J. (2010). Workflow-based high performance data transfer and ingestion to petascale simulations on TeraGrid. Oral Presentation at IEEE Third International Joint Conference on Computational Sciences and Optimization. doi: 10.1109/CSO.2010.235.