Pegasus, a Workflow Management System for Large-Scale Science

Ewa Deelman, Karan Vahi, Gideon Juve, Mats Rynge, Scott Callaghan, Philip J. Maechling, Rajiv Mayani, Weiwei Chen, Rafael Ferreira da Silva, Miron Livny, & Kent Wenger

Published 2014, SCEC Contribution #1911

Modern science often requires large-scale, multi-stage simulation and data analysis pipelines to study complex systems. The volume of computation and data in these pipelines calls for scalable workflow systems that can reliably and efficiently coordinate and automate data movement and task execution on distributed computational resources, including campus clusters, national cyberinfrastructures, and commercial clouds. In this paper we describe the design, development, and evolution of the Pegasus Workflow Management System, which maps abstract workflow descriptions onto distributed computing infrastructure. Pegasus has been used for more than 12 years by scientists in a wide variety of domains, including astronomy, seismology, bioinformatics, and physics. We describe how Pegasus achieves reliable, scalable workflow execution across many different computing infrastructures.

Deelman, E., Vahi, K., Juve, G., Rynge, M., Callaghan, S., Maechling, P. J., Mayani, R., Chen, W., da Silva, R., Livny, M., & Wenger, K. (2014). Pegasus, a Workflow Management System for Large-Scale Science. Future Generation Computer Systems, 46, 15-35. doi: