Data-intensive workflows (a.k.a. scientific workflows) are a key technology that enables large-scale data analysis experiments in all scientific areas by exploiting the capabilities of large-scale distributed and parallel computing infrastructures. Workflows enable scientists to design complex analyses composed of individual application components or services; such components and services are often designed, developed, and tested collaboratively. On the large-scale computing infrastructures routinely used for e-Science today, workflow management systems provide both a formal description of distributed processes and an engine to enact applications composed of a wealth of concurrent processes.
The size of the data and the scale of the data analysis flows often lead to complex, distributed data set management. Workflow formalisms that include adequate structures for data set representation and concurrent processing are needed. Beyond the sheer magnitude of data processed by the workflow components, the intermediate and resulting data need to be annotated with provenance and other information to evaluate the quality of the data and support the repeatability of the analysis.
The process of workflow design and execution in a distributed environment can be very complex and can involve multiple stages, including the textual or graphical specification of the workflow, the mapping of the high-level workflow description onto the available resources, and the monitoring and debugging of the subsequent execution. Further, since computations and data access operations are performed on shared resources, there is increasing interest in the fair allocation and management of those resources at the workflow level.
Data-driven computations are increasingly considered as a way to tackle the wealth of data generated by scientific instruments. Yet scientific experiments also require the description of complex control flows. Adequate workflow descriptions are needed to support the complex workflow management process, which includes workflow creation, workflow reuse, and modifications made to the workflow over time, for example modifications to the individual workflow components. Additional workflow annotations may provide guidelines and requirements for resource mapping and execution.
The Eighth Workshop on Workflows in Support of Large-Scale Science focuses on the entire workflow lifecycle, including workflow composition, mapping, robust execution, and the recording of provenance information. The workshop also welcomes contributions in the applications area, from which requirements on workflow management systems can be derived. The topics of the workshop include but are not limited to: