Data Intensive Workflows (a.k.a. scientific workflows) are a key technology for managing Big Data analytics across all scientific areas, exploiting the capabilities of large-scale distributed and parallel computing infrastructures. Workflows enable scientists to design complex analyses composed of individual application components or services developed collaboratively. On the large-scale computing infrastructures routinely used for e-Science today, workflow management systems provide both a formal description of distributed processes and an engine to enact applications composed of a wealth of concurrent processes. Furthermore, workflow enactment engines often ensure data traceability by registering provenance traces of data processing during execution.
Held in conjunction with SC14: The International Conference for High Performance Computing, Networking, Storage and Analysis.
The size of the data and the scale of the data analysis flows often lead to complex management of distributed data sets. Workflow formalisms are needed that include adequate structures for representing Big Data sets and their concurrent processing. Beyond the magnitude of data processed by the workflow components, the intermediate and resulting data need to be annotated with provenance and other information to evaluate the quality of the data and support the repeatability of the analysis.
The process of workflow design and execution in a distributed environment can be very complex and can involve multiple stages, including textual or graphical workflow specification, the mapping of high-level workflow descriptions onto the available resources, and the monitoring and debugging of the subsequent execution. Further, since computations and data access operations are performed on shared resources, there is increasing interest in the fair allocation and management of those resources at the workflow level.
Data-driven computations are increasingly considered as a means to address Big Data challenges. Yet scientific experiments also require the description of complex control flows. Adequate workflow descriptions are needed to support the complex workflow management process, which includes workflow design, workflow reuse, and modifications made to the workflow over time, for example modifications to individual workflow components. Additional workflow annotations may provide guidelines and requirements for resource mapping and execution.
The Ninth Workshop on Workflows in Support of Large-Scale Science focuses on the entire workflow lifecycle, including workflow design, mapping, robust execution, and the recording of provenance information. The workshop also welcomes contributions in the applications area, from which requirements on workflow management systems can be derived. The topics of the workshop include but are not limited to: