Exascale systems challenge the programmer to write multi-level parallel programs, which means employing several paradigms, one for each level of parallelism in the system. The long-term challenge is to evolve existing programming models and develop new ones to better support application development on exascale machines. In the multi-level programming paradigm FP3C, users can express high-level parallelism in the YvetteML workflow language (YML) and employ parallel components written in the XcalableMP (XMP) paradigm. By developing correctness checking techniques for both paradigms, and by investigating the fundamental requirements for first designing for and then verifying the correctness of parallelization paradigms, MYX aims to combine the know-how and lessons learned from different areas to derive the input needed to guide the development of future programming models and software engineering methods.

XMP is a PGAS language specified by Japan's PC Cluster Consortium for high-level programming and the main research vehicle for Japan's post-petascale programming model research targeting exascale. YML is used to describe the parallelism of an application at a very high level, in particular to couple complex applications. YML provides a compiler to translate the YvetteML notation into XMP-parallel programs, and a just-in-time scheduler to manage the execution of parallel programs. The MUST correctness checker can detect a wide range of issues in MPI, OpenMP and hybrid MPI+OpenMP programs by collecting program information and aggregating it in a tree-based overlay network capable of running different types of analysis. Due to the use of the PnMPI profiling interface, MUST can in principle trace and analyze any MPI communication, whether it originates directly from the application code or from a middleware library such as the XMP runtime.

In MYX we will investigate the application of scalable correctness checking methods to YML, XMP and selected features of MPI. This will result in clear guidelines on how to limit the risk of introducing errors and how best to express parallelism so that errors which, for fundamental reasons, can only be detected at runtime are caught, as well as in extended and scalable correctness checking methods.
